datitran / Spark Tdd Example

Licence: MIT

A simple Spark TDD example

Labels

jupyter-notebook spark tdd pyspark

A simple PySpark example using TDD

This is a very basic example of how to use Test Driven Development (TDD) in the context of PySpark, Spark's Python API.
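As a minimal sketch of the test-first pattern (not the contents of this repository's test file), the example below keeps the transformation logic in a plain Python function so it can be tested without starting Spark. The function name parse_point and the input format are hypothetical, chosen only for illustration.

```python
import unittest


def parse_point(line):
    """Turn a comma-separated line such as "1.0,2.5" into a list of floats.

    Hypothetical helper: in a PySpark job it could be passed to rdd.map().
    """
    return [float(value) for value in line.split(",")]


class ParsePointTest(unittest.TestCase):
    # In TDD these tests are written first and fail until parse_point exists.
    def test_parses_comma_separated_floats(self):
        self.assertEqual(parse_point("1.0,2.5"), [1.0, 2.5])

    def test_rejects_non_numeric_input(self):
        with self.assertRaises(ValueError):
            parse_point("1.0,abc")
```

Because the test class derives from unittest.TestCase, a runner such as nosetests should be able to discover it when the file name starts with test_.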

Getting Started

1. Use brew to install Apache Spark: brew install apache-spark
2. Change the logging settings:
   - cd /usr/local/Cellar/apache-spark/2.1.0/libexec/conf
   - cp log4j.properties.template log4j.properties
   - In log4j.properties, change the root log level from INFO to ERROR: log4j.rootCategory=ERROR, console
3. Add this to your bash profile: export SPARK_HOME="/usr/local/Cellar/apache-spark/2.1.0/libexec/"
4. Use nosetests to run the test: nosetests -vs test_clustering.py
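The logging change described above can also be scripted. The following is a rough sketch using only the Python standard library. The function name quiet_spark_logging is hypothetical, and it assumes the template contains a line starting with log4j.rootCategory=.

```python
import re
import shutil
from pathlib import Path


def quiet_spark_logging(conf_dir):
    """Create log4j.properties from the template and set the root level to ERROR."""
    conf = Path(conf_dir)
    target = conf / "log4j.properties"
    # Same effect as: cp log4j.properties.template log4j.properties
    shutil.copyfile(conf / "log4j.properties.template", target)
    text = target.read_text()
    # Overwrite whatever level the root category currently has.
    text = re.sub(
        r"(?m)^log4j\.rootCategory=.*$",
        "log4j.rootCategory=ERROR, console",
        text,
    )
    target.write_text(text)
    return target
```

It would be called with the conf directory from the steps above, for example quiet_spark_logging("/usr/local/Cellar/apache-spark/2.1.0/libexec/conf").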

Dependencies

Apache Spark 2.1.0
Python 3.5
nose 1.3.7

Copyright

See LICENSE for details. Copyright (c) 2017 Dat Tran.
