datitran / Spark Tdd Example

Licence: MIT
A simple Spark TDD example

A simple PySpark example using TDD

This is a basic example of applying test-driven development (TDD) to PySpark, Spark's Python API.
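
To give a sense of what a test written first can look like in PySpark, here is a minimal sketch. It is not the repository's own test: the function count_words and the class CountWordsTest are hypothetical names chosen for illustration, and the sketch assumes a Spark version that provides SparkSession (2.0 or later) running in local mode.

```python
import unittest

try:
    from pyspark.sql import SparkSession
except ImportError:  # PySpark is not installed, so the test class is skipped
    SparkSession = None


def count_words(lines_rdd):
    """Hypothetical function under test: count how often each word occurs."""
    return (lines_rdd
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))


@unittest.skipIf(SparkSession is None, "pyspark is not installed")
class CountWordsTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # One local, single-threaded session shared by every test in the class.
        cls.spark = (SparkSession.builder
                     .master("local[1]")
                     .appName("tdd-sketch")
                     .getOrCreate())

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_counts_each_word(self):
        lines = self.spark.sparkContext.parallelize(["a b a", "b c"])
        result = dict(count_words(lines).collect())
        self.assertEqual(result, {"a": 2, "b": 2, "c": 1})
```

In a TDD workflow the test method would be written before count_words exists, run once to see it fail, and then made to pass with the smallest implementation that works. Because the class is a plain unittest.TestCase, it should be runnable with python -m unittest as well as with a runner such as nosetests.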

Getting Started

  1. Use brew to install Apache Spark: brew install apache-spark
  2. Change the logging settings:
     • cd /usr/local/Cellar/apache-spark/2.1.0/libexec/conf
     • cp log4j.properties.template log4j.properties
     • In log4j.properties, change the root log level from INFO to ERROR: log4j.rootCategory=ERROR, console
  3. Add this to your bash profile: export SPARK_HOME="/usr/local/Cellar/apache-spark/2.1.0/libexec/"
  4. Use nosetests to run the test: nosetests -vs test_clustering.py
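
The logging change in the steps above can also be scripted. The following is a small sketch using only the Python standard library; the helper name quiet_spark_logging is hypothetical, and it assumes the template contains a line beginning with log4j.rootCategory=, as the steps above imply.

```python
import re
import shutil
from pathlib import Path


def quiet_spark_logging(conf_dir):
    """Copy log4j.properties.template to log4j.properties and set the root level to ERROR."""
    conf = Path(conf_dir)
    target = conf / "log4j.properties"
    shutil.copyfile(conf / "log4j.properties.template", target)

    text = target.read_text()
    # Replace only the level on the rootCategory line; appenders such as
    # "console" and all other logger settings are left untouched.
    text = re.sub(r"(?m)^(log4j\.rootCategory=)\w+", r"\1ERROR", text)
    target.write_text(text)
    return target
```

With the install location used above, the call would be quiet_spark_logging("/usr/local/Cellar/apache-spark/2.1.0/libexec/conf"). The template file itself is not modified, so the default settings can be restored by deleting log4j.properties.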

Dependencies

Copyright

See LICENSE for details. Copyright (c) 2017 Dat Tran.
