runawayhorse001 / Learningapachespark

Licence: MIT
LearningApacheSpark

Projects that are alternatives of or similar to Learningapachespark

Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+515.48%)
Mutual labels:  spark, pyspark
W2v
Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-58.71%)
Mutual labels:  spark, pyspark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+536.13%)
Mutual labels:  spark, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-3.23%)
Mutual labels:  spark, pyspark
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-30.32%)
Mutual labels:  spark, pyspark
Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-92.26%)
Mutual labels:  spark, pyspark
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (-5.16%)
Mutual labels:  spark, pyspark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+308.39%)
Mutual labels:  spark, pyspark
Relation extraction
Relation Extraction using Deep learning(CNN)
Stars: ✭ 96 (-38.06%)
Mutual labels:  spark, pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+763.23%)
Mutual labels:  spark, pyspark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-85.16%)
Mutual labels:  spark, pyspark
Eat pyspark in 10 days
pyspark🍒🥭 is delicious,just eat it!😋😋
Stars: ✭ 116 (-25.16%)
Mutual labels:  spark, pyspark
Spark Scala Tutorial
A free tutorial for Apache Spark.
Stars: ✭ 907 (+485.16%)
Mutual labels:  spark, tutorial
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies! Along with a Medium article.
Stars: ✭ 14 (-90.97%)
Mutual labels:  spark, pyspark
Scriptis
Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+349.03%)
Mutual labels:  spark, pyspark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-59.35%)
Mutual labels:  spark, pyspark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+161.94%)
Mutual labels:  spark, pyspark
Justenoughscalaforspark
A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+247.1%)
Mutual labels:  spark, tutorial
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-43.87%)
Mutual labels:  spark, pyspark
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-30.32%)
Mutual labels:  spark, pyspark

Learning Apache Spark

Website: https://runawayhorse001.github.io/LearningApacheSpark/

This is a shared repository for Learning Apache Spark Notes. The first version was posted on GitHub in [Feng2017]. This shared repository mainly contains Wenqiang's self-learning and self-teaching notes from his IMA Data Science Fellowship.

In this repository, I try to use detailed demo code and examples to show how to use each main function. If you find that your work wasn’t cited in these notes, please feel free to let me know.

Although I am by no means a data mining programming and Big Data expert, I decided that it would be useful for me to share what I learned about PySpark programming in the form of easy tutorials with detailed examples. I hope these tutorials will be a valuable tool for your studies.

The tutorials assume that the reader has a preliminary knowledge of programming and Linux. This document is generated automatically using Sphinx.

BTW, I successfully brought the git output format into Sphinx in this repository. You need to install sphinx-to-github; more details can be found in the following reference:

Reference:

Now, the sphinx-to-github problem for GitHub Pages can be solved easily by adding an empty file named .nojekyll to your docs folder. I added the following piece of code to my docgen.py to create it automatically:

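    import os

    # Add an empty .nojekyll file to fix the GitHub Pages issues.
    # `outdir` (the docs output folder) is assumed to be defined earlier in docgen.py.
    nojekyll_path = os.path.join(outdir, '.nojekyll')
    if not os.path.exists(nojekyll_path):
        # Opening in append mode creates the file; the context manager closes it.
        with open(nojekyll_path, 'a'):
            pass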
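
The same step can also be wrapped in a small standalone helper, which is easier to reuse and test outside docgen.py. This is only a minimal sketch: the function name `ensure_nojekyll` is hypothetical (it is not part of this repository), and it assumes the output folder already exists.

```python
from pathlib import Path


def ensure_nojekyll(outdir):
    """Create an empty .nojekyll marker file in `outdir` if it is missing.

    Hypothetical helper for illustration; assumes `outdir` already exists.
    Returns the path of the marker file.
    """
    marker = Path(outdir) / ".nojekyll"
    # touch() creates the file when it is absent and leaves any
    # existing content untouched when it is already there.
    marker.touch(exist_ok=True)
    return marker
```

Calling `ensure_nojekyll("docs")` after the Sphinx build would have the same effect as the snippet above, provided the `docs` folder is where the HTML output is written.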