Live log analyzer sparkSpark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.
Stars: ✭ 14 (-80.28%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+471.83%)
ButterfreeA tool for building feature stores.
Stars: ✭ 126 (+77.46%)
PetastormPetastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+1460.56%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+111.27%)
Cluster PackA library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-67.61%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+3983.1%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-64.79%)
Pyspark StubsApache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+38.03%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-9.86%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+122.54%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1288.73%)
Quinnpyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+205.63%)
Pyspark Setup DemoDemo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-66.2%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+791.55%)
optimus🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+1802.82%)
Spark GotchasSpark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+333.8%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+52.11%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+3446.48%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+3363.38%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1784.51%)
LinkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+3171.83%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-11.27%)
Morphl Community EditionMorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+256.34%)
Awesome SparkA curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+1394.37%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1243.66%)
Sparkling TitanicTraining models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-83.1%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (+107.04%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+204.23%)
ScriptisScriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+880.28%)
Repo 2019BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (+87.32%)
Spark SyntaxThis is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (+480.28%)
pyspark-cassandrapyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (-1.41%)
Tdigestt-Digest data structure in Python. Useful for percentiles and quantiles, including distributed enviroments like PySpark
Stars: ✭ 274 (+285.92%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+181.69%)
mmtf-workshop-2018Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-29.58%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+52.11%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-64.79%)
workshop-sparkCódigo para workshops Spark com ambiente de desenvolvimento em docker
Stars: ✭ 27 (-61.97%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+56.34%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-54.93%)
isarn-sketches-sparkRoutines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-60.56%)
spark3DSpark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-67.61%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin
Stars: ✭ 91 (+28.17%)