Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-34.37%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (+53.13%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-73.96%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+2919.79%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+12.5%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+12.5%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2461.46%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-73.96%)
Live log analyzer sparkSpark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.
Stars: ✭ 14 (-85.42%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+15.63%)
LinkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+2319.79%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+322.92%)
Sparkling TitanicTraining models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-87.5%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-72.92%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+2522.92%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+108.33%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-66.67%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+125%)
ScriptisScriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+625%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+64.58%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+927.08%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+559.38%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-33.33%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+56.25%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+893.75%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1293.75%)
Lpa DetectorOptimize and improve the Label propagation algorithm
Stars: ✭ 75 (-21.87%)
Hops ExamplesExamples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-12.5%)
LabsResearch on distributed system
Stars: ✭ 73 (-23.96%)
Bitcoin Value Predictor[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin
Stars: ✭ 91 (-5.21%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (-13.54%)
Kamu CliNext generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-28.12%)
Luigi WarehouseA luigi powered analytics / warehouse stack
Stars: ✭ 72 (-25%)
Hadoop cookbookCookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-14.58%)
Fast MrmrAn improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-30.21%)
KontextfreiWriting application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-30.21%)
MleapMLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+1183.33%)
ThingsboardOpen-source IoT Platform - Device management, data collection, processing and visualization.
Stars: ✭ 10,526 (+10864.58%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-32.29%)
Ammonite SparkRun spark calculations from Ammonite
Stars: ✭ 88 (-8.33%)
LeharVisualize data using relative ordering
Stars: ✭ 81 (-15.62%)
Spark BigqueryGoogle BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-32.29%)
Spark GbtlrHybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-15.62%)
Tre[AKBC 19] Improving Relation Extraction by Pre-trained Language Representations
Stars: ✭ 95 (-1.04%)