1517 open source projects that are alternatives to or similar to Pyspark Example Project

Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+25.28%)
Mutual labels:  spark, data-engineering
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (-86.26%)
Mutual labels:  spark, pyspark
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+88.78%)
Mutual labels:  spark, etl
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-82.94%)
Mutual labels:  spark, pyspark
Scriptis
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+9.95%)
Mutual labels:  spark, pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-76.3%)
Mutual labels:  spark, pyspark
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (-76.78%)
Mutual labels:  spark, pyspark
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+266.98%)
Mutual labels:  spark, pyspark
Pyspark Learning
Updated repository
Stars: ✭ 147 (-76.78%)
Mutual labels:  spark, pyspark
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-68.4%)
Mutual labels:  spark, pyspark
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+297.79%)
Mutual labels:  spark, pyspark
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-60.66%)
Mutual labels:  spark, data-engineering
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (-71.72%)
Mutual labels:  etl, data-engineering
Pixiedust
Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+57.66%)
Mutual labels:  data-science, spark
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Stars: ✭ 34 (-94.63%)
Mutual labels:  spark, pyspark
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+36.49%)
Mutual labels:  data-science, data-engineering
AirflowDataPipeline
Example of an ETL Pipeline using Airflow
Stars: ✭ 24 (-96.21%)
Mutual labels:  etl, data-engineering
etl manager
A Python package to create a database on the platform using our MoJ data warehousing framework
Stars: ✭ 14 (-97.79%)
Mutual labels:  etl, data-engineering
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-89.26%)
Mutual labels:  data-science, etl
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+485.31%)
Mutual labels:  etl, data-engineering
Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-58.93%)
Mutual labels:  data-science, spark
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+386.73%)
Mutual labels:  data-science, spark
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (-45.97%)
Mutual labels:  etl, data-engineering
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+138.7%)
Mutual labels:  data-science, data-engineering
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-82.31%)
Mutual labels:  data-science, spark
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-78.36%)
Mutual labels:  data-science, data-engineering
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-79.94%)
Mutual labels:  data-science, data-engineering
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-41.23%)
Mutual labels:  spark, etl
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-42.97%)
Mutual labels:  spark, etl
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (-38.23%)
Mutual labels:  data-science, etl
Elastic
R client for the Elasticsearch HTTP API
Stars: ✭ 227 (-64.14%)
Mutual labels:  data-science, etl
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-64.3%)
Mutual labels:  data-science, data-engineering
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+380.88%)
Mutual labels:  data-science, spark
Cql
Categorical Query Language IDE
Stars: ✭ 196 (-69.04%)
Mutual labels:  data-science, etl
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-90.84%)
Mutual labels:  pyspark, data-engineering
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (-55.92%)
Mutual labels:  etl, data-engineering
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-91.63%)
Mutual labels:  etl, data-engineering
Soda Sql
Metric collection, data testing and monitoring for SQL accessible data
Stars: ✭ 173 (-72.67%)
Mutual labels:  data-science, data-engineering
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-77.25%)
Mutual labels:  etl, data-engineering
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (-3.32%)
Mutual labels:  etl, data-engineering
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-97.31%)
Mutual labels:  etl, pyspark
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-96.05%)
Mutual labels:  etl, pyspark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (-92.1%)
Mutual labels:  spark, pyspark
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-95.89%)
Mutual labels:  spark, pyspark
kafka-compose
🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-94.94%)
Mutual labels:  spark, pyspark
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-96.05%)
Mutual labels:  spark, pyspark
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-89.73%)
Mutual labels:  etl, data-engineering
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-34.76%)
Mutual labels:  data-science, spark
arthur-redshift-etl
ELT Code for your Data Warehouse
Stars: ✭ 22 (-96.52%)
Mutual labels:  etl, data-engineering
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-57.03%)
Mutual labels:  spark, etl
Around Dataengineering
A Data Engineering & Machine Learning Knowledge Hub
Stars: ✭ 257 (-59.4%)
Mutual labels:  spark, data-engineering
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+547.55%)
Mutual labels:  data-science, etl
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-73.78%)
Mutual labels:  data-science, data-engineering
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+288.47%)
Mutual labels:  spark, pyspark
Learn Something Every Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (-42.81%)
Mutual labels:  data-science, data-engineering
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-35.86%)
Mutual labels:  spark, pyspark
Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+817.54%)
Mutual labels:  data-science, data-engineering
Heamy
A set of useful tools for competitive data science.
Stars: ✭ 511 (-19.27%)
Mutual labels:  data-science
Pygam
[HELP REQUESTED] Generalized Additive Models in Python
Stars: ✭ 569 (-10.11%)
Mutual labels:  data-science
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (-19.59%)
Mutual labels:  spark