203 open source projects that are alternatives to or similar to soda-spark

re-data
re_data - fix data issues before your users & CEO discover them 😊
Stars: ✭ 955 (+1546.55%)
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-56.9%)
Mutual labels:  pyspark, data-engineering
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-70.69%)
Mutual labels:  data-engineering, data-quality
Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+9913.79%)
Mutual labels:  data-engineering, data-quality
Applied Ml
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+30631.03%)
Mutual labels:  data-engineering, data-quality
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+117.24%)
Mutual labels:  pyspark, data-engineering
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+148.28%)
Mutual labels:  data-engineering, data-quality
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+991.38%)
Mutual labels:  pyspark, data-engineering
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-50%)
Mutual labels:  pyspark, data-quality
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-18.97%)
Mutual labels:  pyspark, data-engineering
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+86.21%)
Mutual labels:  pyspark
Eat pyspark in 10 days
pyspark 🍒🥭 is delicious, just eat it! 😋😋
Stars: ✭ 116 (+100%)
Mutual labels:  pyspark
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+336.21%)
Mutual labels:  pyspark
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-60.34%)
Mutual labels:  pyspark
Relation extraction
Relation extraction using deep learning (CNN)
Stars: ✭ 96 (+65.52%)
Mutual labels:  pyspark
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+272.41%)
Mutual labels:  pyspark
Pyspark Tutorial
PySpark Code for Hands-on Learners
Stars: ✭ 91 (+56.9%)
Mutual labels:  pyspark
Spark python ml examples
Spark 2.0 Python Machine Learning examples
Stars: ✭ 87 (+50%)
Mutual labels:  pyspark
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+244.83%)
Mutual labels:  pyspark
Pysparkgeoanalysis
🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (+8.62%)
Mutual labels:  pyspark
Awesome Spark
A curated list of awesome Apache Spark packages and resources.
Stars: ✭ 1,061 (+1729.31%)
Mutual labels:  pyspark
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to write C# or Java code to specify your tests, nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (+75.86%)
Mutual labels:  data-quality
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+2229.31%)
Mutual labels:  pyspark
Spark Iforest
Isolation Forest on Spark
Stars: ✭ 166 (+186.21%)
Mutual labels:  pyspark
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+1544.83%)
Mutual labels:  pyspark
Sparkling Titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-79.31%)
Mutual labels:  pyspark
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+3905.17%)
Mutual labels:  pyspark
Spark Tdd Example
A simple Spark TDD example
Stars: ✭ 23 (-60.34%)
Mutual labels:  pyspark
airflow-dbt-python
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (+91.38%)
Mutual labels:  data-engineering
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+86.21%)
Mutual labels:  pyspark
pyspark-cassandra
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (+20.69%)
Mutual labels:  pyspark
Pyspark Stubs
Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (+68.97%)
Mutual labels:  pyspark
Quinn
pyspark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+274.14%)
Mutual labels:  pyspark
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+2206.9%)
Mutual labels:  pyspark
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+22.41%)
Mutual labels:  pyspark
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+56.9%)
Mutual labels:  pyspark
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+4898.28%)
Mutual labels:  pyspark
W2v
Word2Vec models with Twitter data using Spark.
Stars: ✭ 64 (+10.34%)
Mutual labels:  pyspark
ohsome-quality-analyst
Data quality estimations for OpenStreetMap
Stars: ✭ 28 (-51.72%)
Mutual labels:  data-quality
Petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as Tensorflow, Pytorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+1810.34%)
Mutual labels:  pyspark
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4241.38%)
Mutual labels:  pyspark
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1600%)
Mutual labels:  pyspark
qsv
CSVs sliced, diced & analyzed.
Stars: ✭ 438 (+655.17%)
Mutual labels:  data-engineering
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-75.86%)
Mutual labels:  pyspark
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (+184.48%)
Mutual labels:  pyspark
Pyspark Setup Demo
Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-58.62%)
Mutual labels:  pyspark
hive compared bq
hive_compared_bq compares/validates 2 (SQL like) tables, and graphically shows the rows/columns that are different.
Stars: ✭ 27 (-53.45%)
Mutual labels:  data-quality
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-60.34%)
Mutual labels:  pyspark
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+172.41%)
Mutual labels:  pyspark
Scriptis
Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+1100%)
Mutual labels:  pyspark
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-65.52%)
Mutual labels:  data-engineering
workshop-spark
Code for Spark workshops with a Docker-based development environment
Stars: ✭ 27 (-53.45%)
Mutual labels:  pyspark
Learningapachespark
LearningApacheSpark
Stars: ✭ 155 (+167.24%)
Mutual labels:  pyspark
Spark Syntax
This is a repo documenting the best practices in PySpark.
Stars: ✭ 412 (+610.34%)
Mutual labels:  pyspark
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+600%)
Mutual labels:  pyspark
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+158.62%)
Mutual labels:  pyspark
Pyspark Boilerplate
A boilerplate for writing PySpark Jobs
Stars: ✭ 318 (+448.28%)
Mutual labels:  pyspark
Spark Gotchas
Spark Gotchas. A subjective compilation of the Apache Spark tips and tricks
Stars: ✭ 308 (+431.03%)
Mutual labels:  pyspark
awesome-dbt
A curated list of awesome dbt resources
Stars: ✭ 520 (+796.55%)
Mutual labels:  data-engineering
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (+153.45%)
Mutual labels:  pyspark
1-60 of 203 similar projects