3828 open source projects that are alternatives to or similar to Setl

Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+6126.58%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+92.41%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1593.67%)
Mutual labels:  data-science, spark, data-analysis, big-data
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+701.27%)
Mutual labels:  data-science, spark, etl, data-engineering
Awesome Business Intelligence
Actively curated list of awesome BI tools. PRs welcome!
Stars: ✭ 1,157 (+1364.56%)
Mutual labels:  data-science, data-analysis, etl
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+981.01%)
Mutual labels:  data-science, data-analysis, big-data
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (+126.58%)
Mutual labels:  etl, data-engineering, framework
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+1812.66%)
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (+49.37%)
Mutual labels:  data-science, data-analysis, pipeline
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (+60.76%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1148.1%)
Mutual labels:  data-science, spark, data-analysis
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+993.67%)
Knowage Server
Knowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (+249.37%)
Mutual labels:  dataset, data-analysis, big-data
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+27808.86%)
Mutual labels:  data-science, spark, big-data
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-17.72%)
Mutual labels:  data-science, spark, big-data
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+53867.09%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+37.97%)
Mutual labels:  data-science, data-analysis, big-data
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+59.49%)
Mutual labels:  data-science, etl, data-engineering
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+73.42%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+3753.16%)
Mutual labels:  data-science, spark, big-data
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (+144.3%)
Mutual labels:  data-science, data-analysis, big-data
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-68.35%)
Mutual labels:  spark, pipeline, etl
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+258.23%)
Mutual labels:  data-science, dataset, big-data
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+244.3%)
Mutual labels:  spark, pipeline, etl
Datascience course
Data Science course in Portuguese
Stars: ✭ 294 (+272.15%)
Mutual labels:  data-science, dataset, data-analysis
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+356.96%)
Mutual labels:  spark, big-data, etl
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+6615.19%)
Mutual labels:  data-science, data-analysis, big-data
leaflet heatmap
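A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved offline for computation and analysis. After the data is computed in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to give a good interactive experience. With the current Apache Spark implementation of the rendering, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .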
Stars: ✭ 13 (-83.54%)
Mutual labels:  big-data, spark, data-analysis
Data Science Hacks
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter notebooks, pandas and so on.
Stars: ✭ 273 (+245.57%)
Mutual labels:  data-science, dataset, data-analysis
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+7059.49%)
Mutual labels:  data-science, spark, big-data
My Journey In The Data Science World
📢 Ready to learn or review your knowledge!
Stars: ✭ 1,175 (+1387.34%)
Mutual labels:  data-science, data-analysis, big-data
Courses
Quizzes & assignments from Coursera
Stars: ✭ 454 (+474.68%)
Mutual labels:  data-science, data-analysis, big-data
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+93.67%)
Mutual labels:  data-science, data-analysis, big-data
Great expectations
Always know what to expect from your data.
Stars: ✭ 5,808 (+7251.9%)
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (+116.46%)
Mutual labels:  data-science, dataset, data-analysis
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+35.44%)
Mutual labels:  data-science, data-analysis, big-data
Pythondata
repo for code published on pythondata.com
Stars: ✭ 113 (+43.04%)
Mutual labels:  data-science, data-analysis, big-data
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+3160.76%)
Mutual labels:  spark, big-data, data-engineering
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+2918.99%)
Mutual labels:  data-science, etl, data-engineering
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (+54.43%)
Mutual labels:  data-science, spark, data-engineering
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (+197.47%)
Mutual labels:  data-analysis, big-data, etl
Bodywork Core
Deploy machine learning projects developed in Python, to Kubernetes. Accelerated MLOps 🚀
Stars: ✭ 145 (+83.54%)
Mutual labels:  data-science, pipeline, framework
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (+394.94%)
Mutual labels:  data-science, data-analysis, etl
Football Data
football (soccer) datasets
Stars: ✭ 18 (-77.22%)
Mutual labels:  data-science, dataset, data-analysis
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+0%)
Mutual labels:  data-science, etl, data-engineering
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-86.08%)
Mutual labels:  spark, big-data
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+10443.04%)
Mutual labels:  data-science, data-analysis
Steppy Toolkit
Curated set of transformers that make your work with steppy faster and more effective 🔭
Stars: ✭ 21 (-73.42%)
Mutual labels:  data-science, pipeline
Tedsds
Apache Spark - Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-82.28%)
Mutual labels:  dataset, spark
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+39922.78%)
Mutual labels:  spark, big-data
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+1417.72%)
Mutual labels:  data-science, pipeline
Mlcourse.ai
Open Machine Learning Course
Stars: ✭ 7,963 (+9979.75%)
Mutual labels:  data-science, data-analysis
Dataconfs
A list of conferences connected with data worldwide.
Stars: ✭ 36 (-54.43%)
Mutual labels:  data-science, dataset
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+1143.04%)
Mutual labels:  data-science, pipeline
Autodl
Automated Deep Learning without ANY human intervention. 1'st Solution for AutoDL [email protected]
Stars: ✭ 854 (+981.01%)
Mutual labels:  data-science, big-data
Art Data Science
The Art of Data Science
Stars: ✭ 32 (-59.49%)
Mutual labels:  data-science, data-analysis
Janitor
simple tools for data cleaning in R
Stars: ✭ 981 (+1141.77%)
Mutual labels:  data-science, data-analysis
Spark Website
Apache Spark Website
Stars: ✭ 75 (-5.06%)
Mutual labels:  spark, big-data
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+1424.05%)
Mutual labels:  data-science, data-analysis
Qri
you're invited to a data party!
Stars: ✭ 1,003 (+1169.62%)
Mutual labels:  data-science, dataset