Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.39%)
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (-62.57%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (-92.05%)
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (-87.5%)
Superset
Apache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+766.72%)
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-98.43%)
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (-78.13%)
naas
⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ Production environment
Stars: ✭ 219 (-95.55%)
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-97.07%)
Data Science On Gcp
Source code accompanying the book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (-82.44%)
Data Science Resources
👨🏽‍🏫 You can learn about what data science is and why it's important in today's modern world. Are you interested in data science? 🔋
Stars: ✭ 171 (-96.52%)
Akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! An open-source financial data interface library.
Stars: ✭ 4,334 (-11.89%)
Skdata
Python tools for data analysis
Stars: ✭ 16 (-99.67%)
Aws Data Wrangler
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (-51.51%)
Chain.jl
A Julia package for piping a value through a series of transformation expressions using a more convenient syntax than Julia's native piping functionality.
Stars: ✭ 118 (-97.6%)
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+73.43%)
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-95.41%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-97.44%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-87.13%)
Knowledge Repo
A next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+0.75%)
Graphia
A visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-98.64%)
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-98.82%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-98.39%)
Steppy
Lightweight Python library for fast and reproducible experimentation 🔬
Stars: ✭ 119 (-97.58%)
Data Science Hacks
Data Science Hacks consists of tips and tricks to help you become a better data scientist. The hacks are for everyone, from beginner to advanced, and cover Python, Jupyter Notebook, pandas and so on.
Stars: ✭ 273 (-94.45%)
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (-80.04%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (-69.28%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-97.42%)
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (-88.01%)
Steppy Toolkit
Curated set of transformers that make your work with steppy faster and more effective 🔭
Stars: ✭ 21 (-99.57%)
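Gopup
Data interfaces: Baidu, Google, Toutiao and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, Qianlima (fast-growing) and unicorn companies, Xinwen Lianbo news broadcast transcripts, film and TV box office data, university lists, epidemic data…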
Stars: ✭ 1,229 (-75.02%)
Flyte
Accelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle tested at Lyft, Spotify, Freenome and others, and is truly open source.
Stars: ✭ 1,242 (-74.75%)
Awesome Bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+113.01%)
Ai Expert Roadmap
Roadmap to becoming an Artificial Intelligence Expert in 2021
Stars: ✭ 15,441 (+213.91%)
Pandas Datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (-55.62%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-97.82%)
Scikit Learn
scikit-learn: machine learning in Python
Stars: ✭ 48,322 (+882.35%)
Pyspark Cheatsheet
🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-97.8%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-97.78%)
Xda
R package for exploratory data analysis
Stars: ✭ 112 (-97.72%)
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (-62.37%)
Cubes
Light-weight Python OLAP framework for multi-dimensional data analysis
Stars: ✭ 1,393 (-71.68%)
Algocode
Welcome everyone! 🌟 Here you can solve problems, build scrapers and much more 💻
Stars: ✭ 113 (-97.7%)
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (-69.18%)
Seaborn Tutorial
This repository is my attempt to help Data Science aspirants gain the Data Visualization skills required to progress in their career. It includes all the types of plot offered by Seaborn, applied to random datasets.
Stars: ✭ 114 (-97.68%)
D6t Python
Accelerate data science
Stars: ✭ 118 (-97.6%)
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-96.63%)
Kiba
Data processing & ETL framework for Ruby
Stars: ✭ 1,618 (-67.11%)
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-97.52%)
Datasist
A Python library for easy data analysis, visualization, exploration and modeling
Stars: ✭ 123 (-97.5%)
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (-71.99%)
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-97.7%)