Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes, and databases.
Stars: ✭ 4,919 (+2743.35%)
jobAnalytics and search: The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-85.55%)
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data into Google BigQuery and Pub/Sub
Stars: ✭ 53 (-69.36%)
viewflow: Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (-36.42%)
Applied Ml: 📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+10202.89%)
Auptimizer: An automatic ML model optimization tool.
Stars: ✭ 166 (-4.05%)
Gspread Pandas: A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (+30.64%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+265.9%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-54.34%)
Beyond Jupyter: 🐍💻📊 All material from the PyCon.DE 2018 talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. slides, video, Udemy MOOC & other references)
Stars: ✭ 135 (-21.97%)
airflow-dbt-python: A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
Stars: ✭ 111 (-35.84%)
Geni: A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-12.14%)
Data Science On Gcp: Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+399.42%)
D6t Python: Accelerate data science
Stars: ✭ 118 (-31.79%)
Pipelinex: PipelineX is a Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-26.59%)
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-88.44%)
Goodreads etl pipeline: An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Stars: ✭ 793 (+358.38%)
Udacity Data Engineering Projects: A few projects related to data engineering, including data modeling, infrastructure setup on the cloud, data warehousing, and data lake development.
Stars: ✭ 458 (+164.74%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-54.34%)
Prefect: The easiest way to automate your data
Stars: ✭ 7,956 (+4498.84%)
Aws Data Wrangler: Pandas on AWS. Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
Stars: ✭ 2,385 (+1278.61%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (-27.17%)
Just Dashboard: 📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+773.41%)
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-20.81%)
Ploomber: A convention-over-configuration workflow orchestrator. Develop locally (in Jupyter or your favorite editor) and deploy to Airflow or Kubernetes.
Stars: ✭ 221 (+27.75%)
Spark Alchemy: Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-29.48%)
Learn Something Every Day: 📝 A compilation of everything that I learn: computer science, software development, engineering, math, and coding in general.
Stars: ✭ 362 (+109.25%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+138.73%)
Superset: Apache Superset is a data visualization and data exploration platform
Stars: ✭ 42,634 (+24543.93%)
Data Science Stack Cookiecutter: 🐳📊🤓 Cookiecutter template to launch an awesome dockerized data science toolstack (incl. Jupyter, Superset, Postgres, Minio, Airflow & API Star)
Stars: ✭ 153 (-11.56%)
Presentations: Slide show presentations on data-driven investing.
Stars: ✭ 162 (-6.36%)
Lazynlp: Library to scrape and clean web pages to create massive datasets.
Stars: ✭ 1,985 (+1047.4%)
Datascience Pizza: 🍕 Repository gathering information on study materials in data analysis and related fields, companies that work with data, and a dictionary of concepts
Stars: ✭ 2,043 (+1080.92%)
Datasets For Good: List of datasets to apply stats/machine learning/technology to the world of social good.
Stars: ✭ 174 (+0.58%)
Data Science Toolkit: Collection of stats, modeling, and data science tools in Python and R.
Stars: ✭ 169 (-2.31%)
Airflow Exporter: Airflow plugin to export DAG- and task-based metrics to Prometheus.
Stars: ✭ 161 (-6.94%)
Danmf: A sparsity-aware implementation of "Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection" (CIKM 2018).
Stars: ✭ 161 (-6.94%)
Matplotplusplus: Matplot++, a C++ graphics library for data visualization 📊🗾
Stars: ✭ 2,433 (+1306.36%)
Influxdb exporter: A server that accepts InfluxDB metrics via the HTTP API and exports them via HTTP for Prometheus consumption
Stars: ✭ 159 (-8.09%)
Scikit Plot: An intuitive library to add plotting functionality to scikit-learn objects.
Stars: ✭ 2,162 (+1149.71%)
Dstack: An open-source tool to rapidly develop data applications with Python
Stars: ✭ 174 (+0.58%)
Pzad: Course "Applied Problems of Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)
Stars: ✭ 160 (-7.51%)
Primehub: A toil-free multi-tenancy machine learning platform in your Kubernetes cluster
Stars: ✭ 160 (-7.51%)
Ghactions: GitHub Actions for R and an accompanying R package
Stars: ✭ 159 (-8.09%)
Aulas: Lessons from the São Paulo School of Artificial Intelligence (Escola de Inteligência Artificial de São Paulo)
Stars: ✭ 166 (-4.05%)
Fastbook: The fastai book, published as Jupyter Notebooks
Stars: ✭ 13,998 (+7991.33%)
Gensim: Topic Modelling for Humans
Stars: ✭ 12,763 (+7277.46%)