hamilton: A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+1290.91%)
FIFA-2019-Analysis: A project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and related factors using data analysis and data visualization.
Stars: ✭ 28 (-36.36%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+186.36%)
naas: ⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+397.73%)
uptasticsearch: An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+6.82%)
AirflowETL: Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-54.55%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+1338.64%)
Aws Data Wrangler: Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+5320.45%)
Benthos: Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+8320.45%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+227.27%)
fastverse: An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
Stars: ✭ 123 (+179.55%)
blockchain-etl-streaming: Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (+29.55%)
Sayn: Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+79.55%)
beneath: Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (+47.73%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+5754.55%)
zingg: Scalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+1388.64%)
etl manager: A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-68.18%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+79.55%)
Dataform: Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+677.27%)
etl: [READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+534.09%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+11079.55%)
morph-kgc: Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+75%)
polygon-etl: ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (+20.45%)
sql-to-redis: 🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-59.09%)
cubetl: CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-52.27%)
PDAP-Scrapers: Code relating to scraping public police data.
Stars: ✭ 72 (+63.64%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-45.45%)
DQCS: Data quality control system
Stars: ✭ 34 (-22.73%)
neon-workshop: A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (-56.82%)
50-days-of-Statistics-for-Data-Science: This repository consists of a 50-day program. All the statistics required for a complete understanding of data science will be uploaded to this repository.
Stars: ✭ 19 (-56.82%)
autoencoders tensorflow: Automatic feature engineering using deep learning and Bayesian inference using TensorFlow.
Stars: ✭ 66 (+50%)
hrv-analysis: Package for Heart Rate Variability analysis in Python
Stars: ✭ 225 (+411.36%)
wrangle: A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-65.91%)
pyjanitor: Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+2104.55%)
tutorials: Short programming tutorials pertaining to data analysis.
Stars: ✭ 14 (-68.18%)
covid-19: Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-68.18%)
viewflow: Viewflow is an Airflow-based framework that allows data scientists to create data models without writing Airflow code.
Stars: ✭ 110 (+150%)
mik: The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
Stars: ✭ 32 (-27.27%)
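OpenKettleWebUI: A kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls kettle data transformations through the web platform, and can be integrated into existing systems as middleware.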
Stars: ✭ 138 (+213.64%)
dominance-analysis: This package can be used for dominance analysis or Shapley Value Regression to find the relative importance of predictors on a given dataset. It can also be used for key driver analysis or marginal resource allocation models.
Stars: ✭ 111 (+152.27%)
etlflow: EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (-13.64%)
csvplus: csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+52.27%)
python mozetl: ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-43.18%)
dflib: In-memory Java DataFrame library
Stars: ✭ 50 (+13.64%)
DataBridge.NET: Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-63.64%)
CVparser: CVparser is software for parsing or extracting data from CVs/resumes.
Stars: ✭ 28 (-36.36%)
dbt-sugar: dbt-sugar is a CLI tool that lets dbt users have fun and more easily perform actions around dbt models
Stars: ✭ 139 (+215.91%)