beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-54.86%)
contessa
Easy way to define, execute and store quality rules for your data.
Stars: ✭ 17 (-88.19%)
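To make "define and execute quality rules" concrete, here is a minimal, generic sketch of declarative data-quality rules run against in-memory rows. The `Rule` class and `run_rules` function are hypothetical illustrations, not contessa's actual API. The results are simply returned as a dict rather than persisted.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical rule representation (not contessa's API): a named check on one column.
@dataclass
class Rule:
    name: str
    column: str
    check: Callable[[Any], bool]


def run_rules(rows, rules):
    """Return {rule_name: number_of_rows_that_fail_the_rule}."""
    results = {}
    for rule in rules:
        failures = sum(1 for row in rows if not rule.check(row.get(rule.column)))
        results[rule.name] = failures
    return results


rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": None, "age": 29},
    {"id": 3, "email": "c@example.com", "age": -5},
]
rules = [
    Rule("email_not_null", "email", lambda v: v is not None),
    Rule("age_non_negative", "age", lambda v: v is not None and v >= 0),
]
results = run_rules(rows, rules)
print(results)  # {'email_not_null': 1, 'age_non_negative': 1}
```

Keeping rules as data rather than ad-hoc code is what lets a tool store them, rerun them on a schedule and track failure counts over time.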
dbd
dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (-79.17%)
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (-45.14%)
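The Extract, Load, Transform pattern that several of these tools target can be sketched with nothing but Python's built-in `sqlite3` module. This is a generic illustration, not Astro's or dbd's API: an in-memory SQLite database stands in for the warehouse, and hard-coded tuples stand in for the source system.

```python
import sqlite3

# In-memory database stands in for a warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Extract: hard-coded records stand in for rows pulled from a source system.
raw_orders = [(1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.5)]

# Load: copy the raw records into a staging table without changing them.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: build a derived table inside the database using SQL.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total
    FROM raw_orders
    GROUP BY customer
    """
)
totals = conn.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)  # [('alice', 37.5), ('bob', 12.5)]
```

The design point of ELT (as opposed to ETL) is visible in the order of the steps: raw data lands first, and the transformation runs afterwards in SQL where the data already lives.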
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+3315.97%)
google-sheets-etl
Live import all your Google Sheets to your data warehouse
Stars: ✭ 15 (-89.58%)
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+6725.69%)
Applied ML
📚 Papers & tech blogs by companies sharing their work on data science & machine learning in production.
Stars: ✭ 17,824 (+12277.78%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+2746.53%)
ml-in-production
Practical use cases showing how to make your machine learning pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (-79.86%)
growthbook
Open Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (+1526.39%)
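Feature flags and A/B tests usually need each user to land in the same variant every time. A common way to do that is deterministic hash bucketing, sketched below. This is a generic technique, not GrowthBook's actual hashing scheme; the `bucket` function, the SHA-256 choice and the `experiment:user_id` key format are all assumptions for illustration.

```python
import hashlib


def bucket(user_id: str, experiment: str, weights: list[float]) -> int:
    """Deterministically map a user to a variant index according to `weights`.

    Hypothetical helper: hashes "experiment:user_id" so the same user always
    gets the same variant within one experiment, independently across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # First 8 hex digits = 32 bits, scaled into the half-open range [0, 1).
    point = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if point < cumulative:
            return index
    return len(weights) - 1  # guard against weights summing to slightly under 1.0


# A 50/50 A/B test, and a 10% feature-flag rollout (variant 0 = flag on).
variant = bucket("user-42", "new-checkout", [0.5, 0.5])
flag_on = bucket("user-42", "dark-mode", [0.1, 0.9]) == 0
```

Because the assignment is a pure function of the inputs, no per-user state has to be stored, and raising a rollout weight only moves users from "off" to "on", never the reverse.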
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (-69.44%)
Addax
Addax is an open-source universal ETL tool that supports most RDBMSs and NoSQL databases, helping you transfer data from any one place to another.
Stars: ✭ 615 (+327.08%)
etl manager
A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-90.28%)
neon-workshop
A Pachyderm deep learning tutorial for conference workshops
Stars: ✭ 19 (-86.81%)
Locopy
locopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-49.31%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (-45.14%)
AWS Data Wrangler
Pandas on AWS. Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
Stars: ✭ 2,385 (+1556.25%)
AirflowETL
Blog post on ETL pipelines with Airflow
Stars: ✭ 20 (-86.11%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (-12.5%)
NBi
NBi is a testing framework (add-on to NUnit) for Business Intelligence and Data Access. The main goal of this framework is to let users create tests with a declarative approach based on an XML syntax. With NBi, you don't need to develop C# or Java code to specify your tests! Nor do you need Visual Studio or Eclipse to compile y…
Stars: ✭ 102 (-29.17%)
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+93.75%)
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+137.5%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+190.97%)
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-59.72%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-45.14%)
Yuniql
Free and open source schema versioning and database migration made natively with .NET Core.
Stars: ✭ 156 (+8.33%)
Sqlpad
Web-based SQL editor that runs in your own private cloud. Supports MySQL, Postgres, SQL Server, Vertica, Crate, ClickHouse, Trino, Presto, SAP HANA, Cassandra, Snowflake, BigQuery, SQLite, and more via ODBC
Stars: ✭ 4,113 (+2756.25%)
deordie-meetups
DE or DIE meetup made by data engineers for data engineers. Currently in Russian only.
Stars: ✭ 48 (-66.67%)
morph-kgc
Powerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (-46.53%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-63.19%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+339.58%)
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes
Stars: ✭ 57 (-60.42%)
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+325%)
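Hamilton is known for a "functions as nodes" style: each output is a plain Python function, and its parameter names name the outputs it depends on. The sketch below imitates that idea with a tiny hypothetical resolver built on `inspect.signature`; `execute` and the example functions are illustrative only and are not Hamilton's actual `Driver` API.

```python
import inspect


# Each function is a node; its parameter names are the nodes it depends on.
def spend(raw_spend: list[float]) -> list[float]:
    return [max(x, 0.0) for x in raw_spend]  # clip negative values to zero


def total_spend(spend: list[float]) -> float:
    return sum(spend)


def avg_spend(total_spend: float, spend: list[float]) -> float:
    return total_spend / len(spend)


def execute(functions, inputs, target):
    """Hypothetical resolver: compute `target` by recursively resolving each
    parameter from `inputs` or from the function with the same name."""
    registry = {f.__name__: f for f in functions}
    cache = dict(inputs)  # memoizes results so shared dependencies run once

    def resolve(name):
        if name not in cache:
            fn = registry[name]
            args = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**args)
        return cache[name]

    return resolve(target)


result = execute(
    [spend, total_spend, avg_spend], {"raw_spend": [10.0, -2.0, 20.0]}, "avg_spend"
)
print(result)  # 10.0
```

Because the dependency graph is read from function signatures, each node stays a small, unit-testable function, and only the nodes the requested target needs are ever computed.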
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+2472.92%)
wikirepo
Python-based Wikidata framework for easy dataframe extraction
Stars: ✭ 33 (-77.08%)
starlake
Starlake is a Spark-based, on-premise and cloud ELT/ETL framework for batch and stream processing
Stars: ✭ 16 (-88.89%)
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-67.36%)
awesome-bigdata
A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 11,093 (+7603.47%)
metamapper
Metamapper is a data discovery and documentation platform for improving how teams understand and interact with their data.
Stars: ✭ 60 (-58.33%)
choria
Finally, an MMORPG that's all about grinding and doing chores.
Stars: ✭ 19 (-86.81%)
nim-gatabase
Connection-Pooling Compile-Time ORM for Nim
Stars: ✭ 103 (-28.47%)
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (-24.31%)
exqlite
An SQLite3 driver for Elixir
Stars: ✭ 128 (-11.11%)
sqlite-spellfix
Loadable spellfix1 extension for SQLite as a Python package
Stars: ✭ 13 (-90.97%)
singer-runner
A CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (-77.08%)
sqlite-gui
Lightweight SQLite editor for Windows
Stars: ✭ 151 (+4.86%)
docker-sqlite3
SQLite3 command line in a Docker container
Stars: ✭ 28 (-80.56%)
PDAP-Scrapers
Code relating to scraping public police data.
Stars: ✭ 72 (-50%)