Dagster: An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+5088.61%)
beneath: Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-17.72%)
Trino: Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+5698.73%)
Ether sql: A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-48.1%)
Prefect: The easiest way to automate your data
Stars: ✭ 7,956 (+9970.89%)
Aws Data Wrangler: Pandas on AWS, with easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer, and S3 (Parquet, CSV, JSON, and Excel).
Stars: ✭ 2,385 (+2918.99%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+701.27%)
Dataform: Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift.
Stars: ✭ 342 (+332.91%)
Butterfree: A tool for building feature stores.
Stars: ✭ 126 (+59.49%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+0%)
Superset: Apache Superset is a data visualization and data exploration platform.
Stars: ✭ 42,634 (+53867.09%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes, and databases.
Stars: ✭ 4,919 (+6126.58%)
growthbook: Open-source feature flagging and A/B testing platform
Stars: ✭ 2,342 (+2864.56%)
etl manager: A Python package to create a database on the platform using our MoJ data warehousing framework
Stars: ✭ 14 (-82.28%)
Benthos: Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+4589.87%)
Covid19 Dashboard: A site that displays up-to-date COVID-19 stats, powered by fastpages.
Stars: ✭ 1,212 (+1434.18%)
Minsql: High-performance log search engine.
Stars: ✭ 356 (+350.63%)
Learn Something Every Day: 📝 A compilation of everything that I learn: computer science, software development, engineering, math, and coding in general.
Stars: ✭ 362 (+358.23%)
Metorikku: A simplified, lightweight ETL framework based on Apache Spark
Stars: ✭ 361 (+356.96%)
Mlinterview: A curated awesome list of AI startups in India and a machine learning interview guide. Feel free to contribute!
Stars: ✭ 410 (+418.99%)
gallia-core: A schema-aware Scala library for data transformation
Stars: ✭ 44 (-44.3%)
Roapi: Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+220.25%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+82.28%)
Crate: CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real time.
Stars: ✭ 3,254 (+4018.99%)
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+359.49%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+422.78%)
Ananas Desktop: A hackable data integration and analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+597.47%)
Preql: An interpreted relational query language that compiles to SQL.
Stars: ✭ 257 (+225.32%)
Vudash: Powerful, flexible, open-source dashboards for anything
Stars: ✭ 363 (+359.49%)
Datacleaner: The premier open source Data Quality solution
Stars: ✭ 391 (+394.94%)
Stats Maths With Python: General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (+382.28%)
hamilton: A scalable, general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+674.68%)
Threatpursuit Vm: Threat Pursuit Virtual Machine (VM) is a fully customizable, open-source, Windows-based distribution focused on threat intelligence analysis and hunting, designed to help intel and malware analysts, as well as threat hunters, get up and running quickly.
Stars: ✭ 814 (+930.38%)
Awesome Streamlit: The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Stars: ✭ 769 (+873.42%)
Model Describer: model-describer makes machine learning interpretable to humans
Stars: ✭ 22 (-72.15%)
Walkoff: A flexible, easy-to-use automation framework that allows users to integrate their capabilities and devices to cut through the repetitive, tedious tasks slowing them down. #nsacyber
Stars: ✭ 855 (+982.28%)
Datacleaner: A Python tool that automatically cleans data sets and readies them for analysis.
Stars: ✭ 933 (+1081.01%)
Data Science On Gcp: Source code accompanying the book Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+993.67%)
Data Science Career: A repository of career resources for data science, machine learning, big data, and business analytics
Stars: ✭ 630 (+697.47%)
Datofutbol: Dato Fútbol repository
Stars: ✭ 23 (-70.89%)
Aws Auto Terminate Idle Emr: AWS Auto Terminate Idle AWS EMR Clusters Framework is an AWS-based solution that uses AWS CloudWatch, AWS Lambda, and a Python script built on Boto3 to terminate AWS EMR clusters that have been idle for a specified period of time.
Stars: ✭ 21 (-73.42%)
Bloom: The simplest way to de-Google your life and business: inbox, calendar, files, contacts, and much more
Stars: ✭ 934 (+1082.28%)
Ai Platform: An open-source platform for automating tasks using machine learning models
Stars: ✭ 61 (-22.78%)
Locopy: Loading data to and unloading data from Redshift and Snowflake using Python.
Stars: ✭ 73 (-7.59%)
Tpot: A Python automated machine learning tool that optimizes machine learning pipelines using genetic programming.
Stars: ✭ 8,378 (+10505.06%)
Eventql: Distributed "massively parallel" SQL query engine
Stars: ✭ 1,121 (+1318.99%)