versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+121.54%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+6206.15%)
Dataform
Dataform is a framework for managing SQL-based data operations in BigQuery, Snowflake, and Redshift.
Stars: ✭ 342 (+426.15%)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Stars: ✭ 79 (+21.54%)
neon-workshop
A Pachyderm deep learning tutorial for conference workshops.
Stars: ✭ 19 (-70.77%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+873.85%)
gallia-core
A schema-aware Scala library for data transformation.
Stars: ✭ 44 (-32.31%)
Gpdb
Greenplum Database: massively parallel PostgreSQL for analytics. An open-source massively parallel data platform for analytics, machine learning and AI.
Stars: ✭ 4,928 (+7481.54%)
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+6058.46%)
noronha
DataOps framework for Machine Learning projects.
Stars: ✭ 47 (-27.69%)
Ananas Desktop
A hackable data integration & analysis tool that enables non-technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+747.69%)
Reddit Detective
Play detective on Reddit: discover political disinformation campaigns, secret influencers and more.
Stars: ✭ 129 (+98.46%)
Aws Data Wrangler
Pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+3569.23%)
Ether sql
A Python library to push Ethereum blockchain data into an SQL database.
Stars: ✭ 41 (-36.92%)
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+93.85%)
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+7467.69%)
Athenax
SQL-based streaming analytics platform at scale.
Stars: ✭ 1,178 (+1712.31%)
Spark
.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+2547.69%)
morph-kgc
Powerful RDF knowledge graph generation with [R2]RML mappings.
Stars: ✭ 77 (+18.46%)
firehose
Firehose is an extensible, no-code, and cloud-native service to load real-time streaming data from Kafka to data stores, data lakes, and analytical storage systems.
Stars: ✭ 213 (+227.69%)
etl
[READ-ONLY] PHP ETL (Extract, Transform, Load) data processing library.
Stars: ✭ 279 (+329.23%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data into Google BigQuery and Pub/Sub.
Stars: ✭ 53 (-18.46%)
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (-27.69%)
blockchain-etl-streaming
Streaming Ethereum and Bitcoin blockchain data to Google Pub/Sub or Postgres in Kubernetes.
Stars: ✭ 57 (-12.31%)
Feast
Feature store for machine learning.
Stars: ✭ 2,576 (+3863.08%)
Superset
Apache Superset is a data visualization and data exploration platform.
Stars: ✭ 42,634 (+65490.77%)
google-sheets-etl
Live import all your Google Sheets to your data warehouse.
Stars: ✭ 15 (-76.92%)
Benthos
Fancy stream processing made operationally mundane.
Stars: ✭ 3,705 (+5600%)
etl manager
A Python package to create a database on the platform using our MoJ data warehousing framework.
Stars: ✭ 14 (-78.46%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+544.62%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+21.54%)
data-science-best-practices
The goal of this repository is to enable data scientists and ML engineers to develop data science use cases and make them ready for production use. This means focusing on the versioning, scalability, monitoring and engineering of the solution.
Stars: ✭ 53 (-18.46%)
cli
Polyaxon Core Client & CLI to streamline MLOps.
Stars: ✭ 18 (-72.31%)
Eventql
Distributed "massively parallel" SQL query engine.
Stars: ✭ 1,121 (+1624.62%)
Uplot
📈 A small, fast chart for time series, lines, areas, OHLC & bars.
Stars: ✭ 6,808 (+10373.85%)
AirflowETL
Blog post on ETL pipelines with Airflow.
Stars: ✭ 20 (-69.23%)
Sparta
Real-time analytics and data pipelines based on Spark Streaming.
Stars: ✭ 513 (+689.23%)
growthbook
Open-source feature flagging and A/B testing platform.
Stars: ✭ 2,342 (+3503.08%)
hamilton
A scalable general-purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+841.54%)
ml-in-production
Practical use cases showing how to make your machine learning pipelines robust and reliable using Apache Airflow.
Stars: ✭ 29 (-55.38%)
m3u8
Parse and generate m3u8 playlists for Apple HTTP Live Streaming (HLS) in Ruby.
Stars: ✭ 96 (+47.69%)
WinAnalytics
A lightweight Android library that can be quickly integrated into any app to use analytics tools.
Stars: ✭ 23 (-64.62%)
mlops-with-vertex-ai
An end-to-end example of MLOps on Google Cloud using TensorFlow, TFX, and Vertex AI.
Stars: ✭ 155 (+138.46%)
mailtrap
MailTrap has been renamed to Sendria. Please use Sendria now; MailTrap is abandoned. MailTrap is an SMTP server designed to run in your dev/test environment: it catches any email you or your application sends and displays it in a web interface instead of delivering it to real recipients.
Stars: ✭ 14 (-78.46%)
transform-hub
A flexible and efficient data processing engine and an evolution of the popular Node.js-based Scramjet Framework. Our Transform Hub was designed specifically for data processing and includes its own unique algorithms.
Stars: ✭ 38 (-41.54%)
PHP-Broadcast-radio
🌈 Autonomous streaming audio server: online internet radio offering free streaming music for your listening pleasure, as well as news and announcements.
Stars: ✭ 38 (-41.54%)