All Categories → No Category → data-pipeline

Top 23 data-pipeline open source projects

Data Engineering Howto
A list of useful resources to learn Data Engineering from scratch
Snowplow
The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP
serverless-data-pipeline-sam
Serverless Data Pipeline powered by Kinesis Firehose, API Gateway, Lambda, S3, and Athena
richflow
A Node.js and JavaScript synchronous data pipeline processing, data sharing and stream processing library. Actionable & Transformable Pipeline data processing.
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
ATOM
Automated Tool for Optimized Modelling
opentrials-airflow
Configuration and definitions of Airflow for OpenTrials
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
ob bulkstash
Bulk Stash is a docker rclone service to sync, or copy, files between different storage services. For example, you can copy files either to or from a remote storage services like Amazon S3 to Google Cloud Storage, or locally from your laptop to a remote storage.
aws-pdf-textract-pipeline
🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS textract. Built with AWS CDK + TypeScript
1-23 of 23 data-pipeline projects