datajob: Build and deploy a serverless data pipeline on AWS with no effort.
Stars: ✭ 101 (+94.23%)
AirflowETL: Blog post on ETL pipelines with Airflow.
Stars: ✭ 20 (-61.54%)
aws-pdf-textract-pipeline: 🔍 Data pipeline for crawling PDFs from the Web and transforming their contents into structured data using AWS Textract. Built with AWS CDK + TypeScript.
Stars: ✭ 141 (+171.15%)
scicloj.ml: A Clojure machine learning library.
Stars: ✭ 152 (+192.31%)
React Native Firebase: 🔥 A well-tested, feature-rich, modular Firebase implementation for React Native. Supports both iOS and Android for all Firebase services.
Stars: ✭ 9,674 (+18503.85%)
Data Engineering Howto: A list of useful resources for learning data engineering from scratch.
Stars: ✭ 2,056 (+3853.85%)
Snowplow: The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP.
Stars: ✭ 5,935 (+11313.46%)
trembita: Model complex data transformation pipelines easily.
Stars: ✭ 44 (-15.38%)
pipeline: OONI (Open Observatory of Network Interference) data processing pipeline.
Stars: ✭ 36 (-30.77%)
network-pipeline: Network traffic data pipeline for real-time predictions and for building datasets for deep neural networks.
Stars: ✭ 36 (-30.77%)
augraphy: Augmentation pipeline for rendering synthetic paper printing, faxing, scanning, and copy-machine processes.
Stars: ✭ 49 (-5.77%)
richflow: A Node.js/JavaScript library for synchronous data pipeline processing, data sharing, and stream processing, built around actionable and transformable pipelines.
Stars: ✭ 17 (-67.31%)
jobAnalytics and search: The JobAnalytics system consumes data from multiple sources and provides useful information to both job hunters and recruiters.
Stars: ✭ 25 (-51.92%)
ATOM: Automated Tool for Optimized Modelling.
Stars: ✭ 85 (+63.46%)
opentrials-airflow: Airflow configuration and definitions for OpenTrials.
Stars: ✭ 18 (-65.38%)
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (-25%)
machine-learning-data-pipeline: Pipeline module for parallel, real-time data processing, for developing machine learning models and running them in production.
Stars: ✭ 22 (-57.69%)
ob bulkstash: Bulk Stash is a Docker rclone service for syncing or copying files between different storage services. For example, you can copy files to or from a remote storage service (such as from Amazon S3 to Google Cloud Storage), or from your laptop to remote storage.
Stars: ✭ 113 (+117.31%)
saisoku: Saisoku is a Python module that helps you build complex pipelines of batch file/directory transfer and sync jobs.
Stars: ✭ 40 (-23.08%)