basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (+47.06%)
Mutual labels: pipeline, etl, pyspark
lineage
Generate beautiful documentation for your data pipelines in Markdown format.
Stars: ✭ 16 (-5.88%)
Mutual labels: pipeline, etl, pyspark
Mara Pipelines
A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow.
Stars: ✭ 1,841 (+10729.41%)
Mutual labels: pipeline, etl
Metl
mito ETL tool
Stars: ✭ 153 (+800%)
Mutual labels: pipeline, etl
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services, with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization.
Stars: ✭ 253 (+1388.24%)
Mutual labels: pipeline, pyspark
Stetl
Stetl (Streaming ETL) is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+276.47%)
Mutual labels: pipeline, etl
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+364.71%)
Mutual labels: pipeline, etl
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (+1135.29%)
Mutual labels: pipeline, etl
machine-learning-data-pipeline
Pipeline module for parallel, real-time data processing, for use in both machine learning model development and production.
Stars: ✭ 22 (+29.41%)
Mutual labels: data-preprocessing, data-processing
naas
⚙️ Schedule notebooks, run them like APIs, and securely expose your assets: Jupyter as a viable ⚡️ production environment.
Stars: ✭ 219 (+1188.24%)
Mutual labels: pipeline, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, along with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+129.41%)
Mutual labels: etl, pyspark
Go Streams
A lightweight stream processing library for Go.
Stars: ✭ 615 (+3517.65%)
Mutual labels: pipeline, etl
Forte
Forte is a flexible and powerful NLP builder FOR TExt. It is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (+423.53%)
Mutual labels: pipeline, data-processing
Datavec
ETL library for machine learning: data pipelines, data munging, and wrangling.
Stars: ✭ 272 (+1500%)
Mutual labels: pipeline, etl
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data into your warehouses, lakes, and databases.
Stars: ✭ 4,919 (+28835.29%)
Mutual labels: pipeline, etl
SeqTools
A Python library to manipulate and transform indexable data (lists, arrays, ...).
Stars: ✭ 42 (+147.06%)
Mutual labels: pipeline, preprocessing
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (+94.12%)
Mutual labels: pipeline, preprocessing
dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data.
Stars: ✭ 71 (+317.65%)
Mutual labels: pipeline, preprocessing
etl
[READ-ONLY] PHP ETL (Extract, Transform, Load) data processing library.
Stars: ✭ 279 (+1541.18%)
Mutual labels: etl, data-processing