736 open source projects that are alternatives to or similar to lineage

basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (+56.25%)
Mutual labels:  pipeline, etl, pyspark
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (+6.25%)
Mutual labels:  pipeline, etl, pyspark
Metl
mito ETL tool
Stars: ✭ 153 (+856.25%)
Mutual labels:  pipeline, etl
mydataharbor
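🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any source to any destination. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is aimed mainly at serving real-time transaction systems, and can also be used for big-data synchronization (the ETL domain).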
Stars: ✭ 28 (+75%)
Mutual labels:  pipeline, etl
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+3743.75%)
Mutual labels:  pipeline, etl
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (+1212.5%)
Mutual labels:  pipeline, etl
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+56.25%)
Mutual labels:  etl, pyspark
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+393.75%)
Mutual labels:  pipeline, etl
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+1481.25%)
Mutual labels:  pipeline, pyspark
Example Airflow Dags
Example DAGs using hooks and operators from Airflow Plugins
Stars: ✭ 243 (+1418.75%)
Mutual labels:  etl, dag
Serving
A flexible, high-performance carrier for machine learning models (PaddlePaddle serving deployment framework)
Stars: ✭ 403 (+2418.75%)
Mutual labels:  pipeline, dag
Phila Airflow
Stars: ✭ 16 (+0%)
Mutual labels:  pipeline, etl
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+30643.75%)
Mutual labels:  pipeline, etl
etl
M-Lab ingestion pipeline
Stars: ✭ 15 (-6.25%)
Mutual labels:  pipeline, etl
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+1600%)
Mutual labels:  pipeline, etl
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+687.5%)
Mutual labels:  etl, pyspark
naas
⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ Production environment
Stars: ✭ 219 (+1268.75%)
Mutual labels:  pipeline, etl
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+11406.25%)
Mutual labels:  pipeline, etl
Aws Ecs Airflow
Run Airflow in AWS ECS(Elastic Container Service) using Fargate tasks
Stars: ✭ 107 (+568.75%)
Mutual labels:  etl, dag
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+3856.25%)
Mutual labels:  etl, pyspark
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+300%)
Mutual labels:  pipeline, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+143.75%)
Mutual labels:  etl, pyspark
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+3725%)
Mutual labels:  etl, dag
cobra-policytool
Manage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (+0%)
Mutual labels:  atlas
dag
Simple DSL for executing functions in Go
Stars: ✭ 85 (+431.25%)
Mutual labels:  dag
bump-everywhere
🚀 Automate versioning, changelog creation, README updates and GitHub releases using GitHub Actions, npm, docker or bash.
Stars: ✭ 24 (+50%)
Mutual labels:  pipeline
kubecrypt
Helper for dealing with secrets in kubernetes.
Stars: ✭ 23 (+43.75%)
Mutual labels:  pipeline
pipe-trait
Make it possible to chain regular functions
Stars: ✭ 22 (+37.5%)
Mutual labels:  pipeline
ruby-for-pentaho-kettle
Ruby scripting for pentaho-kettle
Stars: ✭ 42 (+162.5%)
Mutual labels:  etl
textureatlas
A simple, cross-platform Python-based tool and C library for creating and using a texture atlas in your application or game. Distributed under the terms of the MIT license.
Stars: ✭ 20 (+25%)
Mutual labels:  atlas
persistity
A persistence framework for game developers
Stars: ✭ 34 (+112.5%)
Mutual labels:  etl
flamingo
FreeCAD - flamingo workbench
Stars: ✭ 30 (+87.5%)
Mutual labels:  pipeline
dnaPipeTE
dnaPipeTE (for de-novo assembly & annotation Pipeline for Transposable Elements), is a pipeline designed to find, annotate and quantify Transposable Elements in small samples of NGS datasets. It is very useful to quantify the proportion of TEs in newly sequenced genomes since it does not require genome assembly and works on small datasets (< 1X).
Stars: ✭ 28 (+75%)
Mutual labels:  pipeline
Spark-for-data-engineers
Apache Spark for data engineers
Stars: ✭ 22 (+37.5%)
Mutual labels:  pyspark
katana-skipper
Simple and flexible ML workflow engine
Stars: ✭ 234 (+1362.5%)
Mutual labels:  pipeline
kafka-connect-datagen
A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+68.75%)
Mutual labels:  etl
dswarm
an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+256.25%)
Mutual labels:  etl
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (+81.25%)
Mutual labels:  pyspark
bacannot
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.
Stars: ✭ 51 (+218.75%)
Mutual labels:  pipeline
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+175%)
Mutual labels:  etl
nwabap-ui5uploader
This module allows a developer to upload SAPUI5/OpenUI5 sources into a SAP NetWeaver ABAP system.
Stars: ✭ 15 (-6.25%)
Mutual labels:  pipeline
KoELECTRA-Pipeline
Transformers Pipeline with KoELECTRA
Stars: ✭ 37 (+131.25%)
Mutual labels:  pipeline
hlatyping
Precision HLA typing from next-generation sequencing data
Stars: ✭ 28 (+75%)
Mutual labels:  pipeline
go-pdu
Parallel Digital Universe - A decentralized social networking service
Stars: ✭ 39 (+143.75%)
Mutual labels:  dag
google classroom
Google Classroom Data Pipeline
Stars: ✭ 17 (+6.25%)
Mutual labels:  pipeline
Atlas auto setline
A tool that automatically takes unusable slave nodes offline, or brings them back online, in the open-source Atlas software
Stars: ✭ 47 (+193.75%)
Mutual labels:  atlas
web-click-flow
Offline log analysis of website clickstreams
Stars: ✭ 14 (-12.5%)
Mutual labels:  etl
oic-options-chains
ETL for OIC Options Chains
Stars: ✭ 22 (+37.5%)
Mutual labels:  etl
DataEngineering
This repo contains commands that data engineers use in day to day work.
Stars: ✭ 47 (+193.75%)
Mutual labels:  pyspark
MTBseq source
MTBseq is an automated pipeline for mapping, variant calling and detection of resistance-mediating and phylogenetic variants from Illumina whole genome sequence data of Mycobacterium tuberculosis complex isolates.
Stars: ✭ 26 (+62.5%)
Mutual labels:  pipeline
taxid-changelog
NCBI taxonomic identifier (taxid) changelog, including taxid deletions, additions, merges, reuse, and rank/name changes.
Stars: ✭ 13 (-18.75%)
Mutual labels:  lineage
rivery cli
Rivery CLI
Stars: ✭ 16 (+0%)
Mutual labels:  etl
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+212.5%)
Mutual labels:  etl
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+800%)
Mutual labels:  etl
jobAnalytics and search
JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (+56.25%)
Mutual labels:  pyspark
get phylomarkers
A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches
Stars: ✭ 34 (+112.5%)
Mutual labels:  pipeline
gunpowder
A library to facilitate machine learning on multi-dimensional images.
Stars: ✭ 40 (+150%)
Mutual labels:  pipeline
swarmci
Swarm CI - Docker Swarm-based CI system or enhancement to existing systems.
Stars: ✭ 48 (+200%)
Mutual labels:  pipeline
rnafusion
RNA-seq analysis pipeline for detection of gene fusions
Stars: ✭ 72 (+350%)
Mutual labels:  pipeline
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+20868.75%)
Mutual labels:  pyspark