
744 open source projects that are alternatives to or similar to sparklanes

basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (+47.06%)
Mutual labels:  pipeline, etl, pyspark
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-5.88%)
Mutual labels:  pipeline, etl, pyspark
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+1500%)
Mutual labels:  pipeline, etl
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (+1135.29%)
Mutual labels:  pipeline, etl
machine-learning-data-pipeline
Pipeline module for parallel, real-time data processing, for use in both developing and running machine learning models in production.
Stars: ✭ 22 (+29.41%)
mydataharbor
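🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization of massive data volumes. It is aimed primarily at real-time transaction systems, but can also be used for big-data synchronization (the ETL domain).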
Stars: ✭ 28 (+64.71%)
Mutual labels:  pipeline, etl
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+3517.65%)
Mutual labels:  pipeline, etl
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+194.12%)
Mutual labels:  pyspark, data-processing
prosto
Prosto is a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions: an alternative to map-reduce and join-groupby
Stars: ✭ 54 (+217.65%)
dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data
Stars: ✭ 71 (+317.65%)
Mutual labels:  pipeline, preprocessing
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+276.47%)
Mutual labels:  pipeline, etl
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (+423.53%)
Mutual labels:  pipeline, data-processing
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+28835.29%)
Mutual labels:  pipeline, etl
Metl
mito ETL tool
Stars: ✭ 153 (+800%)
Mutual labels:  pipeline, etl
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+47.06%)
Mutual labels:  etl, pyspark
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+3623.53%)
Mutual labels:  etl, pyspark
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (+94.12%)
Mutual labels:  pipeline, preprocessing
etl
[READ-ONLY] PHP ETL (Extract, Transform, Load) data processing library
Stars: ✭ 279 (+1541.18%)
Mutual labels:  etl, data-processing
naas
⚙️ Schedule notebooks, run them like APIs, and securely expose your assets: Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+1188.24%)
Mutual labels:  pipeline, etl
Butterfree
A tool for building feature stores.
Stars: ✭ 126 (+641.18%)
Mutual labels:  etl, pyspark
etl
M-Lab ingestion pipeline
Stars: ✭ 15 (-11.76%)
Mutual labels:  pipeline, etl
Phila Airflow
Stars: ✭ 16 (-5.88%)
Mutual labels:  pipeline, etl
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+364.71%)
Mutual labels:  pipeline, etl
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+1388.24%)
Mutual labels:  pipeline, pyspark
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+10729.41%)
Mutual labels:  pipeline, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+129.41%)
Mutual labels:  etl, pyspark
SeqTools
A python library to manipulate and transform indexable data (lists, arrays, ...)
Stars: ✭ 42 (+147.06%)
Mutual labels:  pipeline, preprocessing
oic-options-chains
ETL for OIC Options Chains
Stars: ✭ 22 (+29.41%)
Mutual labels:  etl
GoEmotions-pytorch
PyTorch implementation of GoEmotions 😍😢😱
Stars: ✭ 95 (+458.82%)
Mutual labels:  pipeline
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+194.12%)
Mutual labels:  etl
traceml
Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.
Stars: ✭ 445 (+2517.65%)
Mutual labels:  data-processing
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides useful information to both job hunters and recruiters.
Stars: ✭ 25 (+47.06%)
Mutual labels:  pyspark
NGI-RNAseq
Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.
Stars: ✭ 50 (+194.12%)
Mutual labels:  pipeline
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+747.06%)
Mutual labels:  etl
get phylomarkers
A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches
Stars: ✭ 34 (+100%)
Mutual labels:  pipeline
emg-viral-pipeline
VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies
Stars: ✭ 38 (+123.53%)
Mutual labels:  pipeline
gunpowder
A library to facilitate machine learning on multi-dimensional images.
Stars: ✭ 40 (+135.29%)
Mutual labels:  pipeline
stargate
An Apache Pulsar client written in Elixir
Stars: ✭ 33 (+94.12%)
Mutual labels:  data-processing
bacannot
Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation, with interactive reports and a Shiny app.
Stars: ✭ 51 (+200%)
Mutual labels:  pipeline
cq
Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more
Stars: ✭ 111 (+552.94%)
Mutual labels:  data-processing
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (+23.53%)
Mutual labels:  etl
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+19635.29%)
Mutual labels:  pyspark
kubecrypt
Helper for dealing with secrets in kubernetes.
Stars: ✭ 23 (+35.29%)
Mutual labels:  pipeline
biojupies
Automated generation of tailored bioinformatics Jupyter Notebooks via a user interface.
Stars: ✭ 96 (+464.71%)
Mutual labels:  pipeline
golang-docker-example
An example of how to run a Golang project in Docker in a Buildkite pipeline
Stars: ✭ 18 (+5.88%)
Mutual labels:  pipeline
Speech-Recognition
End-to-end automatic speech recognition for Mandarin and English in TensorFlow
Stars: ✭ 21 (+23.53%)
Mutual labels:  data-processing
ruby-for-pentaho-kettle
Ruby scripting for pentaho-kettle
Stars: ✭ 42 (+147.06%)
Mutual labels:  etl
EF-Migrations-Script-Generator-Task
No description or website provided.
Stars: ✭ 20 (+17.65%)
Mutual labels:  pipeline
sagemaker-sparkml-serving-container
This code is used to build and run a Docker container for performing predictions against a Spark ML Pipeline.
Stars: ✭ 44 (+158.82%)
Mutual labels:  pipeline
bump-everywhere
🚀 Automate versioning, changelog creation, README updates, and GitHub releases using GitHub Actions, npm, Docker, or Bash.
Stars: ✭ 24 (+41.18%)
Mutual labels:  pipeline
howtheydevops
A curated collection of publicly available resources on how companies around the world practice DevOps
Stars: ✭ 318 (+1770.59%)
Mutual labels:  pipeline
classification
Catalyst.Classification
Stars: ✭ 35 (+105.88%)
Mutual labels:  pipeline
bonobo-sqlalchemy
PREVIEW - SQL databases in Bonobo, using sqlalchemy
Stars: ✭ 23 (+35.29%)
Mutual labels:  data-processing
MLLabelUtils.jl
Utility package for working with classification targets and label-encodings
Stars: ✭ 30 (+76.47%)
Mutual labels:  preprocessing
rnafusion
RNA-seq analysis pipeline for detecting gene fusions
Stars: ✭ 72 (+323.53%)
Mutual labels:  pipeline
persistity
A persistence framework for game developers
Stars: ✭ 34 (+100%)
Mutual labels:  etl
flow-platform-x
Continuous Integration Platform
Stars: ✭ 21 (+23.53%)
Mutual labels:  pipeline
predict-fraud-using-auto-ai
Use AutoAI to detect fraud
Stars: ✭ 27 (+58.82%)
Mutual labels:  pipeline
pipe-trait
Make it possible to chain regular functions
Stars: ✭ 22 (+29.41%)
Mutual labels:  pipeline
PDAP-Scrapers
Code relating to scraping public police data.
Stars: ✭ 72 (+323.53%)
Mutual labels:  etl