744 open source projects that are alternatives to or similar to sparklanes

Basin

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.

Stars: ✭ 25 (+47.06%)

Mutual labels: pipeline, etl, pyspark

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-5.88%)

Mutual labels: pipeline, etl, pyspark

Datavec

ETL Library for Machine Learning - data pipelines, data munging and wrangling

Stars: ✭ 272 (+1500%)

Mutual labels: pipeline, etl

Bulk Writer

Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.

Stars: ✭ 210 (+1135.29%)

Mutual labels: pipeline, etl

machine-learning-data-pipeline

Pipeline module for parallel real-time data processing for machine learning models development and production purposes.

Stars: ✭ 22 (+29.41%)

Mutual labels: data-preprocessing, data-processing

mydataharbor

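🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is primarily intended to serve real-time transaction systems, and can also be used for big data synchronization (the ETL domain).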

Stars: ✭ 28 (+64.71%)

Mutual labels: pipeline, etl

Go Streams

A lightweight stream processing library for Go

Stars: ✭ 615 (+3517.65%)

Mutual labels: pipeline, etl

data processing course

Some class materials for a data processing course using PySpark

Stars: ✭ 50 (+194.12%)

Mutual labels: pyspark, data-processing

prosto

Prosto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby

Stars: ✭ 54 (+217.65%)

Mutual labels: data-preprocessing, data-processing

dropEst

Pipeline for initial analysis of droplet-based single-cell RNA-seq data

Stars: ✭ 71 (+317.65%)

Mutual labels: pipeline, preprocessing

Stetl

Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.

Stars: ✭ 64 (+276.47%)

Mutual labels: pipeline, etl

Forte

Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/

Stars: ✭ 89 (+423.53%)

Mutual labels: pipeline, data-processing

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+28835.29%)

Mutual labels: pipeline, etl

Metl

mito ETL tool

Stars: ✭ 153 (+800%)

Mutual labels: pipeline, etl

python mozetl

ETL jobs for Firefox Telemetry

Stars: ✭ 25 (+47.06%)

Mutual labels: etl, pyspark

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+3623.53%)

Mutual labels: etl, pyspark

skippa

SciKIt-learn Pipeline in PAndas

Stars: ✭ 33 (+94.12%)

Mutual labels: pipeline, preprocessing

etl

[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library

Stars: ✭ 279 (+1541.18%)

Mutual labels: etl, data-processing

naas

⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ production environment

Stars: ✭ 219 (+1188.24%)

Mutual labels: pipeline, etl

Butterfree

A tool for building feature stores.

Stars: ✭ 126 (+641.18%)

Mutual labels: etl, pyspark

etl

M-Lab ingestion pipeline

Stars: ✭ 15 (-11.76%)

Mutual labels: pipeline, etl

Phila Airflow

Stars: ✭ 16 (-5.88%)

Mutual labels: pipeline, etl

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (+364.71%)

Mutual labels: pipeline, etl

Morphl Community Edition

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

Stars: ✭ 253 (+1388.24%)

Mutual labels: pipeline, pyspark

Mara Pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Stars: ✭ 1,841 (+10729.41%)

Mutual labels: pipeline, etl

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (+129.41%)

Mutual labels: etl, pyspark

SeqTools

A python library to manipulate and transform indexable data (lists, arrays, ...)

Stars: ✭ 42 (+147.06%)

Mutual labels: pipeline, preprocessing

oic-options-chains

ETL for OIC Options Chains

Stars: ✭ 22 (+29.41%)

Mutual labels: etl

GoEmotions-pytorch

Pytorch Implementation of GoEmotions 😍😢😱

Stars: ✭ 95 (+458.82%)

Mutual labels: pipeline

dflib

In-memory Java DataFrame library

Stars: ✭ 50 (+194.12%)

Mutual labels: etl

traceml

Engine for ML/Data tracking, visualization, dashboards, and model UI for Polyaxon.

Stars: ✭ 445 (+2517.65%)

Mutual labels: data-processing

jobAnalytics and search

JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (+47.06%)

Mutual labels: pyspark

NGI-RNAseq

Nextflow RNA-Seq Best Practice analysis pipeline, used at the SciLifeLab National Genomics Infrastructure.

Stars: ✭ 50 (+194.12%)

Mutual labels: pipeline

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+747.06%)

Mutual labels: etl

get phylomarkers

A pipeline to select optimal markers for microbial phylogenomics and species tree estimation using coalescent and concatenation approaches

Stars: ✭ 34 (+100%)

Mutual labels: pipeline

emg-viral-pipeline

VIRify: detection of phages and eukaryotic viruses from metagenomic and metatranscriptomic assemblies

Stars: ✭ 38 (+123.53%)

Mutual labels: pipeline

gunpowder

A library to facilitate machine learning on multi-dimensional images.

Stars: ✭ 40 (+135.29%)

Mutual labels: pipeline

stargate

An Apache Pulsar client written in Elixir

Stars: ✭ 33 (+94.12%)

Mutual labels: data-processing

bacannot

Generic but comprehensive pipeline for prokaryotic genome annotation and interrogation with interactive reports and shiny app.

Stars: ✭ 51 (+200%)

Mutual labels: pipeline

Clojure Command-line Data Processor for JSON, YAML, EDN, XML and more

Stars: ✭ 111 (+552.94%)

Mutual labels: data-processing

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (+23.53%)

Mutual labels: etl

SynapseML

Simple and Distributed Machine Learning

Stars: ✭ 3,355 (+19635.29%)

Mutual labels: pyspark

kubecrypt

Helper for dealing with secrets in kubernetes.

Stars: ✭ 23 (+35.29%)

Mutual labels: pipeline

biojupies

Automated generation of tailored bioinformatics Jupyter Notebooks via a user interface.

Stars: ✭ 96 (+464.71%)

Mutual labels: pipeline

golang-docker-example

An example of how to run a Golang project in Docker in a Buildkite pipeline

Stars: ✭ 18 (+5.88%)

Mutual labels: pipeline

Speech-Recognition

End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow

Stars: ✭ 21 (+23.53%)

Mutual labels: data-processing

ruby-for-pentaho-kettle

Ruby scripting for pentaho-kettle

Stars: ✭ 42 (+147.06%)

Mutual labels: etl

EF-Migrations-Script-Generator-Task

No description or website provided.

Stars: ✭ 20 (+17.65%)

Mutual labels: pipeline

sagemaker-sparkml-serving-container

This code is used to build & run a Docker container for performing predictions against a Spark ML Pipeline.

Stars: ✭ 44 (+158.82%)

Mutual labels: pipeline

bump-everywhere

🚀 Automate versioning, changelog creation, README updates and GitHub releases using GitHub Actions, npm, Docker or Bash.

Stars: ✭ 24 (+41.18%)

Mutual labels: pipeline

howtheydevops

A curated collection of publicly available resources on how companies around the world practice DevOps

Stars: ✭ 318 (+1770.59%)

Mutual labels: pipeline

classification

Catalyst.Classification

Stars: ✭ 35 (+105.88%)

Mutual labels: pipeline

bonobo-sqlalchemy

PREVIEW - SQL databases in Bonobo, using sqlalchemy

Stars: ✭ 23 (+35.29%)

Mutual labels: data-processing

MLLabelUtils.jl

Utility package for working with classification targets and label-encodings

Stars: ✭ 30 (+76.47%)

Mutual labels: preprocessing

rnafusion

RNA-seq analysis pipeline for detection of gene fusions

Stars: ✭ 72 (+323.53%)

Mutual labels: pipeline

persistity

A persistence framework for game developers

Stars: ✭ 34 (+100%)

Mutual labels: etl

flow-platform-x

Continuous Integration Platform

Stars: ✭ 21 (+23.53%)

Mutual labels: pipeline

predict-fraud-using-auto-ai

Use AutoAI to detect fraud

Stars: ✭ 27 (+58.82%)

Mutual labels: pipeline

pipe-trait

Make it possible to chain regular functions

Stars: ✭ 22 (+29.41%)

Mutual labels: pipeline

PDAP-Scrapers

Code relating to scraping public police data.

Stars: ✭ 72 (+323.53%)

Mutual labels: etl

1-60 of 744 similar projects
