
ksbg / sparklanes

License: MIT
A lightweight data processing framework for Apache Spark

Programming Languages

python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to sparklanes

basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (+47.06%)
Mutual labels:  pipeline, etl, pyspark
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-5.88%)
Mutual labels:  pipeline, etl, pyspark
Mara Pipelines
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Stars: ✭ 1,841 (+10729.41%)
Mutual labels:  pipeline, etl
Metl
mito ETL tool
Stars: ✭ 153 (+800%)
Mutual labels:  pipeline, etl
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+1388.24%)
Mutual labels:  pipeline, pyspark
Stetl
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (+276.47%)
Mutual labels:  pipeline, etl
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+364.71%)
Mutual labels:  pipeline, etl
Bulk Writer
Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (+1135.29%)
Mutual labels:  pipeline, etl
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing for machine learning models development and production purposes.
Stars: ✭ 22 (+29.41%)
Mutual labels:  data-preprocessing, data-processing
naas
⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ production environment
Stars: ✭ 219 (+1188.24%)
Mutual labels:  pipeline, etl
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+129.41%)
Mutual labels:  etl, pyspark
Phila Airflow
Stars: ✭ 16 (-5.88%)
Mutual labels:  pipeline, etl
Go Streams
A lightweight stream processing library for Go
Stars: ✭ 615 (+3517.65%)
Mutual labels:  pipeline, etl
Forte
Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/
Stars: ✭ 89 (+423.53%)
Mutual labels:  pipeline, data-processing
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (+1500%)
Mutual labels:  pipeline, etl
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+28835.29%)
Mutual labels:  pipeline, etl
SeqTools
A python library to manipulate and transform indexable data (lists, arrays, ...)
Stars: ✭ 42 (+147.06%)
Mutual labels:  pipeline, preprocessing
skippa
SciKIt-learn Pipeline in PAndas
Stars: ✭ 33 (+94.12%)
Mutual labels:  pipeline, preprocessing
dropEst
Pipeline for initial analysis of droplet-based single-cell RNA-seq data
Stars: ✭ 71 (+317.65%)
Mutual labels:  pipeline, preprocessing
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (+1541.18%)
Mutual labels:  etl, data-processing

sparklanes


sparklanes is a lightweight data processing framework for Apache Spark, written in Python. It was built to make building complex Spark processing pipelines simpler, by shifting the focus toward writing data processing code without having to spend much time on the surrounding application architecture.

Data processing pipelines, or lanes, are built by stringing together encapsulated processor classes. This makes it possible to define lanes with an arbitrary processor order, in which processors can easily be added, removed, or swapped.
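Conceptually, a lane is an ordered sequence of processors whose steps run one after another over shared data. The sketch below illustrates that idea in plain Python; it deliberately does not use sparklanes or Spark, and all class and method names here are invented for illustration, not the sparklanes API.

```python
# Conceptual illustration of chaining encapsulated processors into a "lane".
# Plain Python only -- names are invented for illustration.

class Lane:
    """Runs a list of processor instances in order, passing data along."""

    def __init__(self, processors):
        self.processors = list(processors)

    def run(self, data):
        for processor in self.processors:
            data = processor.process(data)
        return data


class Lowercase:
    """Example processor: lowercases every record."""
    def process(self, records):
        return [r.lower() for r in records]


class DropEmpty:
    """Example processor: removes empty records."""
    def process(self, records):
        return [r for r in records if r]


# Processors can be reordered, added, or removed without touching their code.
lane = Lane([Lowercase(), DropEmpty()])
print(lane.run(["Spark", "", "LANES"]))  # ['spark', 'lanes']
```

Because each processor only exposes a single processing step, swapping the order (or dropping a step entirely) is a one-line change to the lane definition.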

Processing pipelines can be defined in lane-configuration YAML files, then packaged and submitted to Spark with a single command. Alternatively, the same can be achieved manually through the framework's API.
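A lane definition file might look roughly like the following. This is an illustrative sketch only: the key names (`lane`, `tasks`, `class`, `kwargs`), the dotted class paths, and the schema as a whole are assumptions here, so check the official documentation for the exact format before use.

```yaml
# Hypothetical lane definition -- key names are illustrative, not verified
# against the sparklanes docs.
lane:
  name: example-preprocessing
  tasks:
    - class: my_tasks.ExtractCSV        # dotted path to a processor class
      kwargs:
        path: data/input.csv
    - class: my_tasks.NormalizeColumns  # processors run in the listed order
```

The appeal of a file-based definition is that reordering or swapping processors becomes an edit to the YAML rather than a code change.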

Usage

Check out the documentation at sparklanes.readthedocs.io, as well as the example Jupyter notebook.

Installation

Using pip:

pip install sparklanes

Tests & Docs

Install the development requirements:

pip install -r requirements-dev.txt

Run the test suite from the project root using:

python -m tests

Build the documentation:

cd docs && make html

Disclaimer

I don't recommend using this in production, as I'm not actively maintaining it.
