1544 open source projects that are alternatives to or similar to Datavec

Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser

Stars: ✭ 25 (-90.81%)

Mutual labels: spark, pipeline, etl

Stetl

Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.

Stars: ✭ 64 (-76.47%)

Mutual labels: pipeline, etl, transformations

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-70.96%)

Mutual labels: spark, pipeline, etl

Omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Stars: ✭ 148 (-45.59%)

Mutual labels: schema, etl

mydataharbor

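🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive data sets. It is aimed primarily at real-time transaction systems and can also be used for big data synchronization (the ETL domain).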

Stars: ✭ 28 (-89.71%)

Mutual labels: pipeline, etl

naas

⚙️ Schedule notebooks, run them like APIs, securely expose your assets: Jupyter as a viable ⚡️ Production environment

Stars: ✭ 219 (-19.49%)

Mutual labels: pipeline, etl

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+132.72%)

Mutual labels: spark, etl

Airbyte

Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.

Stars: ✭ 4,919 (+1708.46%)

Mutual labels: pipeline, etl

Spark Bigquery

Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.

Stars: ✭ 65 (-76.1%)

Mutual labels: schema, spark

Osom

An Awesome [/osom/] Object Data Modeling (Database Agnostic).

Stars: ✭ 68 (-75%)

Mutual labels: schema, transformations

Bulk Writer

Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.

Stars: ✭ 210 (-22.79%)

Mutual labels: pipeline, etl

etl

M-Lab ingestion pipeline

Stars: ✭ 15 (-94.49%)

Mutual labels: pipeline, etl

Dataspherestudio

DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.

Stars: ✭ 1,195 (+339.34%)

Mutual labels: spark, etl

Udacity Data Engineering

Udacity Data Engineering Nano Degree (DEND)

Stars: ✭ 89 (-67.28%)

Mutual labels: spark, etl

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-93.75%)

Mutual labels: pipeline, etl

Go Streams

A lightweight stream processing library for Go

Stars: ✭ 615 (+126.1%)

Mutual labels: pipeline, etl

Phila Airflow

Stars: ✭ 16 (-94.12%)

Mutual labels: pipeline, etl

Wedatasphere

WeDataSphere is a finance-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+36.76%)

Mutual labels: spark, etl

Graphql Parser

A GraphQL query language and schema definition language parser and formatter for Rust

Stars: ✭ 203 (-25.37%)

Mutual labels: schema, formatter

Mara Pipelines

A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow

Stars: ✭ 1,841 (+576.84%)

Mutual labels: pipeline, etl

Metl

mito ETL tool

Stars: ✭ 153 (-43.75%)

Mutual labels: pipeline, etl

Luigi Warehouse

A luigi powered analytics / warehouse stack

Stars: ✭ 72 (-73.53%)

Mutual labels: spark, etl

Metorikku

A simplified, lightweight ETL Framework based on Apache Spark

Stars: ✭ 361 (+32.72%)

Mutual labels: spark, etl

Transmogrifai

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning

Stars: ✭ 2,084 (+666.18%)

Mutual labels: spark, transformations

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-94.12%)

Mutual labels: pipeline, etl

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (-87.5%)

Mutual labels: spark, transformations

Vue Form Generator

📋 A schema-based form generator component for Vue.js

Stars: ✭ 2,853 (+948.9%)

Mutual labels: schema

unimport

A linter and formatter for finding and removing unused import statements.

Stars: ✭ 119 (-56.25%)

Mutual labels: formatter

ploio

Safe, Reliable, and Fast Production Deployments for Kubernetes

Stars: ✭ 11 (-95.96%)

Mutual labels: pipeline

snakefmt

The uncompromising Snakemake code formatter

Stars: ✭ 78 (-71.32%)

Mutual labels: formatter

Typed Immutable

Immutable and structurally typed data

Stars: ✭ 263 (-3.31%)

Mutual labels: schema

Big Data Rosetta Code

Code snippets for solving common big data problems in various platforms. Inspired by Rosetta Code

Stars: ✭ 254 (-6.62%)

Mutual labels: spark

bandar-log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 20 (-92.65%)

Mutual labels: etl

latent-semantic-analysis

Pipeline for training LSA models using Scikit-Learn.

Stars: ✭ 20 (-92.65%)

Mutual labels: pipeline

kedro

A Python framework for creating reproducible, maintainable and modular data science code.

Stars: ✭ 6,068 (+2130.88%)

Mutual labels: pipeline

spark-http-stream

Spark Structured Streaming via HTTP communication

Stars: ✭ 17 (-93.75%)

Mutual labels: spark

ddquery

Django Debug Query (ddquery): beautiful colored SQL statements for logging

Stars: ✭ 25 (-90.81%)

Mutual labels: formatter

Formvuelate

Dynamic schema-based form rendering for VueJS

Stars: ✭ 262 (-3.68%)

Mutual labels: schema

Helk

The Hunting ELK

Stars: ✭ 3,097 (+1038.6%)

Mutual labels: spark

spark-structured-streaming-examples

Spark Structured Streaming examples using version 3.0.0

Stars: ✭ 23 (-91.54%)

Mutual labels: spark

grate

A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.

Stars: ✭ 98 (-63.97%)

Mutual labels: etl

godot-exporter

Godot Engine Automation Pipeline Android – iOS – Linux – MacOS – Windows – HTML5 – Itch.io.

Stars: ✭ 54 (-80.15%)

Mutual labels: pipeline

laravel-spark-camera

Profile Photo Camera support for Laravel Spark

Stars: ✭ 30 (-88.97%)

Mutual labels: spark

fform

Flexible and extensible form builder with constructor

Stars: ✭ 26 (-90.44%)

Mutual labels: schema

Dgsh

Shell supporting pipelines to and from multiple processes

Stars: ✭ 261 (-4.04%)

Mutual labels: pipeline

hammer

🛠 hammer is a command-line tool for schema management for Google Cloud Spanner.

Stars: ✭ 38 (-86.03%)

Mutual labels: schema

daf-kylo

Kylo integration with PDND (previously DAF).

Stars: ✭ 20 (-92.65%)

Mutual labels: spark

dllib

dllib is a distributed deep learning library running on Apache Spark

Stars: ✭ 32 (-88.24%)

Mutual labels: spark

pyrealtime

Realtime data processing and plotting pipelines in Python

Stars: ✭ 62 (-77.21%)

Mutual labels: pipeline

Spotify-Song-Recommendation-ML

UC Berkeley team's submission for RecSys Challenge 2018

Stars: ✭ 70 (-74.26%)

Mutual labels: spark

toml-sort

Toml sorting library

Stars: ✭ 31 (-88.6%)

Mutual labels: formatter

Seapig

🌊🐷 Utility for generalized composition of React components

Stars: ✭ 269 (-1.1%)

Mutual labels: schema

Phytouch

Smooth scrolling, rotation, pull to refresh, page transition and any motion for the web: a silky-smooth touch motion solution

Stars: ✭ 2,854 (+949.26%)

Mutual labels: transformations

Docker Spark Cluster

A simple Spark standalone cluster for your testing environment purposes

Stars: ✭ 261 (-4.04%)

Mutual labels: spark

currency edittext

Simple currency formatter for Android EditText

Stars: ✭ 64 (-76.47%)

Mutual labels: formatter

ctdna-pipeline

A simplified pipeline for ctDNA sequencing data analysis

Stars: ✭ 29 (-89.34%)

Mutual labels: pipeline

BlazorMonaco

Blazor component for Microsoft's Monaco Editor which powers Visual Studio Code.

Stars: ✭ 151 (-44.49%)

Mutual labels: formatter

etl manager

A Python package to create a database on the platform using our MoJ data warehousing framework

Stars: ✭ 14 (-94.85%)

Mutual labels: etl

spark learning

Spark study materials following the latest 2019 edition of the Atguigu (尚硅谷) big data Spark course

Stars: ✭ 42 (-84.56%)

Mutual labels: spark

spark-data-sources

Developing Spark External Data Sources using the V2 API

Stars: ✭ 36 (-86.76%)

Mutual labels: spark
