basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-90.81%)
Stetl: Stetl (Streaming ETL) is a lightweight geospatial processing and ETL framework written in Python.
Stars: ✭ 64 (-76.47%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-70.96%)
Omniparser: A native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (-45.59%)
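mydataharbor: 🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any data source. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization of massive data volumes. It is aimed primarily at real-time transaction systems, but can also be used for big-data synchronization (the ETL domain).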
Stars: ✭ 28 (-89.71%)
naas: ⚙️ Schedule notebooks, run them like APIs, and securely expose your assets: Jupyter as a viable ⚡️ production environment.
Stars: ✭ 219 (-19.49%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+132.72%)
Airbyte: Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes, and databases.
Stars: ✭ 4,919 (+1708.46%)
Spark Bigquery: Google BigQuery support for Spark, Structured Streaming, SQL, and DataFrames with easy Databricks integration.
Stars: ✭ 65 (-76.1%)
Osom: An Awesome [/osom/] Object Data Modeling (Database Agnostic).
Stars: ✭ 68 (-75%)
Bulk Writer: Provides guidance for fast ETL jobs, an IDataReader implementation for SqlBulkCopy (or the MySql or Oracle equivalents) that wraps an IEnumerable, and libraries for mapping entities to table columns.
Stars: ✭ 210 (-22.79%)
etl: M-Lab ingestion pipeline.
Stars: ✭ 15 (-94.49%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+339.34%)
sparklanes: A lightweight data processing framework for Apache Spark.
Stars: ✭ 17 (-93.75%)
Go Streams: A lightweight stream processing library for Go.
Stars: ✭ 615 (+126.1%)
Wedatasphere: WeDataSphere is a financial-grade, one-stop, open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+36.76%)
Graphql Parser: A GraphQL query language and schema definition language parser and formatter for Rust.
Stars: ✭ 203 (-25.37%)
Mara Pipelines: A lightweight, opinionated ETL framework, halfway between plain scripts and Apache Airflow.
Stars: ✭ 1,841 (+576.84%)
Metl: mito ETL tool.
Stars: ✭ 153 (-43.75%)
Luigi Warehouse: A Luigi-powered analytics/warehouse stack.
Stars: ✭ 72 (-73.53%)
Metorikku: A simplified, lightweight ETL framework based on Apache Spark.
Stars: ✭ 361 (+32.72%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Stars: ✭ 2,084 (+666.18%)
lineage: Generate beautiful documentation for your data pipelines in Markdown format.
Stars: ✭ 16 (-94.12%)
Vue Form Generator: 📋 A schema-based form generator component for Vue.js.
Stars: ✭ 2,853 (+948.9%)
unimport: A linter and formatter for finding and removing unused import statements.
Stars: ✭ 119 (-56.25%)
ploio: Safe, Reliable, and Fast Production Deployments for Kubernetes.
Stars: ✭ 11 (-95.96%)
snakefmt: The uncompromising Snakemake code formatter.
Stars: ✭ 78 (-71.32%)
Big Data Rosetta Code: Code snippets for solving common big data problems on various platforms. Inspired by Rosetta Code.
Stars: ✭ 254 (-6.62%)
bandar-log: Monitoring tool to measure the flow throughput of data sources and processing components that are part of data ingestion and ETL pipelines.
Stars: ✭ 20 (-92.65%)
kedro: A Python framework for creating reproducible, maintainable, and modular data science code.
Stars: ✭ 6,068 (+2130.88%)
spark-http-stream: Spark Structured Streaming via HTTP communication.
Stars: ✭ 17 (-93.75%)
ddquery: Django Debug Query (ddquery): beautiful colored SQL statements for logging.
Stars: ✭ 25 (-90.81%)
Formvuelate: Dynamic schema-based form rendering for VueJS.
Stars: ✭ 262 (-3.68%)
Helk: The Hunting ELK.
Stars: ✭ 3,097 (+1038.6%)
grate: A Go-native tabular data extraction package. Currently supports .xls, .xlsx, .csv, and .tsv formats.
Stars: ✭ 98 (-63.97%)
godot-exporter: Godot Engine automation pipeline for Android, iOS, Linux, macOS, Windows, HTML5, and Itch.io.
Stars: ✭ 54 (-80.15%)
fform: Flexible and extendable form builder with constructor.
Stars: ✭ 26 (-90.44%)
Dgsh: Shell supporting pipelines to and from multiple processes.
Stars: ✭ 261 (-4.04%)
hammer: 🛠 hammer is a command-line tool for schema management for Google Cloud Spanner.
Stars: ✭ 38 (-86.03%)
daf-kylo: Kylo integration with PDND (previously DAF).
Stars: ✭ 20 (-92.65%)
dllib: dllib is a distributed deep learning library running on Apache Spark.
Stars: ✭ 32 (-88.24%)
pyrealtime: Realtime data processing and plotting pipelines in Python.
Stars: ✭ 62 (-77.21%)
toml-sort: TOML sorting library.
Stars: ✭ 31 (-88.6%)
Seapig: 🌊🐷 Utility for generalized composition of React components.
Stars: ✭ 269 (-1.1%)
Phytouch: Smooth scrolling, rotation, pull-to-refresh, page transitions, and any motion for the web: a silky-smooth touch motion solution.
Stars: ✭ 2,854 (+949.26%)
Docker Spark Cluster: A simple Spark standalone cluster for your testing environment purposes.
Stars: ✭ 261 (-4.04%)
currency edittext: Simple currency formatter for Android EditText.
Stars: ✭ 64 (-76.47%)
ctdna-pipeline: A simplified pipeline for ctDNA sequencing data analysis.
Stars: ✭ 29 (-89.34%)
BlazorMonaco: Blazor component for Microsoft's Monaco Editor, which powers Visual Studio Code.
Stars: ✭ 151 (-44.49%)
etl manager: A Python package to create a database on the platform using our moj data warehousing framework.
Stars: ✭ 14 (-94.85%)
spark-data-sources: Developing Spark External Data Sources using the V2 API.
Stars: ✭ 36 (-86.76%)