DatacleanerThe premier open source Data Quality solution
Stars: ✭ 391 (+3454.55%)
mqtt-to-kafka-bridgeMove your messages from MQTT to Apache Kafka in real-time 🚀
Stars: ✭ 21 (+90.91%)
dflibIn-memory Java DataFrame library
Stars: ✭ 50 (+354.55%)
Go StreamsA lightweight stream processing library for Go
Stars: ✭ 615 (+5490.91%)
mydataharbor🇨🇳 MyDataHarbor是一个致力于解决任意数据源到任意数据源的分布式、高扩展性、高性能、事务级的数据同步中间件。帮助用户可靠、快速、稳定的对海量数据进行准实时增量同步或者定时全量同步,主要定位是为实时交易系统服务,亦可用于大数据的数据同步(ETL领域)。
Stars: ✭ 28 (+154.55%)
DataBridge.NETConfigurable data bridge for permanent ETL jobs
Stars: ✭ 16 (+45.45%)
ChoetlETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+3281.82%)
cobrixA COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (+890.91%)
cpp-can-isotpC++ implementation of CAN ISO 15765-2 also known as CAN ISO transport protocol. CPP CAN isotp.
Stars: ✭ 14 (+27.27%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (+118.18%)
Getting StartedThis repository is a getting started guide to Singer.
Stars: ✭ 734 (+6572.73%)
mikThe Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservations systems).
Stars: ✭ 32 (+190.91%)
WedatasphereWeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+3281.82%)
singer-runnerA CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (+200%)
sql-to-redis🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (+63.64%)
Ananas DesktopA hackable data integration & analysis tool to enable non technical users to edit data processing jobs and visualise data on demand.
Stars: ✭ 551 (+4909.09%)
DQCS数据质量控制系统
Stars: ✭ 34 (+209.09%)
gamechanger-dataGAMECHANGER aspires to be the Department’s trusted solution for evidence-based, data-driven decision-making across the universe of DoD requirements
Stars: ✭ 17 (+54.55%)
covid-19Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (+27.27%)
DataformDataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (+3009.09%)
csvpluscsvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+509.09%)
cardano-pyPython3 lib and cli for operating a Cardano Passive Node and using the API's. (PRE-ALPHA)
Stars: ✭ 17 (+54.55%)
CVparserCVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (+154.55%)
Bandar LogMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (+72.73%)
django-calaccess-raw-dataA Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Stars: ✭ 61 (+454.55%)
mlbgamedayMulti-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (+236.36%)
sync-engine-exampleSynchronization Algorithm Exploration: Techniques to synchronize a SQL database with external destinations.
Stars: ✭ 17 (+54.55%)
SmooksAn extensible Java framework for building XML and non-XML streaming applications
Stars: ✭ 293 (+2563.64%)
starlakeStarlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Stars: ✭ 16 (+45.45%)
openrefine-clientThe OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+509.09%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+254.55%)
BigsliceA serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+4163.64%)
zinggScalable identity resolution, entity resolution, data mastering and deduplication using ML
Stars: ✭ 655 (+5854.55%)
carryPython ETL(Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (+945.45%)
csv-cruncherTreats CSV and JSON files as SQL tables, and exports SQL SELECTs back to CSV or JSON.
Stars: ✭ 32 (+190.91%)
BenthosFancy stream processing made operationally mundane
Stars: ✭ 3,705 (+33581.82%)
morph-kgcPowerful RDF Knowledge Graph Generation with [R2]RML Mappings
Stars: ✭ 77 (+600%)
es2postgresElasticSearch to PostgreSQL loader
Stars: ✭ 18 (+63.64%)
google-sheets-etlLive import all your Google Sheets to your data warehouse
Stars: ✭ 15 (+36.36%)
React CsvReact components to build CSV files on the fly basing on Array/literal object of data
Stars: ✭ 732 (+6554.55%)
BETL-oldBETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (+54.55%)
astroAstro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+618.18%)
openrefine-batchShell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+590.91%)
etl managerA python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (+27.27%)
iex-stocksETL for the IEX Stocks API
Stars: ✭ 19 (+72.73%)
DataXServer为DataX(https://github.com/alibaba/DataX) 提供远程多语言调用(ThriftServer,HttpServer) 分布式运行(DataX on YARN) 功能
Stars: ✭ 130 (+1081.82%)
neo4j-jdbcJDBC driver for Neo4j
Stars: ✭ 110 (+900%)
EtlalchemyExtract, Transform, Load: Any SQL Database in 4 lines of Code.
Stars: ✭ 460 (+4081.82%)
kafka-connect-datagenA Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+145.45%)
Tuna🐟 A streaming ETL for fish
Stars: ✭ 11 (+0%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+5654.55%)
PglogicalLogical Replication extension for PostgreSQL 13, 12, 11, 10, 9.6, 9.5, 9.4 (Postgres), providing much faster replication than Slony, Bucardo or Londiste, as well as cross-version upgrades.
Stars: ✭ 455 (+4036.36%)
grateA Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.
Stars: ✭ 98 (+790.91%)