DaFlowApache Spark-based data flow (ETL) framework that supports multiple read sources and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-14.29%)
Pyetlpython ETL framework
Stars: ✭ 33 (+17.86%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+107.14%)
ChoetlETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+1228.57%)
cubetlCubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-25%)
RecordParserZero Allocation Writer/Reader Parser for .NET Core
Stars: ✭ 155 (+453.57%)
StoragetapperStorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (+728.57%)
Tsv UtilseBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Stars: ✭ 1,215 (+4239.29%)
Data CuratorData Curator - share usable open data
Stars: ✭ 199 (+610.71%)
Pxi🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Stars: ✭ 248 (+785.71%)
Pyexcel IoOne interface to read and write the data in various excel formats, import the data into and export the data from databases
Stars: ✭ 40 (+42.86%)
Intellij Csv ValidatorCSV validator, highlighter and formatter plugin for JetBrains Intellij IDEA, PyCharm, WebStorm, ...
Stars: ✭ 198 (+607.14%)
SchemerSchema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+246.43%)
DIRECTDIRECT, the Data Integration Run-time Execution Control Tool, is a data logistics framework that can be used to monitor, log, audit and control data integration / ETL processes.
Stars: ✭ 20 (-28.57%)
krawlerA minimalist (geospatial) ETL
Stars: ✭ 51 (+82.14%)
link-moveA model-driven dynamically-configurable framework to acquire data from external sources and save it to your database.
Stars: ✭ 32 (+14.29%)
tamerStandalone alternatives to Kafka Connect Connectors
Stars: ✭ 42 (+50%)
MillerMiller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+16446.43%)
BETL-oldBETL. Meta data driven ETL generation using T-SQL
Stars: ✭ 17 (-39.29%)
athena-sqliteA SQLite driver for S3 and Amazon Athena 😳
Stars: ✭ 82 (+192.86%)
bandar-logMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-28.57%)
YouPlotA command line tool that draws plots on the terminal.
Stars: ✭ 412 (+1371.43%)
dogETLA library for converting data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-46.43%)
DataProfilerWhat's in your data? Extract schema, statistics and entities from datasets
Stars: ✭ 843 (+2910.71%)
Structured Text ToolsA list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+21971.43%)
SqlitebiterA CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
Stars: ✭ 601 (+2046.43%)
Qq - Run SQL directly on CSV or TSV files
Stars: ✭ 8,809 (+31360.71%)
CsvtkA cross-platform, efficient and practical CSV/TSV toolkit in Golang
Stars: ✭ 566 (+1921.43%)
WinmergeWinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
Stars: ✭ 2,358 (+8321.43%)
Rbql🦜RBQL - Rainbow Query Language: SQL-like language for (not only) CSV file processing. Supports SQL queries with Python and JavaScript expressions
Stars: ✭ 118 (+321.43%)
dbddbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.
Stars: ✭ 30 (+7.14%)
SwiftcsvCSV parser for Swift
Stars: ✭ 511 (+1725%)
Topos🌀 .NET Event Processing library
Stars: ✭ 22 (-21.43%)
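DataX-srcDataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, and ODPS.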
Stars: ✭ 21 (-25%)
vixtractwww.vixtract.ru
Stars: ✭ 40 (+42.86%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+39.29%)
athenadriverA fully-featured AWS Athena database driver (+ athenareader https://github.com/uber/athenadriver/tree/master/athenareader)
Stars: ✭ 116 (+314.29%)
VroomFast reading of delimited files
Stars: ✭ 462 (+1550%)
Kafka-quickstartKafka Examples focusing on Producer, Consumer, KStreams, KTable, Global KTable using Spring, Kafka Cluster Setup & Monitoring. Implementing Event Sourcing and CQRS Design Pattern using Kafka
Stars: ✭ 31 (+10.71%)
hamiltonA scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+2085.71%)
tabular-streamDetects tabular data (spreadsheets, dsv or json, 20+ different formats) and emits normalized objects.
Stars: ✭ 34 (+21.43%)
beekeeperService for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+53.57%)
DataBridge.NETConfigurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-42.86%)
etlflowEtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+35.71%)
stock-market-scraperScrapes historical stock market data from Yahoo Finance (https://finance.yahoo.com/)
Stars: ✭ 110 (+292.86%)
kafka-0.11-examplesCode snippets that demonstrate how to leverage the new Kafka 0.11 APIs
Stars: ✭ 17 (-39.29%)
Rhythm-CB-ScriptsCollection of scripts for use with Carbon Black Cb Response API
Stars: ✭ 14 (-50%)
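OpenKettleWebUIA web-based scheduling and control platform for data processing built on Kettle. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.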
Stars: ✭ 138 (+392.86%)
dswarman open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+103.57%)
ETL-Starter-Kit📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.
Stars: ✭ 21 (-25%)
TILToday I Learned
Stars: ✭ 43 (+53.57%)
InsulatorA client UI to inspect Kafka topics, consume, produce and much more
Stars: ✭ 53 (+89.29%)
VisidataA terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+16350%)
Pytablewriterpytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+1407.14%)