openrefine-client: The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+252.63%)
openrefine-batch: Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a Python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+300%)
wrangle: A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-21.05%)
persistity: A persistence framework for game developers
Stars: ✭ 34 (+78.95%)
sql-to-redis: 🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-5.26%)
cobrix: A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (+473.68%)
Library-Search-Plugin-Public: The Library Search Plugin allows users (students, researchers, etc.) to search your library's catalogue, Google Scholar, WorldCat, or PubMed without having to navigate to the respective websites first. It also comes with a context menu that lets users select text, right-click, and search.
Stars: ✭ 17 (-10.53%)
oesophagus: Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)
Stars: ✭ 12 (-36.84%)
covid-19: Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-26.32%)
metis-framework: Metis, named after the Titaness of Wisdom, is our in-development data publication framework including both a client application and a number of data processing (micro)services
Stars: ✭ 15 (-21.05%)
DataBridge.NET: Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-15.79%)
kafka-connect-datagen: A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+42.11%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (+26.32%)
koza: Data transformation framework for LinkML data models
Stars: ✭ 21 (+10.53%)
etlflow: EtlFlow is an ecosystem of functional libraries in Scala based on ZIO for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+100%)
dogETL: A library to transform data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-21.05%)
DQCS: Data quality control system
Stars: ✭ 34 (+78.95%)
gallia-core: A schema-aware Scala library for data transformation
Stars: ✭ 44 (+131.58%)
csvplus: csvplus extends the standard Go encoding/csv package with a fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+252.63%)
lineage: Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-15.79%)
uptasticsearch: An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+147.37%)
versatile-data-kit: Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+657.89%)
flock: Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Stars: ✭ 232 (+1121.05%)
scholia: Wikidata-based scholarly profiles
Stars: ✭ 166 (+773.68%)
sparklanes: A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-10.53%)
go-bqloader: bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-15.79%)
carry: Python ETL (Extract-Transform-Load) tool / Data migration tool
Stars: ✭ 115 (+505.26%)
cubetl: CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (+10.53%)
mlbgameday: Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (+94.74%)
mik: The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
Stars: ✭ 32 (+68.42%)
maxwell-sink: Consumes Maxwell-generated messages from Kafka and exports them to another MySQL instance.
Stars: ✭ 16 (-15.79%)
es2postgres: ElasticSearch to PostgreSQL loader
Stars: ✭ 18 (-5.26%)
singer-runner: A CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (+73.68%)
kitodo-presentation: Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Stars: ✭ 33 (+73.68%)
cardano-py: Python 3 library and CLI for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)
Stars: ✭ 17 (-10.53%)
hamilton: A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (+3121.05%)
dswarm: An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+200%)
astro: Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+315.79%)
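OpenKettleWebUI: A Kettle-based web platform for scheduling and controlling data processing. It supports both file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.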
Stars: ✭ 138 (+626.32%)
python mozetl: ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+31.58%)
CVparser: CVparser is software for parsing or extracting data out of CVs/resumes.
Stars: ✭ 28 (+47.37%)
dflib: In-memory Java DataFrame library
Stars: ✭ 50 (+163.16%)
django-calaccess-raw-data: A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Stars: ✭ 61 (+221.05%)
conciliator: OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.
Stars: ✭ 95 (+400%)
urnlib: Java library for representing, parsing and encoding URNs as in RFC 2141 and RFC 8141
Stars: ✭ 24 (+26.32%)
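mydataharbor: 🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other data source. It helps users reliably, quickly, and stably run near-real-time incremental synchronization or scheduled full synchronization over massive data volumes. It is aimed primarily at real-time transaction systems and can also be used for big-data synchronization (ETL).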
Stars: ✭ 28 (+47.37%)
brunnhilde: Siegfried-based characterization tool for directories and disk images
Stars: ✭ 55 (+189.47%)
etl: M-Lab ingestion pipeline
Stars: ✭ 15 (-21.05%)
TEAM: The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.
Stars: ✭ 27 (+42.11%)
DataXServer: Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX)
Stars: ✭ 130 (+584.21%)