
225 Open source projects that are alternatives to or similar to openrefine-docker

openrefine-client
The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.
Stars: ✭ 67 (+252.63%)
Mutual labels:  etl, openrefine, code4lib
openrefine-batch
Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a Python client that communicates with the OpenRefine API.
Stars: ✭ 76 (+300%)
Mutual labels:  etl, openrefine, code4lib
wrangle
A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-21.05%)
Mutual labels:  etl
persistity
A persistence framework for game developers
Stars: ✭ 34 (+78.95%)
Mutual labels:  etl
sql-to-redis
🔄 Simple tool for ETL. From SQL to Redis.
Stars: ✭ 18 (-5.26%)
Mutual labels:  etl
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (+473.68%)
Mutual labels:  etl
Library-Search-Plugin-Public
The Library Search Plugin allows users (students, researchers, etc.) to search your library's catalogue, Google Scholar, WorldCat, or PubMed without having to navigate to the respective websites first! It also comes with a neat context menu that lets users select text, right-click, and search!
Stars: ✭ 17 (-10.53%)
Mutual labels:  code4lib
architect big data solutions with spark
code, labs and lectures for the course
Stars: ✭ 40 (+110.53%)
Mutual labels:  etl
oesophagus
Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)
Stars: ✭ 12 (-36.84%)
Mutual labels:  etl
covid-19
Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.
Stars: ✭ 14 (-26.32%)
Mutual labels:  etl
oic-options-chains
ETL for OIC Options Chains
Stars: ✭ 22 (+15.79%)
Mutual labels:  etl
metis-framework
Metis, named after the Titaness of Wisdom, is our in-development data publication framework including both a client application and a number of data processing (micro)services
Stars: ✭ 15 (-21.05%)
Mutual labels:  code4lib
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-15.79%)
Mutual labels:  etl
kafka-connect-datagen
A Kafka Connect source connector that generates data for tests
Stars: ✭ 27 (+42.11%)
Mutual labels:  etl
DaFlow
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (+26.32%)
Mutual labels:  etl
koza
Data transformation framework for LinkML data models
Stars: ✭ 21 (+10.53%)
Mutual labels:  etl
etlflow
EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for writing various tasks and jobs on GCP and AWS.
Stars: ✭ 38 (+100%)
Mutual labels:  etl
redis-connect-dist
Real-Time Event Streaming & Change Data Capture
Stars: ✭ 21 (+10.53%)
Mutual labels:  etl
dogETL
A library to transform data between JDBC, CSV, and JSON.
Stars: ✭ 15 (-21.05%)
Mutual labels:  etl
spdr-etf-holdings
ETL for the SPDR ETF holdings XLS documents
Stars: ✭ 14 (-26.32%)
Mutual labels:  etl
DQCS
Data quality control system
Stars: ✭ 34 (+78.95%)
Mutual labels:  etl
gallia-core
A schema-aware Scala library for data transformation
Stars: ✭ 44 (+131.58%)
Mutual labels:  etl
csvplus
csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.
Stars: ✭ 67 (+252.63%)
Mutual labels:  etl
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-15.79%)
Mutual labels:  etl
uptasticsearch
An Elasticsearch client tailored to data science workflows.
Stars: ✭ 47 (+147.37%)
Mutual labels:  etl
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (+657.89%)
Mutual labels:  etl
flock
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
Stars: ✭ 232 (+1121.05%)
Mutual labels:  etl
scholia
Wikidata-based scholarly profiles
Stars: ✭ 166 (+773.68%)
Mutual labels:  code4lib
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-10.53%)
Mutual labels:  etl
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-15.79%)
Mutual labels:  etl
carry
Python ETL (Extract-Transform-Load) tool / data migration tool
Stars: ✭ 115 (+505.26%)
Mutual labels:  etl
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (+10.53%)
Mutual labels:  etl
rivery cli
Rivery CLI
Stars: ✭ 16 (-15.79%)
Mutual labels:  etl
OpenRefine-ecology-lesson
Data Cleaning with OpenRefine for Ecologists
Stars: ✭ 20 (+5.26%)
Mutual labels:  openrefine
mlbgameday
Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.
Stars: ✭ 37 (+94.74%)
Mutual labels:  etl
mik
The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).
Stars: ✭ 32 (+68.42%)
Mutual labels:  etl
maxwell-sink
Consumes Maxwell-generated messages from Kafka and exports them to another MySQL database.
Stars: ✭ 16 (-15.79%)
Mutual labels:  etl
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-10.53%)
Mutual labels:  etl
es2postgres
ElasticSearch to PostgreSQL loader
Stars: ✭ 18 (-5.26%)
Mutual labels:  etl
singer-runner
A CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (+73.68%)
Mutual labels:  etl
ruby-for-pentaho-kettle
Ruby scripting for pentaho-kettle
Stars: ✭ 42 (+121.05%)
Mutual labels:  etl
kitodo-presentation
Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.
Stars: ✭ 33 (+73.68%)
Mutual labels:  code4lib
cardano-py
Python 3 library and CLI for operating a Cardano passive node and using the APIs (PRE-ALPHA).
Stars: ✭ 17 (-10.53%)
Mutual labels:  etl
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+3121.05%)
Mutual labels:  etl
dswarm
an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)
Stars: ✭ 57 (+200%)
Mutual labels:  etl
nasdaq-symbols
ETL for the NASDAQ symbol file
Stars: ✭ 13 (-31.58%)
Mutual labels:  etl
astro
Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
Stars: ✭ 79 (+315.79%)
Mutual labels:  etl
OpenKettleWebUI
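A Kettle-based web platform for scheduling and controlling data processing. It supports both file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.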
Stars: ✭ 138 (+626.32%)
Mutual labels:  etl
web-click-flow
Offline log analysis of website clickstream data
Stars: ✭ 14 (-26.32%)
Mutual labels:  etl
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (+31.58%)
Mutual labels:  etl
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (+47.37%)
Mutual labels:  etl
dflib
In-memory Java DataFrame library
Stars: ✭ 50 (+163.16%)
Mutual labels:  etl
django-calaccess-raw-data
A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database
Stars: ✭ 61 (+221.05%)
Mutual labels:  etl
conciliator
OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.
Stars: ✭ 95 (+400%)
Mutual labels:  openrefine
urnlib
Java library for representing, parsing and encoding URNs as in RFC2141 and RFC8141
Stars: ✭ 24 (+26.32%)
Mutual labels:  code4lib
mydataharbor
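🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware that moves data from any data source to any other data source. It helps users reliably, quickly, and stably perform near-real-time incremental synchronization or scheduled full synchronization of massive data volumes. It is primarily designed to serve real-time transaction systems, but can also be used for big data synchronization (the ETL domain).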
Stars: ✭ 28 (+47.37%)
Mutual labels:  etl
brunnhilde
Siegfried-based characterization tool for directories and disk images
Stars: ✭ 55 (+189.47%)
Mutual labels:  code4lib
etl
M-Lab ingestion pipeline
Stars: ✭ 15 (-21.05%)
Mutual labels:  etl
TEAM
The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.
Stars: ✭ 27 (+42.11%)
Mutual labels:  etl
DataXServer
Adds remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) to DataX (https://github.com/alibaba/DataX).
Stars: ✭ 130 (+584.21%)
Mutual labels:  etl