225 open source projects that are alternatives to or similar to openrefine-docker

The OpenRefine Python Client from Paul Makepeace provides a library for communicating with an OpenRefine server. This fork extends the command line interface (CLI) and is distributed as a convenient one-file-executable (Windows, Linux, Mac). It is also available via Docker Hub, PyPI and Binder.

Stars: ✭ 67 (+252.63%)

Mutual labels: etl, openrefine, code4lib

openrefine-batch

Shell script to run OpenRefine in batch mode (import, transform, export). It orchestrates OpenRefine (server) and a python client that communicates with the OpenRefine API.

Stars: ✭ 76 (+300%)

Mutual labels: etl, openrefine, code4lib

wrangle

A data transformation package for deep learning with Autonomio, Keras and TensorFlow.

Stars: ✭ 15 (-21.05%)

Mutual labels: etl

persistity

A persistence framework for game developers

Stars: ✭ 34 (+78.95%)

Mutual labels: etl

sql-to-redis

🔄 Simple tool for ETL. From SQL to Redis.

Stars: ✭ 18 (-5.26%)

Mutual labels: etl

cobrix

A COBOL parser and Mainframe/EBCDIC data source for Apache Spark

Stars: ✭ 109 (+473.68%)

Mutual labels: etl

Library-Search-Plugin-Public

The Library Search Plugin allows users (students, researchers, etc.) to search your library's catalogue, Google Scholar, WorldCat, or PubMed without first navigating to the respective websites. It also comes with a context menu that lets users select text, right-click, and search.

Stars: ✭ 17 (-10.53%)

Mutual labels: code4lib

architect big data solutions with spark

code, labs and lectures for the course

Stars: ✭ 40 (+110.53%)

Mutual labels: etl

oesophagus

Enterprise Grade Single-Step Streaming Data Infrastructure Setup. (Under Development)

Stars: ✭ 12 (-36.84%)

Mutual labels: etl

covid-19

Data ETL & Analysis on the global and Mexican datasets of the COVID-19 pandemic.

Stars: ✭ 14 (-26.32%)

Mutual labels: etl

oic-options-chains

ETL for OIC Options Chains

Stars: ✭ 22 (+15.79%)

Mutual labels: etl

metis-framework

Metis, named after the Titaness of Wisdom, is our in-development data publication framework including both a client application and a number of data processing (micro)services

Stars: ✭ 15 (-21.05%)

Mutual labels: code4lib

DataBridge.NET

Configurable data bridge for permanent ETL jobs

Stars: ✭ 16 (-15.79%)

Mutual labels: etl

kafka-connect-datagen

A Kafka Connect source connector that generates data for tests

Stars: ✭ 27 (+42.11%)

Mutual labels: etl

DaFlow

Apache Spark-based data flow (ETL) framework that supports multiple read sources and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (+26.32%)

Mutual labels: etl

koza

Data transformation framework for LinkML data models

Stars: ✭ 21 (+10.53%)

Mutual labels: etl

etlflow

EtlFlow is an ecosystem of functional libraries in Scala, based on ZIO, for writing various tasks and jobs on GCP and AWS.

Stars: ✭ 38 (+100%)

Mutual labels: etl

redis-connect-dist

Real-Time Event Streaming & Change Data Capture

Stars: ✭ 21 (+10.53%)

Mutual labels: etl

dogETL

A library to transform data between JDBC, CSV, and JSON.

Stars: ✭ 15 (-21.05%)

Mutual labels: etl

spdr-etf-holdings

ETL for the SPDR ETF holdings XLS documents

Stars: ✭ 14 (-26.32%)

Mutual labels: etl

DQCS

Data quality control system

Stars: ✭ 34 (+78.95%)

Mutual labels: etl

gallia-core

A schema-aware Scala library for data transformation

Stars: ✭ 44 (+131.58%)

Mutual labels: etl

csvplus

csvplus extends the standard Go encoding/csv package with fluent interface, lazy stream operations, indices and joins.

Stars: ✭ 67 (+252.63%)

Mutual labels: etl

lineage

Generate beautiful documentation for your data pipelines in markdown format

Stars: ✭ 16 (-15.79%)

Mutual labels: etl

uptasticsearch

An Elasticsearch client tailored to data science workflows.

Stars: ✭ 47 (+147.37%)

Mutual labels: etl

versatile-data-kit

Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Stars: ✭ 144 (+657.89%)

Mutual labels: etl

flock

Flock: A Low-Cost Streaming Query Engine on FaaS Platforms

Stars: ✭ 232 (+1121.05%)

Mutual labels: etl

scholia

Wikidata-based scholarly profiles

Stars: ✭ 166 (+773.68%)

Mutual labels: code4lib

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-10.53%)

Mutual labels: etl

go-bqloader

bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.

Stars: ✭ 16 (-15.79%)

Mutual labels: etl

carry

Python ETL(Extract-Transform-Load) tool / Data migration tool

Stars: ✭ 115 (+505.26%)

Mutual labels: etl

cubetl

CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)

Stars: ✭ 21 (+10.53%)

Mutual labels: etl

rivery cli

Rivery CLI

Stars: ✭ 16 (-15.79%)

Mutual labels: etl

OpenRefine-ecology-lesson

Data Cleaning with OpenRefine for Ecologists

Stars: ✭ 20 (+5.26%)

Mutual labels: openrefine

mlbgameday

Multi-core processing of 'Gameday' data from Major League Baseball Advanced Media. Additional tools to parallelize large data sets and write them to a database.

Stars: ✭ 37 (+94.74%)

Mutual labels: etl

mik

The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservation systems).

Stars: ✭ 32 (+68.42%)

Mutual labels: etl

maxwell-sink

Consumes Maxwell-generated messages from Kafka and exports them to another MySQL instance.

Stars: ✭ 16 (-15.79%)

Mutual labels: etl

bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

Stars: ✭ 17 (-10.53%)

Mutual labels: etl

es2postgres

ElasticSearch to PostgreSQL loader

Stars: ✭ 18 (-5.26%)

Mutual labels: etl

singer-runner

A CLI and library to run Singer Taps and Targets

Stars: ✭ 33 (+73.68%)

Mutual labels: etl

ruby-for-pentaho-kettle

Ruby scripting for pentaho-kettle

Stars: ✭ 42 (+121.05%)

Mutual labels: etl

kitodo-presentation

Kitodo.Presentation is a feature-rich framework for building a METS- or IIIF-based digital library. It is part of the Kitodo Digital Library Suite.

Stars: ✭ 33 (+73.68%)

Mutual labels: code4lib

cardano-py

Python 3 library and CLI for operating a Cardano Passive Node and using the APIs. (PRE-ALPHA)

Stars: ✭ 17 (-10.53%)

Mutual labels: etl

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+3121.05%)

Mutual labels: etl

dswarm

an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)

Stars: ✭ 57 (+200%)

Mutual labels: etl

nasdaq-symbols

ETL for the NASDAQ symbol file

Stars: ✭ 13 (-31.58%)

Mutual labels: etl

astro

Astro allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.

Stars: ✭ 79 (+315.79%)

Mutual labels: etl

OpenKettleWebUI

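A Kettle-based web platform for scheduling and controlling data processing. It supports file repositories and database repositories, controls Kettle data transformations through the web platform, and can be integrated into existing systems as middleware.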

Stars: ✭ 138 (+626.32%)

Mutual labels: etl

web-click-flow

Offline log analysis of website clickstreams

Stars: ✭ 14 (-26.32%)

Mutual labels: etl

python mozetl

ETL jobs for Firefox Telemetry

Stars: ✭ 25 (+31.58%)

Mutual labels: etl

CVparser

CVparser is software for parsing or extracting data out of CV/resumes.

Stars: ✭ 28 (+47.37%)

Mutual labels: etl

dflib

In-memory Java DataFrame library

Stars: ✭ 50 (+163.16%)

Mutual labels: etl

django-calaccess-raw-data

A Django app to download, extract and load campaign finance and lobbying activity data from the California Secretary of State's CAL-ACCESS database

Stars: ✭ 61 (+221.05%)

Mutual labels: etl

conciliator

OpenRefine reconciliation services for VIAF, ORCID, and Open Library + framework for creating more.

Stars: ✭ 95 (+400%)

Mutual labels: openrefine

urnlib

Java library for representing, parsing and encoding URNs as in RFC2141 and RFC8141

Stars: ✭ 24 (+26.32%)

Mutual labels: code4lib

mydataharbor

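🇨🇳 MyDataHarbor is a distributed, highly scalable, high-performance, transaction-level data synchronization middleware for moving data from any data source to any other. It helps users perform near-real-time incremental synchronization or scheduled full synchronization of massive data sets reliably, quickly, and stably. It is aimed primarily at real-time transaction systems, and can also be used for big data synchronization (the ETL domain).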

Stars: ✭ 28 (+47.37%)

Mutual labels: etl

brunnhilde

Siegfried-based characterization tool for directories and disk images

Stars: ✭ 55 (+189.47%)

Mutual labels: code4lib

etl

M-Lab ingestion pipeline

Stars: ✭ 15 (-21.05%)

Mutual labels: etl

TEAM

The Taxonomy for ETL Automation Metadata (TEAM) is a metadata management tool for data warehouse automation. It is part of the ecosystem for data warehouse automation, alongside the Virtual Data Warehouse pattern manager and the generic schema for Data Warehouse Automation.

Stars: ✭ 27 (+42.11%)

Mutual labels: etl

DataXServer

Provides remote multi-language invocation (ThriftServer, HttpServer) and distributed execution (DataX on YARN) for DataX (https://github.com/alibaba/DataX).

Stars: ✭ 130 (+584.21%)

Mutual labels: etl

1-60 of 225 similar projects
