
ICIJ / Extract

Licence: MIT
A cross-platform command line tool for parallelised content extraction and analysis.

Programming Languages

Java

Projects that are alternatives to, or similar to, Extract

Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (-12.23%)
Mutual labels:  etl, solr
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-33.51%)
Mutual labels:  etl, solr
Etl unicorn
Data visualization, data mining, data processing ETL
Stars: ✭ 156 (-17.02%)
Mutual labels:  etl
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (-4.79%)
Mutual labels:  etl
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+2516.49%)
Mutual labels:  etl
Xianglong
Asset allocation schemes
Stars: ✭ 158 (-15.96%)
Mutual labels:  index
Linq2db
Linq to database provider.
Stars: ✭ 2,211 (+1076.06%)
Mutual labels:  etl
Metl
mito ETL tool
Stars: ✭ 153 (-18.62%)
Mutual labels:  etl
Mongo Es
A MongoDB to Elasticsearch connector
Stars: ✭ 185 (-1.6%)
Mutual labels:  etl
Grafter
Linked Data & RDF Manufacturing Tools in Clojure
Stars: ✭ 174 (-7.45%)
Mutual labels:  etl
Usaspending Api
Server application to serve U.S. federal spending data via a RESTful API
Stars: ✭ 166 (-11.7%)
Mutual labels:  etl
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-13.83%)
Mutual labels:  solr
Bender
Bender - Serverless ETL Framework
Stars: ✭ 171 (-9.04%)
Mutual labels:  etl
Tis Solr
an enterprise search engine based on Apache Solr
Stars: ✭ 158 (-15.96%)
Mutual labels:  solr
Xbin Store
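A distributed B2C online store modeled on well-known Chinese B2C websites, built with Spring Boot auto-configuration and Dubbox / MVC / MyBatis / Druid / Solr / Redis, etc. For the Spring Cloud version, please see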
Stars: ✭ 2,140 (+1038.3%)
Mutual labels:  solr
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (-18.09%)
Mutual labels:  etl
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (-12.23%)
Mutual labels:  solr
Unnpk
Unpacks NPK files from NetEase games built on the NeoX engine, such as Onmyoji and A Certain Magical Index.
Stars: ✭ 171 (-9.04%)
Mutual labels:  index
Pilosa
Pilosa is an open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
Stars: ✭ 2,224 (+1082.98%)
Mutual labels:  index
Metl
Metl is a simple, web-based integration platform that allows for several different styles of data integration including messaging, file based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Stars: ✭ 185 (-1.6%)
Mutual labels:  etl

Extract


A cross-platform command line tool for parallelized, distributed content extraction. It is built on top of Apache Tika and was an essential part of the engineering behind the Panama Papers, Swiss Leaks and Luxembourg Leaks investigations.

It supports Redis-backed queueing for distributed, parallel extraction, and can write its output to Solr, plain text files or standard output.
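The general shape of that design, a shared work queue feeding parallel workers that pass extracted text to a pluggable output, can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Extract's code or API: an in-memory `LinkedBlockingQueue` stands in for the Redis-backed queue, `extractText` is a placeholder for the Apache Tika call, and the names `OutputSink`, `StdoutSink`, `CollectingSink` and `run` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class ParallelExtractSketch {

    // Hypothetical pluggable output: models the idea of writing to Solr,
    // plain text files or standard output behind one interface.
    interface OutputSink {
        void write(String path, String text);
    }

    // Sink that prints each result to standard output.
    static final class StdoutSink implements OutputSink {
        public synchronized void write(String path, String text) {
            System.out.println(path + ": " + text);
        }
    }

    // Sink that collects results in a thread-safe list (handy for testing).
    static final class CollectingSink implements OutputSink {
        final List<String> results = Collections.synchronizedList(new ArrayList<>());

        public void write(String path, String text) {
            results.add(path + ": " + text);
        }
    }

    // Placeholder for real extraction; Extract itself builds on Apache Tika.
    static String extractText(String path) {
        return "text of " + path;
    }

    // Sentinel ("poison pill") telling a worker that the queue is drained.
    private static final String DONE = "\u0000DONE";

    static void run(List<String> paths, int workers, OutputSink sink) {
        // In-memory stand-in (assumption) for a Redis-backed queue.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(paths);
        // One pill per worker, queued after all paths (FIFO order).
        for (int i = 0; i < workers; i++) {
            queue.add(DONE);
        }
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String path = queue.take();
                        if (path.equals(DONE)) {
                            return; // this worker is finished
                        }
                        sink.write(path, extractText(path));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        run(List.of("a.pdf", "b.docx", "c.eml"), 2, new StdoutSink());
    }
}
```

With a real Redis queue, workers on several machines could pull from the same list; the in-memory queue here only shares work between threads in one JVM. Queuing one poison pill per worker after all the paths lets every worker stop cleanly once the real work is drained.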

For guidance and instructions, please see the wiki.

Credits and Collaboration

Initially developed by Matthew Caruana Galizia at ICIJ.

We welcome contributions! Please submit pull requests or contact us directly.

License

Copyright (c) 2018 International Consortium of Investigative Journalists. See LICENSE.
