ICIJ / Extract
License: MIT
A cross-platform command line tool for parallelised content extraction and analysis.
Stars: ✭ 188
Programming Languages
Java
68154 projects - #9 most used programming language
Projects that are alternatives to or similar to Extract
Open Semantic Etl
Python-based open-source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, with ingestion into a Solr or Elasticsearch index and a linked-data graph database
Stars: ✭ 165 (-12.23%)
Mutual labels: etl, solr
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-33.51%)
Mutual labels: etl, solr
Aws Serverless Data Lake Framework
Enterprise-grade, production-hardened, serverless data lake on AWS
Stars: ✭ 179 (-4.79%)
Mutual labels: etl
Airbyte
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Stars: ✭ 4,919 (+2516.49%)
Mutual labels: etl
Usaspending Api
Server application to serve U.S. federal spending data via a RESTful API
Stars: ✭ 166 (-11.7%)
Mutual labels: etl
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-13.83%)
Mutual labels: solr
Xbin Store
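A distributed B2C online store modeled on well-known Chinese B2C websites, built with Spring Boot auto-configuration for Dubbox / MVC / MyBatis / Druid / Solr / Redis and more. A Spring Cloud version is also available.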
Stars: ✭ 2,140 (+1038.3%)
Mutual labels: solr
Mara Example Project 2
An example mini data warehouse for python project stats, template for new projects
Stars: ✭ 154 (-18.09%)
Mutual labels: etl
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (-12.23%)
Mutual labels: solr
Pilosa
Pilosa is an open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
Stars: ✭ 2,224 (+1082.98%)
Mutual labels: index
Metl
Metl is a simple, web-based integration platform that supports several styles of data integration, including messaging, file-based Extract/Transform/Load (ETL), and remote procedure invocation via Web Services. Read more at www.jumpmind.com/products/metl/overview
Stars: ✭ 185 (-1.6%)
Mutual labels: etl
Extract
A cross-platform command line tool for parallelized, distributed content-extraction. Built on top of Apache Tika and an essential part of the engineering behind the Panama Papers, Swiss Leaks and Luxembourg Leaks investigations.
It supports Redis-backed queueing for distributed, parallel extraction and will write to Solr, plain text files or standard output.
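The queue-and-workers pattern described above can be illustrated with a minimal sketch. This is not Extract's actual API: the class name ParallelExtractSketch and the method extractAll are hypothetical, an in-memory BlockingQueue stands in for the Redis-backed queue, and a plain function stands in for the Tika-based extractor. The sketch only shows how several workers can drain a shared queue of paths in parallel and collect the extracted text.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; not part of the Extract code base.
final class ParallelExtractSketch {

    // Drains the shared queue with `workers` threads and returns path -> extracted text.
    // `queue` stands in for a Redis-backed queue; `extractor` stands in for a real parser.
    static Map<String, String> extractAll(BlockingQueue<String> queue,
                                          Function<String, String> extractor,
                                          int workers) throws InterruptedException {
        Map<String, String> output = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                String path;
                // poll() returns null once the queue is empty, which ends this worker.
                while ((path = queue.poll()) != null) {
                    output.put(path, extractor.apply(path));
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            pool.shutdownNow();
            throw new IllegalStateException("extraction did not finish in time");
        }
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue =
                new LinkedBlockingQueue<>(List.of("a.pdf", "b.docx", "c.txt"));
        // Placeholder extractor: a real one would parse the file contents.
        Map<String, String> result = extractAll(queue, path -> "text of " + path, 2);
        // Standard output is used here as the simplest of the output targets.
        result.forEach((path, text) -> System.out.println(path + "\t" + text));
    }
}
```

Because each worker takes the next path from the shared queue, no file is processed twice, and adding workers (or, with a networked queue, adding machines) increases throughput without changing the extraction logic.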
For guidance and instructions, please see the wiki.
Credits and Collaboration
Initially developed by Matthew Caruana Galizia at ICIJ.
We welcome contributions! Please submit pull requests or contact us directly.
License
Copyright (c) 2018 International Consortium of Investigative Journalists. See LICENSE.