
opensemanticsearch / Open Semantic Etl

Licence: gpl-3.0
Python-based open source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (entity extraction and named entity recognition), and data enrichment (annotation) pipelines, plus an ingestor into a Solr or Elasticsearch index and a linked data graph database.

Programming Languages

Python

Projects that are alternatives to or similar to Open Semantic Etl

Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-24.24%)
Mutual labels:  etl, solr, elasticsearch
open-semantic-desktop-search
Virtual Machine for Desktop Search with Open Semantic Search
Stars: ✭ 22 (-86.67%)
Mutual labels:  annotation, etl, named-entity-recognition
Datashare
Better analyze information, in all its forms
Stars: ✭ 254 (+53.94%)
Mutual labels:  extract, elasticsearch, named-entity-recognition
Open Semantic Search
Open source research tool to search, browse, analyze, and explore large document collections with a semantic search engine and an open source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations, and locations, metadata management with thesauri and ontologies, and a search user interface and search apps for full-text search, faceted search, and knowledge graphs)
Stars: ✭ 386 (+133.94%)
Mutual labels:  named-entity-recognition, annotation, ocr
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+1438.18%)
Mutual labels:  documents, pdf, ocr
Open Semantic Search Apps
Python/Django-based web apps and web user interfaces for search, structure (metadata management such as thesauri, ontologies, annotations, and named entities), and data import (ETL such as text extraction, OCR, and crawling file systems or websites)
Stars: ✭ 55 (-66.67%)
Mutual labels:  solr, named-entity-recognition, ocr
Pdfsam
PDFsam, a desktop application to extract pages, split, merge, mix and rotate PDF files
Stars: ✭ 1,829 (+1008.48%)
Mutual labels:  extract, pdf
Ik Analyzer
Supports Lucene 5/6/7/8+; maintained long-term.
Stars: ✭ 112 (-32.12%)
Mutual labels:  solr, elasticsearch
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+0%)
Mutual labels:  solr, elasticsearch
Etherpad Lite
Etherpad: A modern really-real-time collaborative document editor.
Stars: ✭ 11,937 (+7134.55%)
Mutual labels:  documents, pdf
Pdflayouttextstripper
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, to extract the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
Stars: ✭ 1,369 (+729.7%)
Mutual labels:  extract, pdf
Srchx
A standalone lightweight full-text search engine built on top of blevesearch and Go with multiple storage (scorch, boltdb, leveldb, badger)
Stars: ✭ 118 (-28.48%)
Mutual labels:  solr, elasticsearch
Code4java
Repository for my Java projects.
Stars: ✭ 164 (-0.61%)
Mutual labels:  solr, elasticsearch
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-1.82%)
Mutual labels:  solr, named-entity-recognition
Recogito2
Semantic Annotation Without the Pointy Brackets
Stars: ✭ 110 (-33.33%)
Mutual labels:  elasticsearch, annotation
Solrdf
An RDF plugin for Solr
Stars: ✭ 113 (-31.52%)
Mutual labels:  rdf, solr
Spring Boot 2.x Examples
Spring Boot 2.x code examples
Stars: ✭ 104 (-36.97%)
Mutual labels:  solr, elasticsearch
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+1008.48%)
Mutual labels:  pdf, ocr
Lexpredict Contraxsuite
LexPredict ContraxSuite
Stars: ✭ 140 (-15.15%)
Mutual labels:  documents, ocr
Svglib
Read SVG files and convert them to other formats.
Stars: ✭ 139 (-15.76%)
Mutual labels:  documents, pdf

This project does not contain a readme.
