2,326 open-source projects that are alternatives to or similar to Open Semantic Etl

Open-source research tool for searching, browsing, analyzing and exploring large document collections, built on a semantic search engine and an open-source text mining and text analytics platform (integrates ETL for document processing, OCR for images and PDFs, named entity recognition for persons, organizations and locations, metadata management via thesauri and ontologies, and a search user interface and search apps for full-text search, faceted search and knowledge graphs)

Stars: ✭ 386 (+133.94%)

Mutual labels: named-entity-recognition, annotation, ocr

Datashare

Better analyze information, in all its forms

Stars: ✭ 254 (+53.94%)

Mutual labels: extract, elasticsearch, named-entity-recognition

open-semantic-desktop-search

Virtual Machine for Desktop Search with Open Semantic Search

Stars: ✭ 22 (-86.67%)

Mutual labels: annotation, etl, named-entity-recognition

Open Semantic Search Apps

Python/Django-based web apps and web user interfaces for search, structure (metadata management such as thesauri, ontologies, annotations and named entities) and data import (ETL such as text extraction, OCR, and crawling file systems or websites)

Stars: ✭ 55 (-66.67%)

Mutual labels: solr, named-entity-recognition, ocr

Transformalize

Configurable Extract, Transform, and Load

Stars: ✭ 125 (-24.24%)

Mutual labels: etl, solr, elasticsearch

Open Paperless

Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)

Stars: ✭ 2,538 (+1438.18%)

Mutual labels: documents, pdf, ocr

Abc

The power of appbase.io via a CLI, with nifty imports from your favorite data sources

Stars: ✭ 375 (+127.27%)

Mutual labels: etl, elasticsearch

Polar Bookshelf

Polar is a personal knowledge repository for PDF and web content supporting incremental reading and document annotation.

Stars: ✭ 4,411 (+2573.33%)

Mutual labels: annotation, pdf

Query Translator

Query Translator is a search query translator with AST representation

Stars: ✭ 165 (+0%)

Mutual labels: solr, elasticsearch

Cogstack Pipeline

Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning

Stars: ✭ 26 (-84.24%)

Mutual labels: elasticsearch, ocr

Paperless

Scan, index, and archive all of your paper documents

Stars: ✭ 7,662 (+4543.64%)

Mutual labels: documents, ocr

Bentools Etl

PHP ETL (Extract / Transform / Load) library built on SOLID principles, with almost no dependencies.

Stars: ✭ 45 (-72.73%)

Mutual labels: etl, extract

Code4java

Repository for my java projects.

Stars: ✭ 164 (-0.61%)

Mutual labels: solr, elasticsearch

Solrtexttagger

A text tagger based on Lucene / Solr, using FST technology

Stars: ✭ 162 (-1.82%)

Mutual labels: solr, named-entity-recognition

Camelot

Camelot: PDF Table Extraction for Humans

Stars: ✭ 3,150 (+1809.09%)

Mutual labels: extract, pdf

Ocrmypdf

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched

Stars: ✭ 5,549 (+3263.03%)

Mutual labels: pdf, ocr

Pdf

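Programming e-books: e-books and programming books covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, JavaScript, JVM, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCP/IP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories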

Stars: ✭ 12,009 (+7178.18%)

Mutual labels: solr, elasticsearch

Springbootexamples

Spring Boot learning tutorials

Stars: ✭ 794 (+381.21%)

Mutual labels: solr, elasticsearch

Excalibur

A web interface to extract tabular data from PDFs

Stars: ✭ 916 (+455.15%)

Mutual labels: extract, pdf

Pdfanno

Linguistic Annotation and Visualization Tool for PDF Documents

Stars: ✭ 156 (-5.45%)

Mutual labels: annotation, pdf

Php Docker Boilerplate

🍲 PHP Docker Boilerplate for Symfony, Wordpress, Joomla or any other PHP Project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)

Stars: ✭ 503 (+204.85%)

Mutual labels: solr, elasticsearch

Pdftabextract

A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

Stars: ✭ 1,969 (+1093.33%)

Mutual labels: pdf, ocr

Papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)

Stars: ✭ 1,177 (+613.33%)

Mutual labels: pdf, ocr

Kotlin Reference Chinese

Chinese translation of the official Kotlin documentation (reference section)

Stars: ✭ 85 (-48.48%)

Mutual labels: documents, pdf

Springboot Templates

Spring Boot integration with Dubbo and Netty; NoSQL templates for Redis and MongoDB; MQ templates for Kafka, RocketMQ and RabbitMQ; and the Solr, SolrCloud and Elasticsearch query engines

Stars: ✭ 100 (-39.39%)

Mutual labels: solr, elasticsearch

Etl

LinkedPipes ETL is an RDF based, lightweight ETL tool

Stars: ✭ 88 (-46.67%)

Mutual labels: rdf, etl

Pdflayouttextstripper

Converts a PDF file into a text file while preserving the layout of the original PDF. This is useful, for instance, for extracting the contents of a table in a PDF file. It is a subclass of the PDFTextStripper class (from the Apache PDFBox library).

Stars: ✭ 1,369 (+729.7%)

Mutual labels: extract, pdf

Recogito2

Semantic Annotation Without the Pointy Brackets

Stars: ✭ 110 (-33.33%)

Mutual labels: elasticsearch, annotation

Pdfocr

Adds text to PDF files using the Cuneiform OCR software

Stars: ✭ 287 (+73.94%)

Mutual labels: pdf, ocr

Chatbot ner

chatbot_ner: Named Entity Recognition for chatbots.

Stars: ✭ 273 (+65.45%)

Mutual labels: elasticsearch, named-entity-recognition

Docspell

Assists in organizing your piles of documents from scanners, e-mails and other sources with minimal effort.

Stars: ✭ 303 (+83.64%)

Mutual labels: pdf, ocr

redcoat

A lightweight web-based annotation tool for labelling entity recognition data.

Stars: ✭ 19 (-88.48%)

Mutual labels: annotation, named-entity-recognition

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (+146.06%)

Mutual labels: solr, elasticsearch

Janusgraph

JanusGraph: an open-source, distributed graph database

Stars: ✭ 4,277 (+2492.12%)

Mutual labels: solr, elasticsearch

Monstache

A Go daemon that syncs MongoDB to Elasticsearch in real time

Stars: ✭ 736 (+346.06%)

Mutual labels: etl, elasticsearch

Itext7 Dotnet

iText 7 for .NET is the .NET version of the iText 7 library and replaces its predecessor, iTextSharp. iText 7 represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high- and low-level programming capabilities, and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.

Stars: ✭ 698 (+323.03%)

Mutual labels: documents, pdf

Itext7

iText 7 for Java represents the next level of SDKs for developers who want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high- and low-level programming capabilities, and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.

Stars: ✭ 913 (+453.33%)

Mutual labels: documents, pdf

Org Noter

Emacs document annotator, using Org-mode

Stars: ✭ 671 (+306.67%)

Mutual labels: documents, pdf

Mybox

Easy-to-use tools for documents, images, files, networks, locations, colors, and media.

Stars: ✭ 45 (-72.73%)

Mutual labels: pdf, ocr

Nagios Plugins

450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...

Stars: ✭ 1,000 (+506.06%)

Mutual labels: solr, elasticsearch

Datafari

Open Source, Distributed, Big Data Enterprise Search Engine

Stars: ✭ 47 (-71.52%)

Mutual labels: solr, elasticsearch

solr-ontology-tagger

Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri

Stars: ✭ 36 (-78.18%)

Mutual labels: solr, rdf

Transporter

Sync data between persistence engines, like ETL, only not stodgy

Stars: ✭ 1,175 (+612.12%)

Mutual labels: etl, elasticsearch

Vectorsinsearch

Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015

Stars: ✭ 71 (-56.97%)

Mutual labels: solr, elasticsearch

Sentinl

Kibana Alert & Report App for Elasticsearch

Stars: ✭ 1,233 (+647.27%)

Mutual labels: elasticsearch, pdf

Scanbot Sdk Example Android

Document scanning SDK example apps for the Scanbot SDK for Android.

Stars: ✭ 67 (-59.39%)

Mutual labels: pdf, ocr

Logisland

Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform performs complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.

Stars: ✭ 97 (-41.21%)

Mutual labels: solr, elasticsearch

Remarks

Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG

Stars: ✭ 94 (-43.03%)

Mutual labels: pdf, ocr

Spring Boot 2.x Examples

Spring Boot 2.x code examples

Stars: ✭ 104 (-36.97%)

Mutual labels: solr, elasticsearch

Lambda Text Extractor

AWS Lambda functions to extract text from various binary formats.