2326 open source projects that are alternatives to or similar to Open Semantic Etl

Open Semantic Search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Stars: ✭ 386 (+133.94%)
Datashare
Better analyze information, in all its forms
Stars: ✭ 254 (+53.94%)
open-semantic-desktop-search
Virtual Machine for Desktop Search with Open Semantic Search
Stars: ✭ 22 (-86.67%)
Open Semantic Search Apps
Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations and named entities) and data import (ETL like text extraction, OCR and crawling filesystems or websites)
Stars: ✭ 55 (-66.67%)
Mutual labels:  solr, named-entity-recognition, ocr
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-24.24%)
Mutual labels:  etl, solr, elasticsearch
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+1438.18%)
Mutual labels:  documents, pdf, ocr
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (+127.27%)
Mutual labels:  etl, elasticsearch
Polar Bookshelf
Polar is a personal knowledge repository for PDF and web content supporting incremental reading and document annotation.
Stars: ✭ 4,411 (+2573.33%)
Mutual labels:  annotation, pdf
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+0%)
Mutual labels:  solr, elasticsearch
Cogstack Pipeline
Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning
Stars: ✭ 26 (-84.24%)
Mutual labels:  elasticsearch, ocr
Paperless
Scan, index, and archive all of your paper documents
Stars: ✭ 7,662 (+4543.64%)
Mutual labels:  documents, ocr
Bentools Etl
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Stars: ✭ 45 (-72.73%)
Mutual labels:  etl, extract
Code4java
Repository for my java projects.
Stars: ✭ 164 (-0.61%)
Mutual labels:  solr, elasticsearch
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-1.82%)
Mutual labels:  solr, named-entity-recognition
Camelot
Camelot: PDF Table Extraction for Humans
Stars: ✭ 3,150 (+1809.09%)
Mutual labels:  extract, pdf
Ocrmypdf
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Stars: ✭ 5,549 (+3263.03%)
Mutual labels:  pdf, ocr
Pdf
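Programming e-books, e-books, and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories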
Stars: ✭ 12,009 (+7178.18%)
Mutual labels:  solr, elasticsearch
Springbootexamples
Spring Boot learning tutorials
Stars: ✭ 794 (+381.21%)
Mutual labels:  solr, elasticsearch
Excalibur
A web interface to extract tabular data from PDFs
Stars: ✭ 916 (+455.15%)
Mutual labels:  extract, pdf
Pdfanno
Linguistic Annotation and Visualization Tool for PDF Documents
Stars: ✭ 156 (-5.45%)
Mutual labels:  annotation, pdf
Php Docker Boilerplate
🍲 PHP Docker Boilerplate for Symfony, Wordpress, Joomla or any other PHP Project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)
Stars: ✭ 503 (+204.85%)
Mutual labels:  solr, elasticsearch
Pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Stars: ✭ 1,969 (+1093.33%)
Mutual labels:  pdf, ocr
Papermerge
Open Source Document Management System for Digital Archives (Scanned Documents)
Stars: ✭ 1,177 (+613.33%)
Mutual labels:  pdf, ocr
Kotlin Reference Chinese
Chinese edition of the official Kotlin documentation (reference section)
Stars: ✭ 85 (-48.48%)
Mutual labels:  documents, pdf
Springboot Templates
Integration of springboot with dubbo and netty; NoSQL templates for redis and mongodb; MQ templates for kafka, rocketmq, and rabbit; solr, solrcloud, and elasticsearch query engines
Stars: ✭ 100 (-39.39%)
Mutual labels:  solr, elasticsearch
Etl
LinkedPipes ETL is an RDF based, lightweight ETL tool
Stars: ✭ 88 (-46.67%)
Mutual labels:  rdf, etl
Pdflayouttextstripper
Converts a pdf file into a text file while keeping the layout of the original pdf. Useful to extract the content from a table in a pdf file for instance. This is a subclass of PDFTextStripper class (from the Apache PDFBox library).
Stars: ✭ 1,369 (+729.7%)
Mutual labels:  extract, pdf
Recogito2
Semantic Annotation Without the Pointy Brackets
Stars: ✭ 110 (-33.33%)
Mutual labels:  elasticsearch, annotation
Pdfocr
Adds text to PDF files using the cuneiform OCR software
Stars: ✭ 287 (+73.94%)
Mutual labels:  pdf, ocr
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (+65.45%)
Docspell
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
Stars: ✭ 303 (+83.64%)
Mutual labels:  pdf, ocr
redcoat
A lightweight web-based annotation tool for labelling entity recognition data.
Stars: ✭ 19 (-88.48%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+146.06%)
Mutual labels:  solr, elasticsearch
Janusgraph
JanusGraph: an open-source, distributed graph database
Stars: ✭ 4,277 (+2492.12%)
Mutual labels:  solr, elasticsearch
Monstache
a go daemon that syncs MongoDB to Elasticsearch in realtime
Stars: ✭ 736 (+346.06%)
Mutual labels:  etl, elasticsearch
Itext7 Dotnet
iText 7 for .NET is the .NET version of the iText 7 library, formerly known as iTextSharp, which it replaces. iText 7 represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 698 (+323.03%)
Mutual labels:  documents, pdf
Itext7
iText 7 for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 913 (+453.33%)
Mutual labels:  documents, pdf
Org Noter
Emacs document annotator, using Org-mode
Stars: ✭ 671 (+306.67%)
Mutual labels:  documents, pdf
Mybox
Easy tools of document, image, file, network, location, color, and media.
Stars: ✭ 45 (-72.73%)
Mutual labels:  pdf, ocr
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+506.06%)
Mutual labels:  solr, elasticsearch
Datafari
Open Source, Distributed, Big Data Enterprise Search Engine
Stars: ✭ 47 (-71.52%)
Mutual labels:  solr, elasticsearch
solr-ontology-tagger
Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri
Stars: ✭ 36 (-78.18%)
Mutual labels:  solr, rdf
Transporter
Sync data between persistence engines, like ETL only not stodgy
Stars: ✭ 1,175 (+612.12%)
Mutual labels:  etl, elasticsearch
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (-56.97%)
Mutual labels:  solr, elasticsearch
Sentinl
Kibana Alert & Report App for Elasticsearch
Stars: ✭ 1,233 (+647.27%)
Mutual labels:  elasticsearch, pdf
Scanbot Sdk Example Android
Document scanning SDK example apps for the Scanbot SDK for Android.
Stars: ✭ 67 (-59.39%)
Mutual labels:  pdf, ocr
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-41.21%)
Mutual labels:  solr, elasticsearch
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-43.03%)
Mutual labels:  pdf, ocr
Spring Boot 2.x Examples
Spring Boot 2.x code examples
Stars: ✭ 104 (-36.97%)
Mutual labels:  solr, elasticsearch
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (-3.64%)
Mutual labels:  pdf, ocr
Srchx
A standalone lightweight full-text search engine built on top of blevesearch and Go with multiple storage (scorch, boltdb, leveldb, badger)
Stars: ✭ 118 (-28.48%)
Mutual labels:  solr, elasticsearch
Solrdf
An RDF plugin for Solr
Stars: ✭ 113 (-31.52%)
Mutual labels:  rdf, solr
Etherpad Lite
Etherpad: A modern really-real-time collaborative document editor.
Stars: ✭ 11,937 (+7134.55%)
Mutual labels:  documents, pdf
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (-21.82%)
Mutual labels:  etl, extract
Ik Analyzer
Supports Lucene 5/6/7/8+ versions, with long-term maintenance.
Stars: ✭ 112 (-32.12%)
Mutual labels:  solr, elasticsearch
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+1008.48%)
Mutual labels:  pdf, ocr
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (-83.03%)
Mutual labels:  etl, extract
ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Stars: ✭ 40 (-75.76%)
Mutual labels:  ocr, documents
Elasticsearch Synonyms
Curated synonym files and Helpers for Elasticsearch Synonym Token Filter
Stars: ✭ 51 (-69.09%)
Mutual labels:  solr, elasticsearch
Pdfsam
PDFsam, a desktop application to extract pages, split, merge, mix and rotate PDF files
Stars: ✭ 1,829 (+1008.48%)
Mutual labels:  extract, pdf