Open Semantic Search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Stars: ✭ 386 (+133.94%)
Datashare
Better analyze information, in all its forms
Stars: ✭ 254 (+53.94%)
Open Semantic Search Apps
Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations and named entities) and data import (ETL like text extraction, OCR and crawling filesystems or websites)
Stars: ✭ 55 (-66.67%)
Transformalize
Configurable Extract, Transform, and Load
Stars: ✭ 125 (-24.24%)
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+1438.18%)
Abc
Power of appbase.io via CLI, with nifty imports from your favorite data sources
Stars: ✭ 375 (+127.27%)
Polar Bookshelf
Polar is a personal knowledge repository for PDF and web content supporting incremental reading and document annotation.
Stars: ✭ 4,411 (+2573.33%)
Query Translator
Query Translator is a search query translator with AST representation
Stars: ✭ 165 (+0%)
Cogstack Pipeline
Distributed, fault tolerant batch processing for Natural Language Applications and Search, using remote partitioning
Stars: ✭ 26 (-84.24%)
Paperless
Scan, index, and archive all of your paper documents
Stars: ✭ 7,662 (+4543.64%)
Bentools Etl
PHP ETL (Extract / Transform / Load) library with SOLID principles + almost no dependency.
Stars: ✭ 45 (-72.73%)
Code4java
Repository for my Java projects.
Stars: ✭ 164 (-0.61%)
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Stars: ✭ 162 (-1.82%)
Camelot
Camelot: PDF Table Extraction for Humans
Stars: ✭ 3,150 (+1809.09%)
Ocrmypdf
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Stars: ✭ 5,549 (+3263.03%)
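Pdf
Programming e-books, e-books, and programming books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories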
Stars: ✭ 12,009 (+7178.18%)
Excalibur
A web interface to extract tabular data from PDFs
Stars: ✭ 916 (+455.15%)
Pdfanno
Linguistic Annotation and Visualization Tool for PDF Documents
Stars: ✭ 156 (-5.45%)
Php Docker Boilerplate
🍲 PHP Docker Boilerplate for Symfony, Wordpress, Joomla or any other PHP Project (NGINX, Apache HTTPd, PHP-FPM, MySQL, Solr, Elasticsearch, Redis, FTP)
Stars: ✭ 503 (+204.85%)
Pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Stars: ✭ 1,969 (+1093.33%)
Papermerge
Open Source Document Management System for Digital Archives (Scanned Documents)
Stars: ✭ 1,177 (+613.33%)
Springboot Templates
Integration of springboot with dubbo and netty, NoSQL templates for redis and mongodb, MQ templates for kafka, rocketmq and rabbit, and the solr, solrcloud and elasticsearch query engines
Stars: ✭ 100 (-39.39%)
Etl
LinkedPipes ETL is an RDF based, lightweight ETL tool
Stars: ✭ 88 (-46.67%)
Pdflayouttextstripper
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful for extracting the content of a table in a PDF file, for instance. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
Stars: ✭ 1,369 (+729.7%)
Recogito2
Semantic Annotation Without the Pointy Brackets
Stars: ✭ 110 (-33.33%)
Pdfocr
Adds text to PDF files using the cuneiform OCR software
Stars: ✭ 287 (+73.94%)
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (+65.45%)
Docspell
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
Stars: ✭ 303 (+83.64%)
redcoat
A lightweight web-based annotation tool for labelling entity recognition data.
Stars: ✭ 19 (-88.48%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+146.06%)
Janusgraph
JanusGraph: an open-source, distributed graph database
Stars: ✭ 4,277 (+2492.12%)
Monstache
A Go daemon that syncs MongoDB to Elasticsearch in real time
Stars: ✭ 736 (+346.06%)
Itext7 Dotnet
iText 7 for .NET is the .NET version of the iText 7 library, formerly known as iTextSharp, which it replaces. iText 7 represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 698 (+323.03%)
Itext7
iText 7 for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText 7 can be a boon to nearly every workflow.
Stars: ✭ 913 (+453.33%)
Org Noter
Emacs document annotator, using Org-mode
Stars: ✭ 671 (+306.67%)
Mybox
Easy tools of document, image, file, network, location, color, and media.
Stars: ✭ 45 (-72.73%)
Nagios Plugins
450+ AWS, Hadoop, Cloud, Kafka, Docker, Elasticsearch, RabbitMQ, Redis, HBase, Solr, Cassandra, ZooKeeper, HDFS, Yarn, Hive, Presto, Drill, Impala, Consul, Spark, Jenkins, Travis CI, Git, MySQL, Linux, DNS, Whois, SSL Certs, Yum Security Updates, Kubernetes, Cloudera etc...
Stars: ✭ 1,000 (+506.06%)
Datafari
Open Source, Distributed, Big Data Enterprise Search Engine
Stars: ✭ 47 (-71.52%)
solr-ontology-tagger
Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri
Stars: ✭ 36 (-78.18%)
Transporter
Sync data between persistence engines, like ETL only not stodgy
Stars: ✭ 1,175 (+612.12%)
Vectorsinsearch
Dice.com repo to accompany the dice.com 'Vectors in Search' talk by Simon Hughes, from the Activate 2018 search conference, and the 'Searching with Vectors' talk from Haystack 2019 (US). Builds upon my conceptual search and semantic search work from 2015
Stars: ✭ 71 (-56.97%)
Sentinl
Kibana Alert & Report App for Elasticsearch
Stars: ✭ 1,233 (+647.27%)
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-41.21%)
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-43.03%)
Lambda Text Extractor
AWS Lambda functions to extract text from various binary formats.
Stars: ✭ 159 (-3.64%)
Srchx
A standalone lightweight full-text search engine built on top of blevesearch and Go with multiple storage backends (scorch, boltdb, leveldb, badger)
Stars: ✭ 118 (-28.48%)
Solrdf
An RDF plugin for Solr
Stars: ✭ 113 (-31.52%)
Etherpad Lite
Etherpad: A modern really-real-time collaborative document editor.
Stars: ✭ 11,937 (+7134.55%)
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (-21.82%)
Ik Analyzer
Supports Lucene 5/6/7/8+ versions, maintained long-term.
Stars: ✭ 112 (-32.12%)
Ambar
🔍 Ambar: Document Search Engine
Stars: ✭ 1,829 (+1008.48%)
CVparser
CVparser is software for parsing or extracting data out of CVs/resumes.
Stars: ✭ 28 (-83.03%)
ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Stars: ✭ 40 (-75.76%)
Elasticsearch Synonyms
Curated synonym files and Helpers for Elasticsearch Synonym Token Filter
Stars: ✭ 51 (-69.09%)
Pdfsam
PDFsam, a desktop application to extract pages, split, merge, mix and rotate PDF files
Stars: ✭ 1,829 (+1008.48%)