RD17 / Ambar

Licence: MIT
🔍 Ambar: Document Search Engine

Programming Languages

  • JavaScript
  • Python
  • SCSS
  • Dockerfile

Projects that are alternatives of or similar to Ambar

Opensearchserver
Open-source Enterprise Grade Search Engine Software
Stars: ✭ 408 (-77.69%)
Mutual labels:  search, search-engine, ocr
Open Semantic Search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Stars: ✭ 386 (-78.9%)
Mutual labels:  search, search-engine, ocr
Hypertag
Knowledge Management for Humans using Machine Learning & Tags
Stars: ✭ 116 (-93.66%)
Mutual labels:  search, search-engine, pdf
Flexsearch
Next-Generation full text search library for Browser and Node.js
Stars: ✭ 8,108 (+343.3%)
Mutual labels:  search, search-engine, search-in-text
Simpleaudioindexer
Searching for the occurrence seconds of words/phrases or arbitrary regex patterns within audio files
Stars: ✭ 100 (-94.53%)
Mutual labels:  search, search-engine
Remarks
Extract highlights, scribbles, and annotations from PDFs marked with the reMarkable tablet. Export to Markdown, PDF, PNG, and SVG
Stars: ✭ 94 (-94.86%)
Mutual labels:  pdf, ocr
Sonic
🦔 Fast, lightweight & schema-less search backend. An alternative to Elasticsearch that runs on a few MBs of RAM.
Stars: ✭ 12,347 (+575.07%)
Mutual labels:  search, search-engine
Search Online
🔍 A simple extension for VSCode to search online easily using a search engine.
Stars: ✭ 115 (-93.71%)
Mutual labels:  search, search-engine
Papermerge
Open Source Document Management System for Digital Archives (Scanned Documents)
Stars: ✭ 1,177 (-35.65%)
Mutual labels:  pdf, ocr
Invoice As A Service
💰 Simple invoicing service (REST API): from JSON to PDF
Stars: ✭ 106 (-94.2%)
Mutual labels:  self-hosted, pdf
Xinahn Client
An open-source, high-privacy, self-hosted metasearch engine for personal use. https://xinahn.com
Stars: ✭ 116 (-93.66%)
Mutual labels:  self-hosted, search-engine
Algoliasearch Client Android
Algolia Search API Client for Android
Stars: ✭ 92 (-94.97%)
Mutual labels:  search, search-engine
Lieu
community search engine
Stars: ✭ 76 (-95.84%)
Mutual labels:  search, search-engine
Search
An Open Source Search Engine
Stars: ✭ 139 (-92.4%)
Mutual labels:  search, search-engine
Searx
Privacy-respecting metasearch engine
Stars: ✭ 10,074 (+450.79%)
Mutual labels:  search, search-engine
Torrentinim
A very low memory-footprint, self hosted API-only torrent search engine. Sonarr + Radarr Compatible, native support for Linux, Mac and Windows.
Stars: ✭ 123 (-93.28%)
Mutual labels:  self-hosted, search-engine
Whoogle Search
A self-hosted, ad-free, privacy-respecting metasearch engine
Stars: ✭ 4,645 (+153.96%)
Mutual labels:  search, search-engine
Downloadsearch
search for any kinds of files to download
Stars: ✭ 124 (-93.22%)
Mutual labels:  search, search-engine
Hrcloud2
A full-featured home hosted Cloud Drive, Personal Assistant, App Launcher, File Converter, Streamer, Share Tool & More!
Stars: ✭ 134 (-92.67%)
Mutual labels:  self-hosted, ocr
Automator
Various Automator and AppleScript workflow and scripts for simplifying life
Stars: ✭ 68 (-96.28%)
Mutual labels:  search, pdf

🔍 Ambar: Document Search Engine

⚠️ PROJECT ARCHIVED ⚠️

Ambar is an open-source document search engine with automated crawling, OCR, tagging and instant full-text search.

Ambar defines a new way to implement full-text document search into your workflow.

  • Easily deploy Ambar with a single docker-compose file
  • Perform Google-like search through your documents and the contents of your images
  • Tag your documents
  • Use a simple REST API to integrate Ambar into your workflow

Features

Search

Tutorial: Mastering Ambar Search Queries

  • Fuzzy Search (John~3)
  • Phrase Search ("John Smith")
  • Search By Author (author:John)
  • Search By File Path (filename:*.txt)
  • Search By Date (when: yesterday, today, lastweek, etc)
  • Search By Size (size>1M)
  • Search By Tags (tags:ocr)
  • Search As You Type
  • Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
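
The operators listed above can also be assembled programmatically before a query is handed to Ambar. The following is only a minimal Python sketch: the helper name `build_query` and its parameters are hypothetical, and joining the individual terms with spaces (and writing `when:` without a space) is an assumption made for illustration, not documented Ambar behaviour.

```python
def build_query(text=None, phrase=None, author=None, filename=None,
                when=None, min_size=None, tag=None):
    """Compose a search string from the operators listed above.

    Hypothetical helper: its name, parameters, and the space-joined
    output format are illustrative assumptions, not part of Ambar.
    """
    parts = []
    if text:
        parts.append(text)                    # plain or fuzzy term, e.g. John~3
    if phrase:
        parts.append(f'"{phrase}"')           # phrase search, e.g. "John Smith"
    if author:
        parts.append(f"author:{author}")      # search by author
    if filename:
        parts.append(f"filename:{filename}")  # search by file path, e.g. *.txt
    if when:
        parts.append(f"when:{when}")          # e.g. yesterday, today, lastweek
    if min_size:
        parts.append(f"size>{min_size}")      # e.g. 1M
    if tag:
        parts.append(f"tags:{tag}")           # e.g. ocr
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query(phrase="John Smith", author="John", filename="*.txt"))
```

Each operator is kept as a separate optional argument so that a caller only spells out the filters it needs.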

Crawling

Ambar 2.0 only supports local file system crawling. If you need to crawl an SMB share or an FTP location, just mount it using standard Linux tools. Crawling is automatic: no schedule is needed because the crawlers monitor file system events and automatically process new, changed and removed files.
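
To illustrate the idea of reacting to new, changed and removed files, here is a minimal Python sketch. It is not Ambar's crawler: it swaps in a simpler technique, comparing two directory snapshots, instead of subscribing to file system events, and the function names `snapshot` and `diff` are hypothetical.

```python
import os


def snapshot(root):
    """Map each file path under root to its (mtime_ns, size) pair."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(old, new):
    """Return sorted (added, changed, removed) paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

A caller would take a snapshot, wait, take another, and pass each of the three resulting lists to the matching processing step. An event-based crawler avoids the repeated directory walk that this polling approach requires.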

Content Extraction

Ambar supports large files (>30MB)

Supported file types:

  • ZIP archives
  • Mail archives (PST)
  • MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
  • OCR over images
  • Email messages with attachments
  • Adobe PDF (with OCR)
  • OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
  • OpenOffice documents
  • RTF, Plaintext
  • HTML / XHTML
  • Multithread processing

Installation

Notice: Ambar requires Docker to run.

You can build the Docker images yourself:

  • See the tutorial below on how to build the images from scratch

Building the images yourself

All the images required to run Ambar can be built locally. In general, each image can be built by navigating into the directory of the component in question, performing the required compilation steps and building the image like this:

# From project root
$ cd FrontEnd
$ docker build . -t <image_name>

The resulting image can be referred to by the name specified, and run by the containerization tooling of your choice.

In order to use a local Dockerfile with docker-compose, simply change the image option to build, setting the value to the relative path of the directory containing the Dockerfile. Then run docker-compose build to build the relevant images. For example:

# docker-compose.yml from project root, referencing local Dockerfiles
pipeline0:
  build: ./Pipeline/
  # image: chazu/ambar-pipeline  (replaced by the build option above)
localcrawler:
  build: ./LocalCrawler/
Note that some of the components require compilation or other build steps be performed on the host before the docker images can be built. For example, FrontEnd:

# Assuming a suitable version of node.js is installed (docker uses 8.10)
$ npm install
$ npm run compile

Then follow these instructions: https://ambar.cloud/docs/installation

FAQ

Is it open-source?

Yes, it's fully open-source.

Is it free?

Yes, it is forever free and open-source.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld.

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even badly encoded ones or ones with scans inside. We did our best to make search over any kind of PDF document smooth.

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine; typically it's 500MB. It's an awesome result, as typical document management systems offer a 30MB maximum file size for processing.

Sponsors

Change Log

Change Log

Privacy Policy

Privacy Policy

License

MIT License
