papermerge / papermerge-core

License: Apache-2.0
Papermerge RESTful backend structured as reusable Django app

Programming Languages

python
Projects that are alternatives of or similar to papermerge-core

paperless-ng
A supercharged version of paperless: scan, index and archive all your physical documents
Stars: ✭ 4,840 (+4599.03%)
Mutual labels:  ocr, dms, document-management-system
paperbase
Open source document organizer with automatic OCR and full text search
Stars: ✭ 21 (-79.61%)
Mutual labels:  ocr, documents, dms
Paperwork
Personal document manager (Linux/Windows) -- Moved to Gnome's Gitlab
Stars: ✭ 2,392 (+2222.33%)
Mutual labels:  ocr, dms
boxdetect
BoxDetect is a Python package based on OpenCV which allows you to easily detect rectangular shapes like character or checkbox boxes on scanned forms.
Stars: ✭ 46 (-55.34%)
Mutual labels:  documents, scanned-documents
ingest-file
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.
Stars: ✭ 40 (-61.17%)
Mutual labels:  ocr, documents
go-ocr
A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.
Stars: ✭ 31 (-69.9%)
Mutual labels:  ocr, scanned-documents
Paperless
Scan, index, and archive all of your paper documents
Stars: ✭ 7,662 (+7338.83%)
Mutual labels:  ocr, documents
Lexpredict Contraxsuite
LexPredict ContraxSuite
Stars: ✭ 140 (+35.92%)
Mutual labels:  ocr, documents
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+60.19%)
Mutual labels:  ocr, documents
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+2364.08%)
Mutual labels:  ocr, documents
insomnia-plugin-documents-br
The plugin generates documents and some commonly used data, with a focus on data for Brazil.
Stars: ✭ 21 (-79.61%)
Mutual labels:  documents
deep-learning-for-document-dewarping
An application of high resolution GANs to dewarp images of perturbed documents
Stars: ✭ 100 (-2.91%)
Mutual labels:  ocr
HttpServerLite
TCP-based simple HTTP and HTTPS server, written in C#.
Stars: ✭ 44 (-57.28%)
Mutual labels:  restful-api
swagger-brake
Swagger contract checker for breaking API changes
Stars: ✭ 49 (-52.43%)
Mutual labels:  restful-api
CleanSCAN
A simple, smart and efficient document scanner for Android
Stars: ✭ 151 (+46.6%)
Mutual labels:  ocr
wine
A lightweight and flexible framework to help build elegant web API
Stars: ✭ 39 (-62.14%)
Mutual labels:  restful-api
awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
Stars: ✭ 620 (+501.94%)
Mutual labels:  ocr
ocreval
Update of the ISRI Analytic Tools for OCR Evaluation with UTF-8 support
Stars: ✭ 48 (-53.4%)
Mutual labels:  ocr
OCRVisualizer
Microsoft Cognitive Services, Computer Vision API, OCR Visualizer on documents
Stars: ✭ 19 (-81.55%)
Mutual labels:  ocr
gorest
Go RESTful API starter kit with Gin, JWT, GORM (MySQL, PostgreSQL, SQLite), Redis, Mongo, 2FA, email verification, password recovery
Stars: ✭ 135 (+31.07%)
Mutual labels:  restful-api

Papermerge REST API Server

This Python package is the heart of the Papermerge project. It consists of a set of reusable Django apps which are consumed across different bundles of the Papermerge Document Management System (DMS).

Technically speaking, it contains the following Django apps:

  • papermerge.core - the epicenter of Papermerge DMS project
  • papermerge.notifications - Django Channels app for sending notifications via websockets
  • papermerge.search - RESTful search. Supports four backends: Xapian, Whoosh, Elasticsearch, Solr.

What is Papermerge?

Papermerge is an open source document management system (DMS) primarily designed for archiving and retrieving your digital documents. Instead of having piles of paper documents all over your desk, office or drawers, you can quickly scan them and configure your scanner to upload directly to Papermerge DMS. Papermerge DMS will in turn extract text from the scanned documents using Optical Character Recognition (OCR), then index it and make it searchable. You will be able to quickly find any (scanned!) document using full text search.

Papermerge is a perfect tool for managing documents in PDF, JPEG, TIFF and PNG formats.

Feature Highlights

  • OpenAPI compliant REST API
  • Works well with PDF documents
  • OCR (Optical Character Recognition) of the documents (uses OCRmyPDF)
  • Full Text Search of the scanned documents (supports four search engine backends, uses Xapian by default)
  • Document Versions
  • Tags - assign colored tags to documents or folders
  • Documents and Folders - users can organize documents in folders
  • Multi-User (supports user groups)
  • User permissions management
  • Page Management - delete, reorder, cut & paste pages (uses PikePDF)

Documentation

An overview of the REST API is available here.

Detailed online REST API reference can be viewed as:

Note that the REST API reference documentation is generated from the OpenAPI schema. The OpenAPI schema is stored in its own dedicated repository, papermerge/openapi-schema.

Papermerge DMS documentation is available at https://docs.papermerge.io

Docker

To start the Papermerge REST API server from its Docker image, use the following command:

docker run -p 8000:8000 \
    -e PAPERMERGE__MAIN__SECRET_KEY=abc \
    -e DJANGO_SUPERUSER_PASSWORD=123 \
    papermerge/papermerge:latest

If you want the initial superuser to have another username (e.g. john), use the DJANGO_SUPERUSER_USERNAME environment variable:

docker run -p 8000:8000 \
    -e PAPERMERGE__MAIN__SECRET_KEY=abc \
    -e DJANGO_SUPERUSER_PASSWORD=123 \
    -e DJANGO_SUPERUSER_USERNAME=john \
    papermerge/papermerge:latest

For the full list of supported environment variables, check the online documentation.
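
The values abc and 123 in the commands above are placeholders. As a minimal sketch (not part of Papermerge itself), the following Python snippet generates a random secret key with the standard library and assembles the same docker run argument list. The helper name build_docker_run_args is hypothetical.

```python
import secrets


def build_docker_run_args(env, image="papermerge/papermerge:latest", port=8000):
    """Return the argument list for `docker run`, mirroring the commands above.

    `env` maps environment variable names to values. This helper is
    hypothetical and only assembles arguments; it does not start a container.
    """
    args = ["docker", "run", "-p", f"{port}:{port}"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


env = {
    # Random URL-safe string instead of the placeholder "abc".
    "PAPERMERGE__MAIN__SECRET_KEY": secrets.token_urlsafe(50),
    # Replace with a real password; "123" is only the example value used above.
    "DJANGO_SUPERUSER_PASSWORD": "123",
    "DJANGO_SUPERUSER_USERNAME": "john",
}

print(" ".join(build_docker_run_args(env)))
```

If Docker is installed, the returned list can be passed to subprocess.run(args, check=True) to actually start the container.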

Docker Compose

By default, the Papermerge REST API server uses an sqlite3 database. To use PostgreSQL instead, use the following docker compose file:

version: '3.7'
services:
  app:
    image: papermerge/papermerge
    environment:
      - PAPERMERGE__MAIN__SECRET_KEY=abc
      - DJANGO_SUPERUSER_PASSWORD=12345
      - PAPERMERGE__DATABASE__TYPE=postgres
      - PAPERMERGE__DATABASE__USER=postgres
      - PAPERMERGE__DATABASE__PASSWORD=123
      - PAPERMERGE__DATABASE__NAME=postgres
      - PAPERMERGE__DATABASE__HOST=db
    ports:
      - 8000:8000
    depends_on:
      - db
  db:
    image: bitnami/postgresql:14.4.0
    volumes:
      - postgres_data:/var/lib/postgresql/data/
    environment:
      - POSTGRES_PASSWORD=123
volumes:
  postgres_data:

The docker compose file above starts the Papermerge REST API server with a PostgreSQL database as its data store.

For a detailed description of how to start Papermerge DMS using docker compose, read the Docker Compose/Detailed Explanation section in the online docs.
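
The environment variables in the compose file appear to follow a PAPERMERGE__<SECTION>__<KEY> naming pattern. The sketch below only illustrates that pattern by grouping such names into nested sections. It is an assumption about the convention, not a description of how Papermerge parses its configuration, and group_settings is a hypothetical helper.

```python
def group_settings(env, prefix="PAPERMERGE"):
    """Group PREFIX__SECTION__KEY variables into {section: {key: value}}."""
    grouped = {}
    for name, value in env.items():
        parts = name.split("__")
        # Skip variables that do not match PREFIX__SECTION__KEY exactly.
        if len(parts) != 3 or parts[0] != prefix:
            continue
        _, section, key = parts
        grouped.setdefault(section.lower(), {})[key.lower()] = value
    return grouped


# Values copied from the compose file above.
compose_env = {
    "PAPERMERGE__MAIN__SECRET_KEY": "abc",
    "DJANGO_SUPERUSER_PASSWORD": "12345",
    "PAPERMERGE__DATABASE__TYPE": "postgres",
    "PAPERMERGE__DATABASE__USER": "postgres",
    "PAPERMERGE__DATABASE__PASSWORD": "123",
    "PAPERMERGE__DATABASE__NAME": "postgres",
    "PAPERMERGE__DATABASE__HOST": "db",
}

settings = group_settings(compose_env)
print(settings["database"]["host"])  # prints "db"
```

Grouping the variables this way makes it easy to see that the database host (db) matches the name of the PostgreSQL service in the compose file.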

Tests

The test suite is divided into two big groups:

  1. tests.core
  2. tests.search

The first group covers tests that do not depend on Elasticsearch, while the second group, tests.search, covers tests that depend on Elasticsearch and as a result run very slowly (hence the grouping). To run tests.core you need Redis up and running; to run tests.search you need both Redis and Elasticsearch up and running.

Before running the core test suite, make sure the Redis service is up and running. Then run the tests:

poetry run task test-core

Before running the search test suite, make sure both the Redis and Elasticsearch services are up and running:

poetry run task test-search

To run the whole test suite (core + search):

poetry run task test
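
Since both test groups need external services, it can help to check that those services accept connections before starting a run. The following is a minimal sketch, not part of the Papermerge test tooling: it assumes Redis and Elasticsearch listen on localhost at their default ports (6379 and 9200), and the helper names is_port_open and missing_services are hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Default ports for Redis and Elasticsearch; the hosts are assumptions.
SERVICES = {"redis": ("localhost", 6379), "elasticsearch": ("localhost", 9200)}


def missing_services(names, services=SERVICES):
    """Return the names from `names` whose port is not accepting connections."""
    return [n for n in names if not is_port_open(*services[n])]


if __name__ == "__main__":
    # tests.core needs Redis only; tests.search needs both services.
    print("missing for tests.core:", missing_services(["redis"]))
    print("missing for tests.search:", missing_services(["redis", "elasticsearch"]))
```

An empty list means every required port accepted a connection. This only confirms that something is listening, not that the service is healthy.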

Linting

Use the following command to make sure that your code is formatted according to the PEP8 spec:

poetry run task lint