stweiss / FileBasedMiniDMS

Licence: MIT license
This PHP script sorts your documents into subfolders (using hardlinks) based on the hashtags it finds in the documents' filenames.


FileBasedMiniDMS

FileBasedMiniDMS.php by Stefan Weiss (2017-2021) https://github.com/stweiss/FileBasedMiniDMS

CHANGELOG

Version 0.17 (10.02.2021)

  • renamed default config.php to config.php.template (in git), so you can pull without overwriting your local config.php (issue #13)
  • introduced option to set the detected date as file date (variable $setfiletime, enabled by default) (issue #5)
  • skip future dates during date detection
  • minor bugfixes

Version 0.16 (29.10.2019)

  • Add date format 01. Januar, 2019 / 01 Januar 2019 (thanks @SirUli) (pull #12)
  • internal clean up

Version 0.15 (27.09.2019)

  • now compatible with ocrmypdf v9.0.0

Version 0.14 (12.04.2019)

  • don't OCR files that have already been ocr'ed. This should only have happened in rare special cases. (issue #9)
  • change to long php opening tags for better php compatibility (issue #6)

Version 0.13 (22.10.2018)

  • improved detection of dates (thanks vanto) (pull #7)

Version 0.12b (12.06.2017)

  • New: $dateseperator can be modified in config.php
  • Change: Default date for rename is now creation date of the pdf. (was "now" before)

Version 0.11 (08.06.2017)

  • New: automatic OCR and automatic rename

Version 0.02 (02.03.2016)

  • release of this file based document management system.
  • sorts files with hashtags into hashtag-folders.

INSTALL

  1. Place this file on your FileServer/NAS
  2. For OCR (Step 1): Install Docker and pull an ocrmypdf image, e.g. docker pull jbarlow83/ocrmypdf
  3. For Automatic rename (Step 1.1): make sure that pdftotext is available.
  4. Copy config.php.template to config.php.
  5. Adjust settings for this script in config.php to fit your needs.
  6. Create a cron job on your FileServer/NAS to execute this script regularly (in DSM you can do this in Control Panel -> Task Scheduler). The task may need root privileges.
    e.g. php /volume1/home/stefan/Scans/FileBasedMiniDMS.php
    or redirect stdout and stderr to a log file to see PHP warnings/errors:
    php /volume1/home/stefan/Scans/FileBasedMiniDMS.php >> /volume1/home/stefan/Scans/my.log 2>&1

NOTES

This script works in three steps. Each step can be turned on/off in config.php:

Step 1: OCR

Runs OCR on the PDF files in the $inboxfolder whose filenames match $matchWithoutOCR.

Step 1.1: Rename ocr'ed files based on keywords and date

The PDF is renamed to the following structure: "<date> <name> <tags>.pdf"

<date>: The script tries to find a date in the PDF. If none is found, a default date is used (since version 0.12b the creation date of the PDF; before that, the current date).
<name>: You can define $renamerules. The first rule which matches the ocr'ed content of the first page is used. You can use the operators & (AND) and , (OR) and you can use the wildcard operators ? and *.
<tags>: In $tagrules you can specify your tags. All matching rules will add their tag to the filename. You can use the same operators here.
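
To illustrate how such rules could be evaluated, here is a minimal PHP sketch. It is not taken from FileBasedMiniDMS.php. The function names, the array layout (name or tag as key, rule string as value), the precedence (a rule is split on "," first, then on "&"), the case-insensitive substring matching, and the "#" prefix on generated tags are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: turn one keyword with ? and * wildcards into a regex.
function keywordToRegex(string $keyword): string
{
    $quoted = preg_quote(trim($keyword), '/');
    // preg_quote escaped the wildcards as \? and \*; map them to regex syntax.
    // In this sketch "." does not match line breaks, so * stays within one line.
    $quoted = str_replace(['\\?', '\\*'], ['.', '.*'], $quoted);
    return '/' . $quoted . '/iu';
}

// Assumed semantics: "," separates alternatives (OR), "&" joins keywords (AND).
function ruleMatches(string $rule, string $text): bool
{
    foreach (explode(',', $rule) as $alternative) {
        $allMatch = true;
        foreach (explode('&', $alternative) as $keyword) {
            // Empty keywords never match; a regex error (false) counts as no match.
            if (trim($keyword) === '' || preg_match(keywordToRegex($keyword), $text) !== 1) {
                $allMatch = false;
                break;
            }
        }
        if ($allMatch) {
            return true;
        }
    }
    return false;
}

// <name>: the first rule that matches wins.
function firstMatchingName(array $rules, string $text): ?string
{
    foreach ($rules as $name => $rule) {
        if (ruleMatches($rule, $text)) {
            return (string) $name;
        }
    }
    return null;
}

// <tags>: every matching rule contributes its tag.
function matchingTags(array $rules, string $text): array
{
    $tags = [];
    foreach ($rules as $tag => $rule) {
        if (ruleMatches($rule, $text)) {
            $tags[] = '#' . $tag;
        }
    }
    return $tags;
}

// Example with made-up rules and made-up OCR text.
$exampleText = "Telekom Deutschland GmbH\nRechnung Nr. 12345";
$exampleRenameRules = ['Telekom Rechnung' => 'telekom & rechnung', 'Versicherung' => 'allianz, huk*'];
$exampleTagRules = ['bills' => 'rechnung, invoice', 'telekom' => 'telekom'];

echo firstMatchingName($exampleRenameRules, $exampleText), "\n";
echo implode(' ', matchingTags($exampleTagRules, $exampleText)), "\n";
```

If the real script gives "&" and "," a different precedence, only ruleMatches would need to change.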

Step 2: Tagging

This script creates a subfolder for each hashtag it finds in your filenames and creates a hardlink to the document in that folder. Documents are expected to be stored flat in one folder. Filenames need to follow the structure "<any name> #hashtag1 #hashtag2.extension".

e.g. "Documents/Scans/2015-12-25 Bill of Santa Clause #bills #2015.pdf" will be linked into:

  • "Documents/Scans/tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf"
  • "Documents/Scans/tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf"
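
The mapping from a filename to its link paths can be sketched in PHP as follows. This illustrates the naming scheme shown in the example above and is not the script's actual code. The function names, the "tags" root folder default, and the set of characters allowed in a hashtag (letters, digits, "_" and "-") are assumptions.

```php
<?php
// Hypothetical helper: compute, per hashtag, the relative path of the hardlink.
// The link name drops the tag of the folder it lives in and keeps all other tags.
function planTagLinks(string $filename, string $tagRoot = 'tags'): array
{
    $extension = pathinfo($filename, PATHINFO_EXTENSION);
    $stem = pathinfo($filename, PATHINFO_FILENAME);

    // Collect every "#tag" token in the name (allowed characters are an assumption).
    preg_match_all('/#([\p{L}\p{N}_-]+)/u', $stem, $matches);
    $tags = array_unique($matches[1]);

    $links = [];
    foreach ($tags as $tag) {
        // Remove exactly this tag; the lookahead avoids cutting a longer tag.
        $pattern = '/\s*#' . preg_quote($tag, '/') . '(?![\p{L}\p{N}_-])/u';
        $linkStem = trim(preg_replace($pattern, '', $stem));
        $suffix = $extension !== '' ? '.' . $extension : '';
        $links[$tag] = $tagRoot . '/' . $tag . '/' . $linkStem . $suffix;
    }
    return $links;
}

// Hypothetical helper: create the folders and hardlinks inside $scanFolder.
function createTagLinks(string $scanFolder, string $filename): void
{
    foreach (planTagLinks($filename) as $relativeLink) {
        $linkPath = $scanFolder . '/' . $relativeLink;
        if (!is_dir(dirname($linkPath))) {
            mkdir(dirname($linkPath), 0777, true);
        }
        if (!file_exists($linkPath)) {
            // link(target, link) creates a hardlink; both paths must be on one filesystem.
            link($scanFolder . '/' . $filename, $linkPath);
        }
    }
}

// Prints the two relative link paths for the example filename.
print_r(planTagLinks('2015-12-25 Bill of Santa Clause #bills #2015.pdf'));
```

Keeping the path computation separate from the filesystem calls makes it possible to check the naming without touching any files.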

FAQ

Q: How do I assign another tag to my file?
A: Simply rename the file in the $scanfolder and add the tag at the end (but before the extension).

Q: How can I fix a typo in a document's filename?
A: Simply rename the file in the $scanfolder. The tags are created from scratch at the next scheduled run, and the old links and tags are removed automatically.

Disclaimer

Make sure you have a backup before you start using this script. You use this software at your own risk.