alephdata / ingest-file

Licence: MIT license
Ingestors extract the contents of mixed unstructured documents into structured (followthemoney) data.

Projects that are alternatives of or similar to ingest-file

dcfldd
Enhanced version of dd for forensics and security
Stars: ✭ 27 (-32.5%)
Mutual labels:  forensics, forensics-investigations
Lexpredict Contraxsuite
LexPredict ContraxSuite
Stars: ✭ 140 (+250%)
Mutual labels:  ocr, documents
ForensicsTools
A list of free and open forensics analysis tools and other resources
Stars: ✭ 392 (+880%)
Mutual labels:  forensics, forensics-investigations
Open Semantic Etl
Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & Named Entity Recognition) & data enrichment (annotation) pipelines & ingestor to Solr or Elastic search index & linked data graph database
Stars: ✭ 165 (+312.5%)
Mutual labels:  ocr, documents
Judge-Jury-and-Executable
A file system forensics analysis scanner and threat hunting tool. Scans file systems at the MFT and OS level and stores data in SQL, SQLite or CSV. Threats and data can be probed harnessing the power and syntax of SQL.
Stars: ✭ 66 (+65%)
Mutual labels:  forensics, forensics-investigations
papermerge-core
Papermerge RESTful backend structured as reusable Django app
Stars: ✭ 103 (+157.5%)
Mutual labels:  ocr, documents
Paperless
Scan, index, and archive all of your paper documents
Stars: ✭ 7,662 (+19055%)
Mutual labels:  ocr, documents
Open Paperless
Scan, index, and archive all of your paper documents (acquired by Mayan EDMS)
Stars: ✭ 2,538 (+6245%)
Mutual labels:  ocr, documents
Packrat
Live system forensic collector
Stars: ✭ 16 (-60%)
Mutual labels:  forensics, forensics-investigations
Palmprint-Recognition-in-the-Wild
No description or website provided.
Stars: ✭ 22 (-45%)
Mutual labels:  forensics, forensics-investigations
paperbase
Open source document organizer with automatic OCR and full text search
Stars: ✭ 21 (-47.5%)
Mutual labels:  ocr, documents
omynote
众山小笔记 (omynote): manage all your reading notes in one place
Stars: ✭ 154 (+285%)
Mutual labels:  ocr
financial-forecast
Personal Financial Forecasting Model
Stars: ✭ 24 (-40%)
Mutual labels:  excel
vminspect
Tools for inspecting disk images
Stars: ✭ 25 (-37.5%)
Mutual labels:  forensics
btrfscue
Recover files from damaged BTRFS filesystems
Stars: ✭ 28 (-30%)
Mutual labels:  forensics
ExcelExport
Classes to generate Excel/CSV Report in ASP.NET Core
Stars: ✭ 39 (-2.5%)
Mutual labels:  excel
nodejs-nedb-excel
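Built on nodejs + webpack, using the lightweight embedded NoSQL database nedb for storage, react + redux for page rendering, and ant design as the style framework; implements Excel spreadsheet upload, export, and visualization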
Stars: ✭ 28 (-30%)
Mutual labels:  excel
MetadataRemover
Android App to remove images' metadata
Stars: ✭ 42 (+5%)
Mutual labels:  documents
DocTr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Stars: ✭ 202 (+405%)
Mutual labels:  ocr
Reports.JS
Stimulsoft Reports.JS is a reporting tool for Node.js and JavaScript applications.
Stars: ✭ 33 (-17.5%)
Mutual labels:  excel

ingestors

ingestors extracts useful information from documents of different types into a structured, standard format. It retains folder structures across directories, compressed archives, and emails. The extracted data is formatted as Follow the Money (FtM) entities, ready for import into Aleph or for processing as an object graph.
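
To make the output format concrete, the sketch below models a folder and a file inside it as plain dictionaries in the general shape FtM entities take when serialized: an `id`, a `schema` name, and `properties` that map to lists of values. The helper `make_entity`, the ID scheme, and the sample paths are hypothetical. The schema and property names (`Folder`, `Document`, `fileName`, `parent`) are assumed from the FtM model and should be checked against the followthemoney documentation. This is not the code ingestors itself uses.

```python
import hashlib
import json


def make_entity(schema, key, **properties):
    """Build a dict shaped like a serialized FtM entity (illustrative helper)."""
    # Hypothetical ID scheme: a stable hash of the key, not what ingestors uses.
    entity_id = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return {
        "id": entity_id,
        "schema": schema,
        # FtM properties are multi-valued, so each value is wrapped in a list.
        "properties": {name: [value] for name, value in properties.items()},
    }


# A folder and a file inside it; the paths are made-up sample values.
folder = make_entity("Folder", "/host/Documents", fileName="Documents")
document = make_entity(
    "Document",
    "/host/Documents/sample.xlsx",
    fileName="sample.xlsx",
    parent=folder["id"],  # this link is what preserves the folder structure
)

print(json.dumps(document, indent=2))
```

Because the child entity refers to its parent by ID, a set of such entities can be read as an object graph, which is how a directory tree, an archive, or an email with attachments keeps its nesting.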

Supported file types:

  • Plain text
  • Images
  • Web pages, XML documents
  • PDF files
  • Emails (Outlook, plain text)
  • Archive files (ZIP, RAR, etc.)

Other features:

  • Extendable and composable using classes and mixins.
  • Writes results to a database as FollowTheMoney objects.
  • Lightweight worker-style support for logging, failures and callbacks.
  • Thoroughly tested.
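
As a rough illustration of the "classes and mixins" idea, the sketch below composes two small ingestor classes from a base class and a shared mixin. Every name here (`BaseIngestor`, `TextSupport`, `PlainTextIngestor`, `CSVIngestor`, `pick_ingestor`) is hypothetical and chosen for illustration. The real ingestors classes have a richer interface and should be consulted directly.

```python
class TextSupport:
    """Mixin: shared helper for storing extracted text on a result dict."""

    def add_text(self, result, text):
        cleaned = text.strip()
        if cleaned:
            # "bodyText" is used here purely as an illustrative key.
            result.setdefault("bodyText", []).append(cleaned)


class BaseIngestor:
    """Hypothetical base class that declares which files it can handle."""

    EXTENSIONS = ()

    @classmethod
    def match(cls, file_name):
        # Case-insensitive match on the file extension.
        return file_name.lower().endswith(tuple(cls.EXTENSIONS))

    def ingest(self, file_name, data, result):
        raise NotImplementedError


class PlainTextIngestor(BaseIngestor, TextSupport):
    EXTENSIONS = (".txt",)

    def ingest(self, file_name, data, result):
        self.add_text(result, data.decode("utf-8", errors="replace"))
        return result


class CSVIngestor(BaseIngestor, TextSupport):
    EXTENSIONS = (".csv",)

    def ingest(self, file_name, data, result):
        # Store each non-empty row as its own text fragment.
        for line in data.decode("utf-8", errors="replace").splitlines():
            self.add_text(result, line)
        return result


def pick_ingestor(file_name, candidates):
    """Return an instance of the first candidate class that matches."""
    for cls in candidates:
        if cls.match(file_name):
            return cls()
    return None


ingestor = pick_ingestor("notes.TXT", [CSVIngestor, PlainTextIngestor])
result = ingestor.ingest("notes.TXT", b"  hello world \n", {})
print(result)
```

Keeping the text handling in a mixin means a new file type only needs to declare its extensions and implement `ingest`; shared behaviour is inherited instead of copied.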

Usage

ingestors is usually run as part of Aleph. To run it stand-alone, use the supplied Docker Compose environment. To enter a working container, run:

make build
make shell

Inside the shell, you will find the ingestors command-line tool. During development, it is convenient to call its debug mode using files present in the user's home directory, which is mounted at /host:

ingestors debug /host/Documents/sample.xlsx