All Projects → armadito → armadito-mod-pdf

armadito / armadito-mod-pdf

Licence: GPL-3.0 License
Armadito module for PDF document analysis.

Programming Languages

c
50402 projects - #5 most used programming language
objective c
16641 projects - #2 most used programming language
Makefile
30231 projects
shell
77523 projects
M4
1887 projects

ARMADITO PDF ANALYZER

Build Status Coverity Scan Build Status

Armadito module PDF is an heuristic module for PDF documents analysis.

Copyright (C) Teclib', 2015, 2016

See Online documentation at : http://armadito-av.readthedocs.io/en/latest/

License : GPLv3 https://www.gnu.org/licenses/license-list.html#GNUGPLv3

What is it?

Armadito PDF analyzer is a module for PDF documents scanning that includes:

  • a PDF parser

  • an heuristic analyzer that computes the document confidence level

Licensing

Armadito PDF analyzer is licensed under the GPLv3 https://www.gnu.org/licenses/license-list.html#GNUGPLv3

Dependencies

miniz.c

FEATURES

==> Parsing <==

  • Remove PostScript comments in the content of the document.
  • Get PDF version in header (Ex: %PDF-1.7).
  • Get trailers and xref table or xref objects.
  • Get objects informations described in the document (reference, dictionary, type, stream, filters, etc).
  • Extract objects embedded in stream objects.
  • Decode object streams encoded with filters : FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode, CCITTFaxDecode

==> Analysis <==

  • Tests based on PDF document structure (accodring to PDF specifications):

    • Check the PDF header version (from version 1.1 to 1.7).
    • Check if the content of the document is encrypted.
    • Check that the document contains non-empty pages.
    • Check object collision in object declaration.
    • Check trailers format.
    • Check xref table and xref object.
    • Check the presence of malicious Postscript comments (which could cause parsing errors).
  • Tests based on PDF objects content:

    • Get potentially malicious active contents (JavaScripts, Embedded files, Forms, URI, etc.)
    • JavaScript content analysis (malicious keywords, pattern repetition, unicode strings, etc).
    • Info object content analysis (search potentially malicious strings).
    • Check if object dictionary is hexa obfuscated.

==> Notation <==

  • A suspicious coefficient is attributed to each test.
  • Calc the suspicious coefficient of the pdf document.

LIMITATIONS

  • Supported PDF versions are: %PDF-1.1 to %PDF-1.7.
  • PDF documents with encrypted content are not supported.
  • Removing comments is skipped for document > 2MB
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].