pandora: A small Pandora's box for prototyping your app with a ready-to-use backend. This is just my compilation of different solutions occasionally applied in hackathons and challenges.
Xponents: Geographic place, date/time, and pattern entity extraction toolkit, along with text extraction from unstructured data and GIS outputters.
pdf2html: A module that converts PDF files to HTML pages using Apache Tika. It also generates thumbnail images for PDF files using Apache PDFBox.
tika-similarity: Uses the Tika-Python package (a Python port of Apache Tika) to compute file similarity based on metadata features.
GeoParser: Extract and visualize locations from any file.
image space: Interactive image similarity, visual search, and retrieval application.
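
To make the idea behind the tika-similarity entry concrete, here is a minimal sketch of one possible way to score file similarity from metadata features: the Jaccard index over the sets of metadata keys. This is an illustration under stated assumptions, not necessarily how the project computes its scores. The function name and the sample metadata dictionaries are hypothetical; in practice the dictionaries might come from the "metadata" entry returned by Tika-Python's parser.from_file(path).

```python
def jaccard_similarity(meta_a, meta_b):
    """Jaccard index over the metadata keys of two files (0.0 to 1.0)."""
    keys_a, keys_b = set(meta_a), set(meta_b)
    union = keys_a | keys_b
    if not union:
        # Assumption: two files with no metadata at all count as identical.
        return 1.0
    return len(keys_a & keys_b) / len(union)


# Hypothetical metadata, hand-written for illustration only.
pdf_meta = {"Content-Type": "application/pdf", "Author": "A", "pdf:PDFVersion": "1.4"}
png_meta = {"Content-Type": "image/png", "width": "640", "height": "480"}

# The two files share one key out of five distinct keys.
score = jaccard_similarity(pdf_meta, png_meta)
print(f"{score:.2f}")
```

Comparing only the key sets ignores the metadata values; a variant could compare (key, value) pairs instead if exact value matches should count.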