
Image processing framework based on plugins, like ImageJ; it is easy to glue with scipy.ndimage, scikit-image, opencv, simpleitk, mayavi... and any libraries based on numpy

Stars: ✭ 1,026 (+1952%)

Mutual labels: opencv

Facer

Simple (🤞) face averaging (🙂) in Python (🐍)

Stars: ✭ 49 (-2%)

Mutual labels: opencv

Convolutionalemotion

A deep convolutional neural network system for live emotion detection

Stars: ✭ 40 (-20%)

Mutual labels: opencv

Table Tennis Computer Vision

Apply computer vision to table tennis for match / training analysis

Stars: ✭ 48 (-4%)

Mutual labels: opencv

Realtimefaceapi

This is a demo project showing how to use Face API in Cognitive Services with OpenCV

Stars: ✭ 44 (-12%)

Mutual labels: opencv

Table Builder

🐿️Dynamic tables with pagination and sorting for data visualisation.

Stars: ✭ 44 (-12%)

Mutual labels: table

Sikulix1

SikuliX version 2.0.0+ (2019+)

Stars: ✭ 1,007 (+1914%)

Mutual labels: opencv

Android Hpe

Android native application to perform head pose estimation using images coming from the front camera.

Stars: ✭ 46 (-8%)

Mutual labels: opencv

Fliplog

fluent logging with verbose insight, colors, tables, emoji, filtering, spinners, progress bars, timestamps, capturing, stack traces, tracking, presets, & more...

Stars: ✭ 41 (-18%)

Mutual labels: tables

Seeds Revised

Implementation of the superpixel algorithm called SEEDS [1].

Stars: ✭ 48 (-4%)

Mutual labels: opencv

Hacking Scripts

Hacking Scripts contains amazing and awesome scripts written in Python, JavaScript, Java, Nodejs, and more. The main aim of the repository will be to provide utility scripts that might make everyday life easy.

Stars: ✭ 41 (-18%)

Mutual labels: opencv

Opencvdeviceenumerator

This repository contains a class that allows the enumeration of video and audio devices in order to get the device IDs that are required to create a VideoCapture object inside OpenCV (in Windows).

Stars: ✭ 48 (-4%)

Mutual labels: opencv

Xamarin.ios Opencv

OpenCV for Xamarin.iOS

Stars: ✭ 43 (-14%)

Mutual labels: opencv

Plant Detection

Detects and marks plants in a soil area image using Python OpenCV

Stars: ✭ 43 (-14%)

Mutual labels: opencv

Fingerprint Feature Extraction

Extract minutiae features from fingerprint images

Stars: ✭ 45 (-10%)

Mutual labels: opencv


= PDF-table
:toc:

== What is PDF-table?

PDF-table is a Java utility library for parsing tabular data in PDF documents. +
Core processing of PDF documents is performed using Apache PDFBox and OpenCV.

== Prerequisites

=== JDK

Java 8 is required.

=== External dependencies

pdf-table requires compiled OpenCV 3.4.2 to work properly:

. Download OpenCV v3.4.2 from https://github.com/opencv/opencv/releases/tag/3.4.2
. Unpack it and add to your system PATH:
* Windows: <opencv dir>\build\java\x64
* Linux: TODO
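A quick way to check that the native library is visible to the JVM is to scan a search path for it. The sketch below is only an illustration: `OpenCvPathCheck` and `findLibrary` are hypothetical names, not part of pdf-table, and it assumes the OpenCV 3.4.2 Java bindings use the native library name `opencv_java342`. On Windows the default `java.library.path` typically includes the `PATH` entries, which is why adding the directory to `PATH` is usually enough.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Optional;

final class OpenCvPathCheck {
    // Assumption: OpenCV 3.4.2 Java bindings load a native library with this name.
    static final String LIB_NAME = "opencv_java342";

    /** Returns the first library file found in the given path list, if any. */
    static Optional<File> findLibrary(String searchPath) {
        // Platform-specific file name, e.g. a .dll on Windows or lib*.so on Linux
        String fileName = System.mapLibraryName(LIB_NAME);
        return Arrays.stream(searchPath.split(File.pathSeparator))
                .filter(entry -> !entry.isEmpty())
                .map(entry -> new File(entry, fileName))
                .filter(File::isFile)
                .findFirst();
    }

    public static void main(String[] args) {
        Optional<File> lib = findLibrary(System.getProperty("java.library.path", ""));
        System.out.println(lib.map(f -> "Found OpenCV natives: " + f)
                .orElse("OpenCV natives not found on java.library.path"));
    }
}
```

If the library is not found, the directory can also be passed explicitly with `-Djava.library.path=...` when starting the JVM.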

== Installation

[source, xml]
----
<dependency>
    <groupId>com.github.rostrovsky</groupId>
    <artifactId>pdf-table</artifactId>
    <version>1.0.0</version>
</dependency>
----

== Usage

=== Parsing PDFs

When a PDF document page is parsed, the following operations are performed:

. The page is converted to a grayscale image [OpenCV].
. Binary Inverted Threshold (BIT) is applied to the grayscale image [OpenCV].
. Contours are detected on the BIT image and a contour mask is created (additional Canny filtering can be turned on in this step) [OpenCV].
. The contour mask is XORed with the BIT image [OpenCV].
. Contours are detected once again on the XORed image (additional Canny filtering can be turned on in this step) [OpenCV].
. Final contours are drawn [OpenCV].
. Bounding rectangles are detected from the final contours [OpenCV].
. The PDF is parsed region by region using the bounding rectangles' coordinates [Apache PDFBox].

The above algorithm is mostly derived from http://stackoverflow.com/a/23106594.
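To make the threshold and XOR steps more concrete, here is a small plain-Java sketch on a tiny grayscale array. It is not the library's implementation (pdf-table uses OpenCV for these steps): `TableMaskSketch` and its methods are hypothetical names, and a hand-made filled mask stands in for the contour mask. The point is only that, after XORing, the interior of a cell becomes a foreground region whose bounding rectangle can be used for text extraction.

```java
final class TableMaskSketch {
    /** Binary inverted threshold: pixels brighter than thresh become 0, all others 255. */
    static int[][] binaryInvertedThreshold(int[][] gray, int thresh) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                out[y][x] = gray[y][x] > thresh ? 0 : 255;
            }
        }
        return out;
    }

    /** Pixel-wise XOR of two images of equal size. */
    static int[][] xor(int[][] a, int[][] b) {
        int[][] out = new int[a.length][];
        for (int y = 0; y < a.length; y++) {
            out[y] = new int[a[y].length];
            for (int x = 0; x < a[y].length; x++) {
                out[y][x] = a[y][x] ^ b[y][x];
            }
        }
        return out;
    }

    /** Bounding rectangle {x, y, width, height} of all non-zero pixels, or null if there are none. */
    static int[] boundingRect(int[][] img) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE, maxX = -1, maxY = -1;
        for (int y = 0; y < img.length; y++) {
            for (int x = 0; x < img[y].length; x++) {
                if (img[y][x] != 0) {
                    minX = Math.min(minX, x);
                    minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x);
                    maxY = Math.max(maxY, y);
                }
            }
        }
        return maxX < 0 ? null : new int[] {minX, minY, maxX - minX + 1, maxY - minY + 1};
    }

    /** One 5x5 "cell": dark border lines (0) around a bright interior (255). */
    static int[] demo() {
        int[][] gray = new int[5][5];
        int[][] filledMask = new int[5][5]; // stand-in for a filled outer contour
        for (int y = 0; y < 5; y++) {
            for (int x = 0; x < 5; x++) {
                boolean border = y == 0 || y == 4 || x == 0 || x == 4;
                gray[y][x] = border ? 0 : 255;
                filledMask[y][x] = 255;
            }
        }
        int[][] bit = binaryInvertedThreshold(gray, 127); // lines become 255, interior 0
        int[][] xored = xor(filledMask, bit);             // interior becomes 255, lines 0
        return boundingRect(xored);                       // rectangle of the cell interior
    }
}
```

For the 5x5 input above, `demo()` returns `{1, 1, 3, 3}`: the 3x3 interior of the cell, offset by the one-pixel border.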

For more information about the parsed output, refer to the Output format section.

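==== single-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table imports (PdfTableReader, ParsedTablePage) omitted

class SingleThreadParser {
    public static void main(String[] args) throws IOException {
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        PdfTableReader reader = new PdfTableReader();
        List<ParsedTablePage> parsed = reader.parsePdfTablePages(pdfDoc, 1, pdfDoc.getNumberOfPages());
    }
}
----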

==== multi-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table imports (PdfTableReader, ParsedTablePage) omitted

class MultiThreadParser {
    public static void main(String[] args) throws IOException {
        final int THREAD_COUNT = 8;
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        PdfTableReader reader = new PdfTableReader();

        // parse pages simultaneously
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
        List<Future<ParsedTablePage>> futures = new ArrayList<>();
        for (final int pageNum : IntStream.rangeClosed(1, pdfDoc.getNumberOfPages()).toArray()) {
            Callable<ParsedTablePage> callable = () -> {
                ParsedTablePage page = reader.parsePdfTablePage(pdfDoc, pageNum);
                return page;
            };
            futures.add(executor.submit(callable));
        }

        // collect parsed pages
        List<ParsedTablePage> unsortedParsedPages = new ArrayList<>(pdfDoc.getNumberOfPages());
        try {
            for (Future<ParsedTablePage> f : futures) {
                ParsedTablePage page = f.get();
                unsortedParsedPages.add(page.getPageNum() - 1, page);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // release the pool threads so the JVM can exit
            executor.shutdown();
        }

        // sort pages by pageNum
        List<ParsedTablePage> sortedParsedPages = unsortedParsedPages.stream()
                .sorted((p1, p2) -> Integer.compare(p1.getPageNum(), p2.getPageNum())).collect(Collectors.toList());
    }
}
----
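The fan-out pattern used in the multi-threaded example can be tried without a PDF at hand. The sketch below is an illustration under stated assumptions: `OrderedFanOutSketch` and `processPages` are hypothetical names, and a trivial string-producing task stands in for `reader.parsePdfTablePage(pdfDoc, pageNum)`. It shows that reading the futures back in submission order already yields results in page order, and that the pool should be shut down once the work is done.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderedFanOutSketch {
    /** Runs one task per page number on a fixed pool and returns the results in page order. */
    static List<String> processPages(int pageCount, int threadCount) {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int pageNum = 1; pageNum <= pageCount; pageNum++) {
                final int n = pageNum;
                // Stand-in for the real per-page work
                futures.add(executor.submit(() -> "page-" + n));
            }
            List<String> results = new ArrayList<>(pageCount);
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until this particular page is finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            executor.shutdown(); // without this, non-daemon pool threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        System.out.println(processPages(5, 3));
    }
}
```

Tasks may finish in any order on the pool; the ordering comes purely from iterating the `futures` list in the order the tasks were submitted.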

=== Saving PDF pages as PNG images

PDF-table provides methods for saving PDF pages as PNG images. +
Rendering DPI can be modified in PdfTableSettings (see: Parsing settings).

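==== single-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table import (PdfTableReader) omitted

class SingleThreadPNGDump {
    public static void main(String[] args) throws IOException {
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        Path outputPath = Paths.get("C:", "some_directory");
        PdfTableReader reader = new PdfTableReader();
        reader.savePdfPagesAsPNG(pdfDoc, 1, pdfDoc.getNumberOfPages(), outputPath);
    }
}
----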

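==== multi-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table import (PdfTableReader) omitted

class MultiThreadPNGDump {
    public static void main(String[] args) throws IOException {
        final int THREAD_COUNT = 8;
        Path outputPath = Paths.get("C:", "some_directory");
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        PdfTableReader reader = new PdfTableReader();

        // save pages simultaneously
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
        List<Future<Boolean>> futures = new ArrayList<>();
        for (final int pageNum : IntStream.rangeClosed(1, pdfDoc.getNumberOfPages()).toArray()) {
            Callable<Boolean> callable = () -> {
                reader.savePdfPageAsPNG(pdfDoc, pageNum, outputPath);
                return true;
            };
            futures.add(executor.submit(callable));
        }

        // wait for all pages to be written
        try {
            for (Future<Boolean> f : futures) {
                f.get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // release the pool threads so the JVM can exit
            executor.shutdown();
        }
    }
}
----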

=== Saving debug PNG images

When tables in a PDF document cannot be parsed correctly with the default settings, the user can save debug images that show the page at various stages of processing. +
Using these images, the user can adjust PdfTableSettings to achieve the desired results (see: Parsing settings).

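==== single-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table import (PdfTableReader) omitted

class SingleThreadDebugImgsDump {
    public static void main(String[] args) throws IOException {
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        Path outputPath = Paths.get("C:", "some_directory");
        PdfTableReader reader = new PdfTableReader();
        reader.savePdfTablePagesDebugImages(pdfDoc, 1, pdfDoc.getNumberOfPages(), outputPath);
    }
}
----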

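==== multi-threaded example

[source, java]
----
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

import org.apache.pdfbox.pdmodel.PDDocument;
// pdf-table import (PdfTableReader) omitted

class MultiThreadDebugImgsDump {
    public static void main(String[] args) throws IOException {
        final int THREAD_COUNT = 8;
        Path outputPath = Paths.get("C:", "some_directory");
        PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
        PdfTableReader reader = new PdfTableReader();

        // save debug images simultaneously
        ExecutorService executor = Executors.newFixedThreadPool(THREAD_COUNT);
        List<Future<Boolean>> futures = new ArrayList<>();
        for (final int pageNum : IntStream.rangeClosed(1, pdfDoc.getNumberOfPages()).toArray()) {
            Callable<Boolean> callable = () -> {
                reader.savePdfTablePagesDebugImage(pdfDoc, pageNum, outputPath);
                return true;
            };
            futures.add(executor.submit(callable));
        }

        // wait for all images to be written
        try {
            for (Future<Boolean> f : futures) {
                f.get();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // release the pool threads so the JVM can exit
            executor.shutdown();
        }
    }
}
----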

=== Parsing settings

PDF rendering and OpenCV filtering settings are stored in a PdfTableSettings object.

A custom settings instance can be passed to the PdfTableReader constructor when non-default values are needed:

[source, java]
----
(...)

// build settings object
PdfTableSettings settings = PdfTableSettings.getBuilder()
        .setCannyFiltering(true)
        .setCannyApertureSize(5)
        .setCannyThreshold1(40)
        .setCannyThreshold2(190.5)
        .setPdfRenderingDpi(160)
        .build();

// pass settings to reader
PdfTableReader reader = new PdfTableReader(settings);
----
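To get a feel for what the rendering DPI means for image size: PDF page dimensions are expressed in points (1/72 inch), so the rendered size in pixels is roughly `points * dpi / 72`. The sketch below is a back-of-the-envelope calculation only; `RenderSizeSketch` and `toPixels` are hypothetical names that are not part of pdf-table, and the exact rounding used by the renderer may differ.

```java
final class RenderSizeSketch {
    // PDF user space uses 72 points per inch by default
    static final double POINTS_PER_INCH = 72.0;

    /** Approximate pixel length of a page side given in points, rendered at the given DPI. */
    static int toPixels(double points, double dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // An A4 page is roughly 595 x 842 points
        int width = toPixels(595, 160);
        int height = toPixels(842, 160);
        System.out.println(width + " x " + height + " px at 160 DPI"); // 1322 x 1871 px at 160 DPI
    }
}
```

Higher DPI values give the contour detection more pixels to work with, at the cost of larger images and more processing time per page.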

=== Output format

Each parsed PDF page is returned as a `ParsedTablePage` object:

[source, java]
----
(...)

PDDocument pdfDoc = PDDocument.load(new File("some.pdf"));
PdfTableReader reader = new PdfTableReader();

// first page in document has index == 1, not 0 !
ParsedTablePage firstPage = reader.parsePdfTablePage(pdfDoc, 1);

// getting page number
assert firstPage.getPageNum() == 1;

// rows and cells are zero-indexed just like elements of the List
// getting first row
ParsedTablePage.ParsedTableRow firstRow = firstPage.getRow(0);

// getting third cell in second row
String thirdCellContent = firstPage.getRow(1).getCell(2);

// cell content usually contains extra whitespace characters,
// so it is recommended to trim them before processing
double thirdCellNumericValue = Double.valueOf(thirdCellContent.trim());
----
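Not every cell holds a number, so converting cell content directly can throw a `NumberFormatException`. The following is a minimal sketch of a defensive conversion, assuming cells arrive as plain `String` values as in the example above; `CellValueSketch` and `parseNumericCell` are hypothetical helpers, not part of pdf-table.

```java
import java.util.OptionalDouble;

final class CellValueSketch {
    /** Trims a raw cell string and parses it as a double; empty if blank or not numeric. */
    static OptionalDouble parseNumericCell(String rawCell) {
        if (rawCell == null) {
            return OptionalDouble.empty();
        }
        String trimmed = rawCell.trim();
        if (trimmed.isEmpty()) {
            return OptionalDouble.empty();
        }
        try {
            return OptionalDouble.of(Double.parseDouble(trimmed));
        } catch (NumberFormatException e) {
            // Header cells, units, or free text end up here
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseNumericCell(" 12.5\r\n")); // OptionalDouble[12.5]
        System.out.println(parseNumericCell("n/a"));       // OptionalDouble.empty
    }
}
```

Returning `OptionalDouble` keeps the decision about missing values (skip, default, or fail) with the caller.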


Stars: ✭ 50


rostrovsky / Pdf Table


Projects that are alternatives of or similar to Pdf Table
