
dilawar / PlotDigitizer

Licence: GPL-3.0 license
A Python utility to digitize plots.

Programming Languages

python
139335 projects - #7 most used programming language
Makefile
30231 projects

Projects that are alternatives of or similar to PlotDigitizer

refinery
Refinery is a tool to extract and transform semi-structured data from Excel spreadsheets of different layouts in a declarative way.
Stars: ✭ 30 (-53.12%)
Mutual labels:  data-extraction
wiktionary-de-parser
Extract data from German Wiktionary XML files. Allows you to add your own extraction methods 🚀
Stars: ✭ 22 (-65.62%)
Mutual labels:  data-extraction
audio-digitization-toolkit
A list of resources for setting up an audio digitization workflow
Stars: ✭ 13 (-79.69%)
Mutual labels:  digitization
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+2010.94%)
Mutual labels:  data-extraction
Flashtext
Extract Keywords from sentence or Replace keywords in sentences.
Stars: ✭ 5,012 (+7731.25%)
Mutual labels:  data-extraction
kick-off-web-scraping-python-selenium-beautifulsoup
A tutorial-based introduction to web scraping with Python.
Stars: ✭ 18 (-71.87%)
Mutual labels:  data-extraction
newspaper3 usage overview
This repository provides usage examples for the Python module Newspaper3k.
Stars: ✭ 78 (+21.88%)
Mutual labels:  data-extraction
sypht-golang-client
A Golang client for the Sypht API
Stars: ✭ 33 (-48.44%)
Mutual labels:  data-extraction
Table-Extractor-From-Image
This repository contains the code that extracts a table from an image and exports it to an Excel.
Stars: ✭ 46 (-28.12%)
Mutual labels:  data-extraction
kitodo-production
Kitodo.Production
Stars: ✭ 52 (-18.75%)
Mutual labels:  digitization


A Python3 command line utility to digitize plots

This utility is useful when you have many similar plots that need to be digitized, such as EEG or ECG recordings. See the examples below.

Feel free to contact me for commercial work that may require optimizing this pipeline for your use case. Please send a sample plot.

For occasional use, have a look at WebPlotDigitizer by Ankit Rohatgi.

Installation

$ python3 -m pip install plotdigitizer
$ plotdigitizer --help

Preparing image

Crop the image so that only the axes and the trajectory remain. I use the gthumb utility on Linux; ImageMagick or GIMP work as well.

The following image is from MacFadden and Koshland, PNAS 1990, after trimming. You can also remove the top and right axes.

Trimmed image

Run

plotdigitizer ./figures/trimmed.png -p 0,0 -p 10,0 -p 0,1

We need at least three points (-p option) to map the axes onto the image. In the example above, these are 0,0 (where the x-axis and y-axis intersect), 10,0 (a point on the x-axis), and 0,1 (a point on the y-axis). To map these points onto the image, you will be asked to click on each of them in the image. Click in the same order in which the points were given, and as precisely as you can: any error in this step propagates to the extracted data. If 0,0 is not visible in your image, you have to provide four points: two on the x-axis and two on the y-axis.
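The mapping from clicked pixels to data values can be illustrated with a small affine fit. This is a minimal sketch of the general idea, not necessarily how plotdigitizer implements it; the function names are hypothetical, and the sample pixel locations are borrowed from the batch-mode example in this README.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_affine(pixels, values):
    """Solve value = a*px + b*py + c from three (px, py) -> value pairs (Cramer's rule)."""
    m = [[float(px), float(py), 1.0] for px, py in pixels]
    d = det3(m)
    if d == 0:
        raise ValueError("the three pixel locations must not lie on one line")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = float(values[r])
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def make_pixel_to_data(data_points, pixel_points):
    """Return a function that converts a pixel location into data coordinates."""
    ax = fit_affine(pixel_points, [x for x, _ in data_points])
    ay = fit_affine(pixel_points, [y for _, y in data_points])

    def convert(px, py):
        return (ax[0] * px + ax[1] * py + ax[2],
                ay[0] * px + ay[1] * py + ay[2])

    return convert


# Axis points (-p) and their pixel locations (-l), taken from the batch example.
convert = make_pixel_to_data([(0, 0), (20, 0), (0, 1)],
                             [(22, 295), (142, 295), (22, 215)])
x, y = convert(82, 255)  # a pixel halfway along both calibrated axis segments
```

Because every extracted point passes through the same fitted coefficients, a misplaced click on any of the three reference points shifts or skews all of the output, which is why precise clicking matters.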

The data points are written to the CSV file given by --output /path/to/file.csv.

If --plot output.png is passed, a plot of the extracted data points is saved to output.png. This requires matplotlib and is very useful for debugging and testing.

Notice the error near the right y-axis.

Using in batch mode

You can pass the image coordinates of the axis points on the command line with the -l option. This lets you run in batch mode, without anyone having to click on the image.

plotdigitizer ./figures/trimmed.png -p 0,0 -p 20,0 -p 0,1 -l 22,295 -l 142,295 -l 22,215 --plot output.png
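When many plots share the same layout, the command above can be generated in a loop. The following is a minimal sketch that only uses the options shown in this README (-p, -l, --output, --plot); the helper name and the recording file names are hypothetical, and it assumes all images were cropped identically so that one set of pixel locations applies to each.

```python
import shlex


def build_command(image, axis_points, pixel_locations, csv_path=None, plot_path=None):
    """Build a plotdigitizer argument list for one image."""
    if len(axis_points) != len(pixel_locations):
        raise ValueError("each -p axis point needs a matching -l pixel location")
    cmd = ["plotdigitizer", image]
    for x, y in axis_points:
        cmd += ["-p", f"{x},{y}"]
    for a, b in pixel_locations:
        cmd += ["-l", f"{a},{b}"]
    if csv_path:
        cmd += ["--output", csv_path]
    if plot_path:
        cmd += ["--plot", plot_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own cropped images.
    images = ["figures/rec_01.png", "figures/rec_02.png"]
    for image in images:
        stem = image.rsplit(".", 1)[0]
        cmd = build_command(
            image,
            axis_points=[(0, 0), (20, 0), (0, 1)],
            pixel_locations=[(22, 295), (142, 295), (22, 215)],
            csv_path=stem + ".csv",
            plot_path=stem + ".result.png",
        )
        print(shlex.join(cmd))
        # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to inspect them before running the whole batch.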

How to find coordinates of axes points

In the example above, point 0,0 is mapped to coordinate 22,295, i.e., the data point 0,0 is on the 22nd row and 295th column of the image (assuming that the bottom left of the image is the first row, first column (0,0)). I have included a utility, plotdigitizer-locate (script plotdigitizer/locate.py), which you can use to find the coordinates of points.

$ plotdigitizer-locate figures/trimmed.png

or, by directly using the script:

$ python3 plotdigitizer/locate.py figures/trimmed.png

This command opens the image in a simple window. You can click on a point and its coordinate will be written on the image itself. Note them down.
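The convention described above places (0,0) at the bottom left of the image, whereas many image viewers and editors count pixel rows from the top. If you read coordinates from such a tool instead of plotdigitizer-locate, the vertical coordinate may need to be flipped. A minimal sketch, assuming zero-based pixel indices; the function name is hypothetical and not part of plotdigitizer.

```python
def to_bottom_left(x, y_from_top, image_height):
    """Convert a top-left-origin pixel position to a bottom-left-origin one."""
    if not 0 <= y_from_top < image_height:
        raise ValueError("y_from_top must lie inside the image")
    # The top row (0) becomes the highest row index, and the bottom row becomes 0.
    return x, image_height - 1 - y_from_top


# Example: in a 300-pixel-tall image, the top row maps to row 299.
print(to_bottom_left(22, 0, 300))
```

Whether the flip is needed depends on the tool that reported the coordinates, so it is worth checking one known point (for example, an axis intersection) before processing a batch.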

Examples

Base examples

plotdigitizer figures/graphs_1.png \
		-p 1,0 -p 6,0 -p 0,3 \
		-l 165,160 -l 599,160 -l 85,60 \
		--plot figures/graphs_1.result.png \
		--preprocess

original reconstructed

Light grids

plotdigitizer figures/ECGImage.png \
		-p 1,0 -p 5,0 -p 0,1 \
		-l 290,337 -l 1306,338 -l 106,83 \
		--plot figures/ECGImage.result.png

original reconstructed

With grids

plotdigitizer figures/graph_with_grid.png \
		-p 200,0 -p 1000,0 -p 0,50 \
		-l 269,69 -l 1789,69 -l 82,542 \
		--plot figures/graph_with_grid.result.png

original Image credit: Yang yi, Wang

reconstructed

Note that the legend was not removed from the original figure, and it has corrupted the detection of the trajectory below it.

Limitations

Currently, this script has the following limitations:

  • Background must not be transparent. It might work with transparent background but I've not tested it.
  • Only b/w images are supported for now. Color images will be converted to grayscale upon reading.
  • One image should have only one trajectory.
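To see why the grayscale conversion matters for colored plots, here is a minimal sketch of a common luminance formula (the ITU-R BT.601 weights). This is an illustration only: the README does not say which conversion plotdigitizer uses, and the function name is hypothetical.

```python
def to_gray(r, g, b):
    """Weighted grayscale value (0-255) of an 8-bit RGB pixel, BT.601 weights."""
    return round(0.299 * r + 0.587 * g + 0.114 * b)


# Under this formula, a dark color stays dark while a light color ends up
# close to a white background.
for name, rgb in [("black", (0, 0, 0)), ("blue", (0, 0, 255)),
                  ("yellow", (255, 255, 0)), ("white", (255, 255, 255))]:
    print(name, to_gray(*rgb))
```

Under these weights, a light-colored trajectory such as yellow lands near the value of a white background, so it may be worth recoloring or darkening such a curve before digitizing.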

Need help

Please open an issue and attach a sample plot.

Related projects by others

  1. WebPlotDigitizer by Ankit Rohatgi is very versatile.
