WeTransfer / Format_parser

Licence: other
file metadata parsing, done cheap

format_parser is a Ruby library for prying open video, image, document, and audio files. It includes a number of parser modules that try to recover metadata useful for post-processing and layout while reading the absolute minimum amount of data possible.

format_parser is inspired by imagesize, fastimage and dimensions, borrowing from them where appropriate.
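
To make "reading the absolute minimum amount of data" concrete, here is a minimal sketch that recovers the pixel dimensions of a PNG from its first 24 bytes (8-byte signature, 4-byte chunk length, 4-byte `IHDR` tag, then width and height as big-endian 32-bit integers). The helper name `png_dimensions` is hypothetical and this is not the gem's own PNG parser, only an illustration of the approach.

```ruby
require "stringio"

PNG_SIGNATURE = "\x89PNG\r\n\x1A\n".b

# Hypothetical helper for illustration; not part of format_parser.
# Accepts any object responding to seek and read.
def png_dimensions(io)
  io.seek(0)
  header = io.read(24)
  return nil if header.nil? || header.bytesize < 24
  return nil unless header.byteslice(0, 8) == PNG_SIGNATURE
  return nil unless header.byteslice(12, 4) == "IHDR".b

  # Width and height are two unsigned 32-bit big-endian integers.
  width, height = header.byteslice(16, 8).unpack("N2")
  { width_px: width, height_px: height }
end

# Build just enough of a PNG header in memory to exercise the helper.
fake_png = PNG_SIGNATURE + [13].pack("N") + "IHDR".b + [320, 240].pack("N2")
result = png_dimensions(StringIO.new(fake_png))
result[:width_px]   # 320
result[:height_px]  # 240
```

Only one read of 24 bytes is issued, which is the property that matters when every read may translate to a remote fetch.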

Currently supported filetypes:

  • TIFF
  • CR2
  • PSD
  • PNG
  • MP3
  • JPEG
  • GIF
  • PDF
  • DPX
  • AIFF
  • WAV
  • FLAC
  • FDX
  • MOV
  • MP4
  • M4A
  • ZIP
  • DOCX, PPTX, XLSX
  • OGG
  • MPEG, MPG
  • M3U

...with more on the way!

Basic usage

Pass an IO object that responds to read, seek and size to FormatParser.parse and the first confirmed match will be returned.

match = FormatParser.parse(File.open("myimage.jpg", "rb"))
match.nature        #=> :image
match.format        #=> :jpg
match.display_width_px      #=> 320
match.display_height_px     #=> 240
match.orientation   #=> :top_left

You can also use parse_http passing a URL or parse_file_at passing a path:

match = FormatParser.parse_http('https://upload.wikimedia.org/wikipedia/commons/b/b4/Mardin_1350660_1350692_33_images.jpg')
match.nature        #=> :image
match.format        #=> :jpg

If you would rather receive all potential results from the gem, call the gem as follows:

array_of_results = FormatParser.parse(File.open("myimage.jpg", "rb"), results: :all)

You can also optimize the metadata extraction by providing hints to the gem:

FormatParser.parse(File.open("myimage", "rb"), natures: [:video, :image], formats: [:jpg, :png, :mp4], results: :all)
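
Why hints help can be sketched with a toy registry: restricting natures and formats shrinks the set of parsers that are consulted at all. Everything below (`Parser`, `PARSERS`, `detect`) is hypothetical and only mirrors the keyword names used above; it makes no claim about how the gem is implemented internally.

```ruby
# Hypothetical registry for illustration; not format_parser's internals.
Parser = Struct.new(:nature, :format, :matcher)

PARSERS = [
  Parser.new(:image, :png, ->(bytes) { bytes.start_with?("\x89PNG".b) }),
  Parser.new(:image, :gif, ->(bytes) { bytes.start_with?("GIF8".b) }),
  Parser.new(:document, :pdf, ->(bytes) { bytes.start_with?("%PDF".b) })
]

# Returns the first matching format, or every match when results is :all.
def detect(bytes, natures: nil, formats: nil, results: :first)
  # Hints narrow the candidate list before any matcher runs.
  candidates = PARSERS.select do |parser|
    (natures.nil? || natures.include?(parser.nature)) &&
      (formats.nil? || formats.include?(parser.format))
  end

  if results == :all
    candidates.select { |parser| parser.matcher.call(bytes) }.map(&:format)
  else
    candidates.find { |parser| parser.matcher.call(bytes) }&.format
  end
end

gif_bytes = "GIF89a".b
detect(gif_bytes)                                         # all three parsers are candidates
detect(gif_bytes, natures: [:document])                   # the GIF parser is never consulted
detect(gif_bytes, formats: [:gif, :png], results: :all)   # array of every match
```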

Return values of all parsers have built-in JSON serialization:

img_info = FormatParser.parse(File.open("myimage.jpg", "rb"))
JSON.pretty_generate(img_info) #=> ...

To convert the result to a Hash or a structure suitable for JSON serialization, call as_json:

img_info = FormatParser.parse(File.open("myimage.jpg", "rb"))
img_info.as_json

# it's also possible to convert all keys to strings
img_info.as_json(stringify_keys: true)
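
The difference between the two calls is symbol keys versus string keys. A minimal sketch with a hypothetical `ImageInfo` Struct standing in for a parser result (this is not a format_parser class, and the gem's own `as_json` may return more fields) shows the general pattern:

```ruby
require "json"

# Hypothetical stand-in for a parser result; not a format_parser class.
ImageInfo = Struct.new(:nature, :format, :width_px, :height_px, keyword_init: true) do
  # Returns a plain Hash; keys are Symbols unless stringify_keys is true.
  def as_json(stringify_keys: false)
    hash = to_h
    stringify_keys ? hash.transform_keys(&:to_s) : hash
  end

  # Lets JSON.generate and JSON.pretty_generate serialize the object directly.
  def to_json(*args)
    as_json.to_json(*args)
  end
end

info = ImageInfo.new(nature: :image, format: :jpg, width_px: 320, height_px: 240)
info.as_json                        # Hash with Symbol keys
info.as_json(stringify_keys: true)  # Hash with String keys
JSON.pretty_generate(info)          # JSON text built from as_json
```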

Creating your own parsers

See the section on writing parsers in CONTRIBUTING.md

Design rationale

We need to recover metadata from various file types, and we need to do so satisfying the following constraints:

  • The data in those files can be malicious and/or incomplete, so we need to be failsafe
  • The data will be fetched from a remote location (S3), so we want to obtain it with as few HTTP requests as possible
  • ...and with as little data fetched as possible, although the number of HTTP requests is the greater concern
  • The data can be recognized ambiguously and match more than one format definition (like TIFF sections of camera RAW)
  • The information necessary is a small subset of the overall metadata available in the file.
  • The number of supported formats is only ever going to increase, not decrease
  • The library is likely to be used in multiple consumer applications
  • The library is likely to be used in multithreaded environments

Deliberate design choices

Therefore we adopt the following approaches:

  • Modular parsers per file format, with some degree of code sharing between them (but not too much). Adding new formats should be low-friction, and testing these format parsers should be possible in isolation
  • Modular and configurable IO stack that supports limiting reads/loops from the source entity. The IO stack is isolated from the parsers, meaning parsers do not need to care about things like fetches using Range: headers, GZIP compression and the like
  • A caching system that allows us to ideally fetch once, and only once, and as little as possible - but still accommodate formats that have the important information at the end of the file or might need information from the middle of the file
  • Minimal dependencies, and if dependencies are to be used they should be very stable and low-level
  • Where possible, use small subsets of full-feature format parsers since we only care about a small subset of the data.
  • When a choice arises between using a dependency or writing a small parser, write the small parser since less code is easier to verify and test, and we likely don't care about all the metadata anyway
  • Avoid using C libraries which are likely to contain buffer overflows/underflows - we stay memory safe
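
The idea of an IO layer that caps how much a parser may read, independently of the parser itself, can be sketched as a small wrapper. This is a minimal illustration under stated assumptions: `BoundedReader` and `ReadBudgetExceeded` are hypothetical names, and format_parser's real IO stack is more elaborate and is not reproduced here.

```ruby
require "stringio"

# Hypothetical names for illustration; not format_parser's classes.
class ReadBudgetExceeded < StandardError; end

class BoundedReader
  attr_reader :bytes_read, :read_calls

  def initialize(io, max_bytes:, max_reads:)
    @io = io
    @max_bytes = max_bytes
    @max_reads = max_reads
    @bytes_read = 0
    @read_calls = 0
  end

  def size
    @io.size
  end

  def seek(offset)
    @io.seek(offset)
  end

  # Delegates to the wrapped IO, refusing once either budget would be exceeded.
  def read(n_bytes)
    @read_calls += 1
    raise ReadBudgetExceeded, "too many reads" if @read_calls > @max_reads
    raise ReadBudgetExceeded, "too many bytes requested" if @bytes_read + n_bytes > @max_bytes

    data = @io.read(n_bytes)
    @bytes_read += data.bytesize if data
    data
  end
end

# A parser only sees read, seek and size; the budget is enforced around it.
reader = BoundedReader.new(StringIO.new("x" * 1024), max_bytes: 64, max_reads: 4)
reader.read(32)
reader.seek(512)
reader.read(16)
reader.bytes_read  # 48
```

Because the wrapper exposes the same three methods as the source, a malformed or malicious file that tries to drive a parser into an endless read loop hits the budget instead.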

Acknowledgements

We are incredibly grateful to Remco van't Veer for exifr and to Krists Ozols for id3tag that we are using for crucial tasks.

Fixture Sources

Unless specified otherwise in this section, the fixture files are MIT licensed and come from the FastImage and Dimensions projects.

JPEG

  • divergent_pixel_dimensions_exif.jpg is used with permission from LiveKom GmbH
  • extended_reads.jpg has kindly been made available by Raphaelle Pellerin for use exclusively with format_parser
  • too_many_APP1_markers_surrogate.jpg was created by the project maintainers

AIFF

  • fixture.aiff was created by one of the project maintainers and is MIT licensed

WAV

  • c_11k16bitpcm.wav and c_8kmp316.wav are from Wikipedia WAV, retrieved January 7, 2018
  • c_39064__alienbomb__atmo-truck.wav is from freesound and is CC0 licensed
  • c_M1F1-Alaw-AFsp.wav and d_6_Channel_ID.wav are from a McGill Engineering site

MP3

  • Cassy.mp3 has been produced by WeTransfer and may be used with the library for the purposes of testing

FDX

  • fixture.fdx was created by one of the project maintainers and is MIT licensed

DPX

  • DPX files were created by one of the project maintainers and may be used with the library for the purposes of testing

MOOV

  • bmff.mp4 is borrowed from the bmff project
  • Test_Circular MOV files were created by one of the project maintainers and are MIT licensed

CR2

FLAC

  • atc_fixture_vbr.flac is a converted version of the MP3 with the same name
  • c_11k16btipcm.flac is a converted version of the WAV with the same name

OGG

  • hi.ogg, vorbis.ogg, with_confusing_magic_string.ogg, with_garbage_at_the_end.ogg have been generated by the project contributors

M4A

  • fixture.m4a was created by one of the project maintainers and is MIT licensed

PNG

TIFF

  • Shinbutsureijoushuincho.tiff is obtained from Wikimedia Commons and is Creative Commons licensed
  • IMG_9266_*.tif and all its variations were created by the project maintainers

ARW

ZIP

  • The .zip fixture files have been created by the project maintainers

.docx

  • The .docx files were generated by the project maintainers

.mpg and .mpeg

JPEG examples of EXIF orientation

M3U

  • The M3U fixture files were created by one of the project maintainers

.key

  • The keynote_recognized_as_jpeg.key file was created by the project maintainers

Copyright

Copyright (c) 2020 WeTransfer.

format_parser is distributed under the conditions of the Hippocratic License

  • See LICENSE.txt for further details.