
rdclark / pptx2html

Licence: MIT license
PowerPoint OOXML (2007) to HTML conversion via ANTLR

This project takes PowerPoint .pptx files and extracts their contents. It's based on ANTLR 4, ANother Tool for Language Recognition. There's an ANTLR 3 branch available as well.
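For readers unfamiliar with the input format: a .pptx file is a ZIP archive, and each slide is stored as an XML part named ppt/slides/slideN.xml, with visible text held in DrawingML <a:t> elements. The sketch below only illustrates that container layout. It is not this project's ANTLR-based approach; the class name PptxTextSketch and its helper methods are hypothetical, it uses a regular expression rather than a real parser, it does not decode XML entities, and it needs Java 9 or later for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical illustration only; not part of pptx2html.
public class PptxTextSketch {

    // Matches <a:t>...</a:t> (optionally with attributes) but not <a:tab> or <a:tbl>.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<a:t(?:\\s[^>]*)?>([^<]*)</a:t>");

    // True for slide parts such as ppt/slides/slide3.xml, false for
    // relationship files, layouts, masters, and media.
    public static boolean isSlidePart(String entryName) {
        return entryName.matches("ppt/slides/slide\\d+\\.xml");
    }

    // Returns the raw contents of every text run in one slide's XML,
    // in document order. XML entities are left undecoded.
    public static List<String> textRuns(String slideXml) {
        List<String> runs = new ArrayList<>();
        Matcher m = TEXT_RUN.matcher(slideXml);
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PptxTextSketch <file.pptx>");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!isSlidePart(entry.getName())) {
                    continue;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    System.out.println(entry.getName() + ": " + textRuns(xml));
                }
            }
        }
    }
}
```

ZIP entries are not guaranteed to appear in presentation order, so a real converter would also need to consult ppt/presentation.xml and its relationships to sequence the slides.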

Limitations

  1. This version does not preserve text formatting or slide layouts.
  2. This version ignores shapes drawn with PowerPoint (that's a complex little drawing language) and might not catch all pictures.
  3. The output is HTML formatted for an S6 slideshow.

Building

Install Maven and JDK 6 or later, then build using the standard Maven lifecycle phases (clean, compile, test, package), for example: mvn clean package

Wishlist / roadmap

  1. Other output templates (e.g. Markdown, Textile)
  2. Capture inline formatting
  3. Capture more of the layout options (titles, header/footer, text block positioning, picture positioning).