
ivbeg / docx2csv

License: BSD-3-Clause
Extracts tables from .docx files and saves them as .csv or .xls files

Programming Languages

python
Makefile

Projects that are alternatives of or similar to docx2csv

Whatsapp-Chat-Exporter
A customizable Android and iPhone WhatsApp database parser that will give you the history of your WhatsApp conversations in HTML and JSON. Android Backup Crypt12, Crypt14 and Crypt15 supported.
Stars: ✭ 150 (+257.14%)
Mutual labels:  parsing
MLSA-Certificate-Automate
Automate your Microsoft Learn Student Ambassadors event certificate with Python
Stars: ✭ 24 (-42.86%)
Mutual labels:  docx
fyodor
Convert your Amazon Kindle highlights and notes into markdown (or any format).
Stars: ✭ 101 (+140.48%)
Mutual labels:  parsing
docxjs
Docx rendering library
Stars: ✭ 315 (+650%)
Mutual labels:  docx
YAPDFKit
Yet another PDF Kit for parsing and modifying PDFs. For OS X and iOS.
Stars: ✭ 27 (-35.71%)
Mutual labels:  parsing
CVparser
CVparser is software for parsing or extracting data out of CV/resumes.
Stars: ✭ 28 (-33.33%)
Mutual labels:  parsing
FullFIX
A library for parsing FIX (Financial Information eXchange) protocol messages.
Stars: ✭ 60 (+42.86%)
Mutual labels:  parsing
ansicolor
A JavaScript ANSI color/style management. ANSI parsing. ANSI to CSS. Small, clean, no dependencies.
Stars: ✭ 91 (+116.67%)
Mutual labels:  parsing
domainatrex
😈 A library for parsing TLDs from urls in Elixir
Stars: ✭ 29 (-30.95%)
Mutual labels:  parsing
OpenSIEM-Logstash-Parsing
SIEM Logstash parsing for more than a hundred technologies
Stars: ✭ 140 (+233.33%)
Mutual labels:  parsing
sb-dynlex
Configurable lexer for PHP featuring a fluid API.
Stars: ✭ 27 (-35.71%)
Mutual labels:  parsing
puppeteer-autoscroll-down
Handle infinite scroll on websites by puppeteer
Stars: ✭ 40 (-4.76%)
Mutual labels:  parsing
htmlparsing
htmlparsing.com, a website devoted to helping people parse HTML correctly
Stars: ✭ 29 (-30.95%)
Mutual labels:  parsing
logstash-config
logstash-config provides a parser and abstract syntax tree (AST) for the Logstash config format, written in Go
Stars: ✭ 26 (-38.1%)
Mutual labels:  parsing
opentbs
With OpenTBS you can merge OpenOffice - LibreOffice and Ms Office documents with PHP using the TinyButStrong template engine. Simply use OpenOffice - LibreOffice or Ms Office to edit your templates: DOCX, XLSX, PPTX, ODT, ODS, ODP and other formats. That is the Natural Template philosophy.
Stars: ✭ 48 (+14.29%)
Mutual labels:  docx
biaffine-ner
Named Entity Recognition as Dependency Parsing
Stars: ✭ 293 (+597.62%)
Mutual labels:  parsing
wallhaven4j
Wallhaven API for Java
Stars: ✭ 17 (-59.52%)
Mutual labels:  parsing
left-recursion
Quick explanation of eliminating left recursion in Haskell parsers
Stars: ✭ 36 (-14.29%)
Mutual labels:  parsing
pe
Fastest general-purpose parsing library for Python with a familiar API
Stars: ✭ 21 (-50%)
Mutual labels:  parsing
FAParser
JSON Parsing + Archiving & Unarchiving in User Defaults
Stars: ✭ 67 (+59.52%)
Mutual labels:  parsing

Command line

Usage: docx2csv extract [OPTIONS] FILENAME

docx to csv converter (http://github.com/ivbeg/docx2csv). Extracts tables from DOCX files as CSV or XLSX.

Use command: "docx2csv extract <filename>" to run extraction. It will create files like filename_1.csv, filename_2.csv, one for each table found.

Options:
  --format TEXT         Output format: CSV, XLSX
  --singlefile TEXT     Outputs single XLS file with multiple sheets: True or False
  --sizefilter INTEGER  Filters tables by size (number of rows)
  --help                Show this message and exit.

Examples

docx2csv extract --format csv --sizefilter 3 CP_CONTRACT_160166.docx

Extracts the tables with more than 3 rows from CP_CONTRACT_160166.docx and saves them as CSV files.
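The numbered output files and the row filter can be pictured with a short standard-library sketch. This is not docx2csv's own code: the function name write_tables_as_csv, its parameters, and the choice to number only the tables that pass the filter are assumptions made for illustration.

```python
import csv
from pathlib import Path


def write_tables_as_csv(tables, docx_path, out_dir, sizefilter=0):
    """Write each table (a list of rows) to <stem>_<n>.csv and return the paths.

    Assumption: a table is kept only if it has more than `sizefilter` rows,
    and kept tables are numbered consecutively starting at 1.
    """
    stem = Path(docx_path).stem
    out_dir = Path(out_dir)
    written = []
    for rows in tables:
        if len(rows) <= sizefilter:
            continue  # too small (with the default of 0, only empty tables are skipped)
        target = out_dir / f"{stem}_{len(written) + 1}.csv"
        # newline="" lets the csv module control line endings itself
        with open(target, "w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(rows)
        written.append(target)
    return written
```

Called with sizefilter=3, a two-row table would be skipped and a five-row table would be written as CP_CONTRACT_160166_1.csv in the chosen directory.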

Code

Function "extract_tables" returns a list of tables from a docx file, and function "extract" saves the tables as an xlsx, xls or csv file. If the 'csv' format is selected, you need to specify whether to write a single csv file or multiple files.

>>> from docx2csv import extract_tables, extract
>>> tables = extract_tables('some_file.docx')

returns a list of tables

>>> extract(filename='some_file.docx', format="xlsx", output='some_file.xlsx')

saves all tables from some_file.docx to some_file.xlsx

>>> extract(filename='some_file.docx', format="csv", singlefile=False)

saves all tables from some_file.docx to some_file_1.csv, some_file_2.csv, etc.
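For readers curious about what reading tables out of a docx file involves, here is a minimal standard-library sketch. It is not how docx2csv is implemented, and the shape of the result (a list of tables, each a list of rows of cell strings) is an assumption, not the documented return type of extract_tables. The name read_docx_tables is hypothetical. The sketch relies only on the fact that a .docx file is a zip archive whose main part, word/document.xml, stores tables as w:tbl elements containing w:tr rows and w:tc cells.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the elements of word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path):
    """Return every table in a .docx file as a list of rows of cell strings.

    Simplifications: merged cells are not expanded, and a nested table is
    returned as its own entry instead of as text of the enclosing cell.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    tables = []
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Concatenate the text runs of each paragraph in the cell,
                # then join the paragraphs with newlines.
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Calling read_docx_tables('some_file.docx') would give nested lists that can be passed directly to the csv module.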

Requirements

Acknowledgements

Thanks to Vsevolod Oparin (https://www.facebook.com/vsevolod.oparin) for the optimized "extract_table" code
