All Projects → tammoippen → xls2json

tammoippen / xls2json

Licence: BSD-3-Clause license
Read in Excel file (.xls, .xlsx, .xlsm) and output JSON.

Programming Languages

kotlin
9241 projects
Makefile
30231 projects
Dockerfile
14818 projects

Projects that are alternatives of or similar to xls2json

Documentserver
ONLYOFFICE Document Server is an online office suite comprising viewers and editors for texts, spreadsheets and presentations, fully compatible with Office Open XML formats: .docx, .xlsx, .pptx and enabling collaborative editing in real time.
Stars: ✭ 2,335 (+5887.18%)
Mutual labels:  xlsx, xls
Excelmapper
Map POCO objects to Excel files
Stars: ✭ 166 (+325.64%)
Mutual labels:  xlsx, xls
Tableexport
The simple, easy-to-implement library to export HTML tables to xlsx, xls, csv, and txt files.
Stars: ✭ 781 (+1902.56%)
Mutual labels:  xlsx, xls
Readxl
Read excel files (.xls and .xlsx) into R 🖇
Stars: ✭ 585 (+1400%)
Mutual labels:  xlsx, xls
Android Gradle Localization Plugin
Gradle plugin for generating localized string resources
Stars: ✭ 100 (+156.41%)
Mutual labels:  xlsx, xls
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+72923.08%)
Mutual labels:  xlsx, xls
Desktopeditors
An office suite that combines text, spreadsheet and presentation editors allowing to create, view and edit local documents
Stars: ✭ 1,008 (+2484.62%)
Mutual labels:  xlsx, xls
J
❌ Multi-format spreadsheet CLI (now merged in http://github.com/sheetjs/js-xlsx )
Stars: ✭ 343 (+779.49%)
Mutual labels:  xlsx, xls
Xresloader
跨平台Excel导表工具(Excel=>protobuf/msgpack/lua/javascript/json/xml)
Stars: ✭ 161 (+312.82%)
Mutual labels:  xlsx, xls
Myexcel
MyExcel, a new way to operate excel!
Stars: ✭ 1,198 (+2971.79%)
Mutual labels:  xlsx, xls
Docjure
Read and write Office documents from Clojure
Stars: ✭ 510 (+1207.69%)
Mutual labels:  xlsx, xls
Test files
📚 SheetJS Test Files (XLS/XLSX/XLSB and other spreadsheet formats)
Stars: ✭ 150 (+284.62%)
Mutual labels:  xlsx, xls
Easyexcel
快速、简单避免OOM的java处理Excel工具
Stars: ✭ 22,133 (+56651.28%)
Mutual labels:  xlsx, xls
Rows
A common, beautiful interface to tabular data, no matter the format
Stars: ✭ 739 (+1794.87%)
Mutual labels:  xlsx, xls
Npoi.mapper
Use this tool to import or export data with Excel file. The tool is a convention based mapper between strong typed object and Excel data via NPOI.
Stars: ✭ 348 (+792.31%)
Mutual labels:  xlsx, xls
Pyexcel
Single API for reading, manipulating and writing data in csv, ods, xls, xlsx and xlsm files
Stars: ✭ 902 (+2212.82%)
Mutual labels:  xlsx, xls
grate
A Go native tabular data extraction package. Currently supports .xls, .xlsx, .csv, .tsv formats.
Stars: ✭ 98 (+151.28%)
Mutual labels:  xlsx, xls
Msoffcrypto Tool
Python tool and library for decrypting MS Office files with passwords or other keys
Stars: ✭ 274 (+602.56%)
Mutual labels:  xlsx, xls
Documentbuilder
ONLYOFFICE Document Builder is powerful text, spreadsheet, presentation and PDF generating tool
Stars: ✭ 61 (+56.41%)
Mutual labels:  xlsx, xls
Phpspreadsheet
A pure PHP library for reading and writing spreadsheet files
Stars: ✭ 10,627 (+27148.72%)
Mutual labels:  xlsx, xls

CI codecov

xls2json

Read in Excel file (.xls, .xlsx, .xlsm) and output JSON. Evaluates formulas where possible. Preserve type information from Excel via JSON types.

❯ xls2json --help
Open an xls(x|m) file and transform to json.

Usage: xls2json [-hlmsvV] [--[no-]color] [--pretty] [-D=<dtfmt>]
                [-p=<password>] [-T=<tfmt>] [-t=<tables>]... [<files>...]

  If no `--table`s are provided, then all
  tables will be extracted.

  All files will be processed one by one,
  each outputting on line of json.

      [<files>...]           xls(x|m)-file(s) to transform
  -h, --help                 Show this help message and exit.
  -V, --version              Print version information and exit.
  -m, --memory               Show memory usage information.
  -v, --verbose              Show more information.
      --pretty               Pretty print the JSON.
      --[no-]color           Force adding or removing of ansi-color to
                               pretty-printed JSON.
  -l, --list-tables          List all tables.
  -t, --table=<tables>       Specify the tables to transform
  -p, --password=<password>  Password for opening the input file(s).
  -s, --strip                Strip empty columns and empty rows.
  -D, --datetime-format=<dtfmt>
                             The datetime format.
                             [default: 'yyyy-MM-dd'T'HH:mm:ss.SSS']
  -T, --time-format=<tfmt>   The time format.
                             [default: 'HH:mm:ss.SSS']

By Tammo Ippen <[email protected]>
Issues: https://github.com/tammoippen/xls2json/issues

Usage

# read a XLS file
❯ xls2json src/test/resources/empty.xls
{"Sheet1":[]}

# read a XLS file with content
❯ xls2json src/test/resources/sample.xls
{"Sheet1":[["empty",null],["String","hello",null],["StringNumber","14.8",null],["Int",1234,null,null,null,null],["bool",true,null,null,null,null],["bool",false,null,null,null,null],["float",23.12345,null,null,null,null],["datetime","2021-05-18T21:19:53.040",null,null,null,null],["time","21:19:32.000",null],["formulars",null],["string","hello",null],["float",-0.34678748622465627,null],["int",5,null],["datetime","2021-06-03T16:45:56.709",null],["time","13:37:00.000",null],[],[],[],[null,null,null],[null,null,null],[null,null,null]]}

# strip empty cells from from the end of columns and empty rows from the bottom
❯ xls2json -s src/test/resources/sample.xls
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:47:22.789"],["time","13:37:00.000"]]}

# pretty print the output
❯ xls2json -s --pretty src/test/resources/sample.xls
{
  # we have a dict sheetname -> list of rows
  "Sheet1" : [
    # each row is a list of cell-values
    [ "empty" ],
    [ "String", "hello" ],
    [ "StringNumber", "14.8" ],
    [ "Int", 1234 ],
    [ "bool", true ],
    [ "bool", false ],
    [ "float", 23.12345 ],
    # format for datetime and time can be specified using
    # the -D and -T options. The format-string documentation:
    # https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/time/format/DateTimeFormatter.html
    # Please be aware, that Excel dates have no timezone attached,
    # so if you provide a format with timezone, your locally configured
    [ "datetime", "2021-05-18T21:19:53.040" ],
    [ "time", "21:19:32.000" ],
    # formulars are evaluated
    [ "formulars" ],
    [ "string", "hello" ],
    [ "float", -0.34678748622465627 ],
    [ "int", 5 ],
    [ "datetime", "2021-06-09T21:30:44.166" ],
    [ "time", "13:37:00.000" ]
  ]
}


# use jq (https://stedolan.github.io/jq/)
# or gojq (https://github.com/itchyny/gojq)
# to get some nice output and / or to process the json
❯ xls2json -s src/test/resources/sample.xls | jq ".Sheet1[2]"
[
  "StringNumber",
  "14.8"
]

# XLSX works the same
❯ xls2json -s src/test/resources/sample.xlsx
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:59:06.466"],["time","13:37:00.000"]]}

# works on multiple files as well
❯ xls2json -s src/test/resources/sample.xls*
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:59:06.466"],["time","13:37:00.000"]]}
{"Sheet1":[["empty"],["String","hello"],["StringNumber","14.8"],["Int",1234],["bool",true],["bool",false],["float",23.12345],["datetime","2021-05-18T21:19:53.040"],["time","21:19:32.000"],["formulars"],["string","hello"],["float",-0.34678748622465627],["int",5],["datetime","2021-06-03T16:47:22.789"],["time","13:37:00.000"]]}

# list the available tables
❯ xls2json -l src/test/resources/sampleTwoSheets.xls
["Sheet1","Sheet2"]

# only output some table(s)
❯ xls2json -t Sheet2 -s --pretty src/test/resources/sampleTwoSheets.xls
{
  "Sheet2" : [
    [ "empty" ],
    [ "String", "hello" ],
    [ "StringNumber", "14.8" ],
    [ "Int", 1234 ],
    [ "bool", true ],
    [ "bool", false ],
    [ "float", 23.12345 ],
    [ "datetime", "2021-05-18T21:19:53.040" ],
    [ "time", "21:19:32.000" ],
    [ "formulars" ],
    [ "string", "hello" ],
    [ "float", -0.34678748622465627 ],
    [ "int", 5 ],
    [ "datetime", "2021-06-09T21:37:05.615" ],
    [ "time", "13:37:00.000" ]
  ]
}

# if you are running the native-image executable, you can configure the
# garbage collector with the usual options
❯ xls2json -Xmx1k --help
Exception in thread "main" java.lang.OutOfMemoryError: Garbage-collected heap size exceeded.
# 1kb is a bit low

# if you are running with the installable distribution, you can set
# the environment variable XLS2JSON_OPTS
❯ XLS2JSON_OPTS="-Xmx1k" xls2json --help
Error occurred during initialization of VM
Too small maximum heap
# still 1kb is too small :D

Output JsonSchema

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "minProperties": 1,
  "patternProperties": {
    ".*": {
      "type": "array",
      "items": {
        "type": "array",
        "items": {
          "type": ["number", "string", "integer", "boolean", "null"]
        }
      }
    }
  }
}

Installation

The Releases contain various formats for installation:

  • An executable fat jar (build with the shadowJar plugin). Put the jar wherever you like and call it like:

    ❯ java -jar path/to/xls2json-1.0.0-all.jar --help
  • An installable distribution as .zip / .tar.gz. Extract it, put it wherever you like and update your PATH, e.g.:

    ❯ tar xf xls2json-1.0.0.tar
    ❯ mv xls2json-1.0.0 "$HOME/.local/share"export PATH="$PATH:$HOME/.local/share/xls2json-1.0.0/bin"
    ❯ xls2json --help
  • Native executable build with native-image from GraalVM using the native-image plugin from mike_neck. Put the executable for your operating system into your path and call it like: xls2json --help.

    No JAVA installation needed.

Another Excel Reader?

There are already some great Excel readers out there, but most do not satisfy my requirements:

  • commandline program with (in the best case) no external dependencies
  • XLS (2003 and earlier) and XLSX (2007 and later) support
  • evaluate formulas within the spreadsheet
  • keep type information from Excel
  • open source
  • (fast and multi-platform)

The existing programs / libraries I found and why I think they do not satisfy my needs:

  • pyexcel-cli: can do much more in and output formats, but lacks formula evaluation. Same for the corresponding pyexcel library.
  • unoconv and libreoffice: can do much more in and output formats, can do formula evaluation, but requires a LibreOffice installation.
  • ssconvert: requires a gnumeric installation, I am not sure about the capabilities.
  • Spreadsheet::XLSX: no cli afaik, xlsx only. Also perl is not a language I am eager to learn.
  • PhpSpreadsheet: no cli afaik, but from the docs looks as if you can evaluate formulas. But again, PHP is not a language I am eager to learn.

I am already using Apache POI for quite some time and found it quite complete for my needs. We still have the Java VM dependency, but with GraalVM native-image we can build self-contained executables 🎉.

Standing on the shoulders of giants

  • Apache POI: the Java API for Microsoft Documents
  • Picocli: a mighty tiny command line interface
  • jackson: Jackson has been known as "the Java JSON library" or "the best JSON parser for Java". Or simply as "JSON for Java".
  • Gradle Build Tool
  • kotlin: A modern programming language that makes developers happier.
  • GraalVM native-image: ahead-of-time compile Java code to a standalone executable

Some linkes I found invaluable for this project:

Building and Development

Prerequisits:

  • Make sure you have a Java (11+) SDK in your path.
  • If you want to build the native binary, use the GraalVM as your Java SDK and install native-image: gu install native-image

Building:

❯ git clone https://github.com/tammoippen/xls2json.git
❯ cd xls2json
# build all java related targets
# => build/distributions/xls2json-{version}.[tar|zip]
❯ ./gradlew build
# build the native executable for your system
# => build/executable/xls2json
❯ ./gradlew nativeImage
# build the fat jar only (already in build)
# => build/libs/xls2json-{version}-all.jar
❯ ./gradlew shadowJar
# tests and run
❯ ./gradlew test
❯ ./gradlew run --args="--help"

Issues with the native-image executables

Apache POI uses many resources and reflections when working with the excel files and the native-image executable needs to be configured during building to include those resources and reflections.

When issues arise reading an Excel file, please try to use the fat-jar or the installable distribution and run the Excel file with that. If it works, please open an issue with the failing Excel file (or a minimal reproducing Excel file). I will generate the configuration then. If it still does not work, please again open an issue including the Excel file, stack trace, xls2json and java version.

If you want to generate the configuration, consider running:

❯ ./gradlew shadowJar
❯ $(JAVA_HOME)/bin/java -agentlib:native-image-agent=config-merge-dir=native-image-config \
    -jar build/libs/xls2json-1.0.0-all.jar \
    the-problematic-excel-file.xls(x)
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].