All Projects → tafia → Calamine

tafia / Calamine

Licence: mit
A pure Rust Excel/OpenDocument SpeadSheets file reader: rust on metal sheets

Programming Languages

rust
11053 projects
vba
158 projects

Projects that are alternatives of or similar to Calamine

Corexlsx
Excel spreadsheet (XLSX) format parser written in pure Swift
Stars: ✭ 481 (-25.31%)
Mutual labels:  excel, parser
Zipcelx
Turns JSON data into `.xlsx` files in the browser
Stars: ✭ 246 (-61.8%)
Mutual labels:  excel, parser
Xlsxir
Xlsx parser for the Elixir language.
Stars: ✭ 167 (-74.07%)
Mutual labels:  excel, parser
Formula Parser
Javascript Library parsing Excel Formulas and more
Stars: ✭ 544 (-15.53%)
Mutual labels:  excel, parser
Pigeon
Command pigeon generates parsers in Go from a PEG grammar.
Stars: ✭ 603 (-6.37%)
Mutual labels:  parser
Goconfig
Package goconfig is a fully functional and comments-support configuration file (.ini) parser.
Stars: ✭ 568 (-11.8%)
Mutual labels:  parser
Webassemblyjs
Toolchain for WebAssembly
Stars: ✭ 566 (-12.11%)
Mutual labels:  parser
Msgpack Rust
MessagePack implementation for Rust / msgpack.org[Rust]
Stars: ✭ 561 (-12.89%)
Mutual labels:  serde
Guessit
GuessIt is a python library that extracts as much information as possible from a video filename.
Stars: ✭ 643 (-0.16%)
Mutual labels:  parser
Marshmallow
A lightweight library for converting complex objects to and from simple Python datatypes.
Stars: ✭ 5,857 (+809.47%)
Mutual labels:  serde
Readxl
Read excel files (.xls and .xlsx) into R 🖇
Stars: ✭ 585 (-9.16%)
Mutual labels:  excel
Self Attentive Parser
High-accuracy NLP parser with models for 11 languages.
Stars: ✭ 569 (-11.65%)
Mutual labels:  parser
Sqlparser Rs
Extensible SQL Lexer and Parser for Rust
Stars: ✭ 607 (-5.75%)
Mutual labels:  parser
Lol Html
Low output latency streaming HTML parser/rewriter with CSS selector-based API
Stars: ✭ 566 (-12.11%)
Mutual labels:  parser
Valveresourceformat
🔬 Valve's Source 2 resource file format parser and decompiler
Stars: ✭ 638 (-0.93%)
Mutual labels:  parser
Pimpmylog
🍭 Log viewer for your web server
Stars: ✭ 564 (-12.42%)
Mutual labels:  parser
Laracsv
A Laravel package to easily generate CSV files from Eloquent model
Stars: ✭ 583 (-9.47%)
Mutual labels:  excel
Dataproofer
A proofreader for your data
Stars: ✭ 628 (-2.48%)
Mutual labels:  excel
Tabula
Tabula is a tool for liberating data tables trapped inside PDF files
Stars: ✭ 5,420 (+741.61%)
Mutual labels:  excel
Remarkable
Markdown parser, done right. Commonmark support, extensions, syntax plugins, high speed - all in one. Gulp and metalsmith plugins available. Used by Facebook, Docusaurus and many others! Use https://github.com/breakdance/breakdance for HTML-to-markdown conversion. Use https://github.com/jonschlinkert/markdown-toc to generate a table of contents.
Stars: ✭ 5,252 (+715.53%)
Mutual labels:  parser

calamine

An Excel/OpenDocument Spreadsheets file reader/deserializer, in pure Rust.

Build Status Build status

Documentation

Description

calamine is a pure Rust library to read and deserialize any spreadsheet file:

  • excel like (xls, xlsx, xlsm, xlsb, xla, xlam)
  • opendocument spreadsheets (ods)

As long as your files are simple enough, this library should just work. For anything else, please file an issue with a failing test or send a pull request!

Examples

Serde deserialization

It is as simple as:

use calamine::{open_workbook, Error, Xlsx, Reader, RangeDeserializerBuilder};

fn example() -> Result<(), Error> {
    let path = format!("{}/tests/temperature.xlsx", env!("CARGO_MANIFEST_DIR"));
    let mut workbook: Xlsx<_> = open_workbook(path)?;
    let range = workbook.worksheet_range("Sheet1")
        .ok_or(Error::Msg("Cannot find 'Sheet1'"))??;

    let mut iter = RangeDeserializerBuilder::new().from_range(&range)?;

    if let Some(result) = iter.next() {
        let (label, value): (String, f64) = result?;
        assert_eq!(label, "celsius");
        assert_eq!(value, 22.2222);
        Ok(())
    } else {
        Err(From::from("expected at least one record but got none"))
    }
}

Note if you want to deserialise a column that may have invalid types (i.e. a float where some values may be strings), you can use Serde's deserialize_with field attribute:

use serde::{Deserialize, Serialize};
use calamine::{RangeDeserializerBuilder, Reader, Xlsx};


#[derive(Serialize, Deserialize, Debug)]
struct RawExcelRow {
    metric: String,
    #[serde(deserialize_with = "de_opt_f64")]
    value: Option<f64>,
}


// Convert value cell to Some(f64) if float or int, else None
fn de_opt_f64<'de, D>(deserializer: D) -> Result<Option<f64>, D::Error>
where
    D: serde::Deserializer<'de>,
{
    let data_type = calamine::DataType::deserialize(deserializer);
    match data_type {
        Ok(calamine::DataType::Error(_)) => Ok(None),
        Ok(calamine::DataType::Float(f)) => Ok(Some(f)),
        Ok(calamine::DataType::Int(i)) => Ok(Some(i as f64)),
        _ => Ok(None),
    }
}

fn main() ->  Result<(), Box<dyn std::error::Error>> {
    let path = format!("{}/tests/excel.xlsx", env!("CARGO_MANIFEST_DIR"));
    let mut excel: Xlsx<_> = open_workbook(path)?;

    let range = excel
      .worksheet_range("Sheet1")
      .ok_or(calamine::Error::Msg("Cannot find Sheet1"))??;

    let iter_result =
        RangeDeserializerBuilder::with_headers(&COLUMNS).from_range::<_, RawExcelRow>(&range)?;
  }

Reader: Simple

use calamine::{Reader, Xlsx, open_workbook};

let mut excel: Xlsx<_> = open_workbook("file.xlsx").unwrap();
if let Some(Ok(r)) = excel.worksheet_range("Sheet1") {
    for row in r.rows() {
        println!("row={:?}, row[0]={:?}", row, row[0]);
    }
}

Reader: More complex

Let's assume

  • the file type (xls, xlsx ...) cannot be known at static time
  • we need to get all data from the workbook
  • we need to parse the vba
  • we need to see the defined names
  • and the formula!
use calamine::{Reader, open_workbook_auto, Xlsx, DataType};

// opens a new workbook
let path = ...; // we do not know the file type
let mut workbook = open_workbook_auto(path).expect("Cannot open file");

// Read whole worksheet data and provide some statistics
if let Some(Ok(range)) = workbook.worksheet_range("Sheet1") {
    let total_cells = range.get_size().0 * range.get_size().1;
    let non_empty_cells: usize = range.used_cells().count();
    println!("Found {} cells in 'Sheet1', including {} non empty cells",
             total_cells, non_empty_cells);
    // alternatively, we can manually filter rows
    assert_eq!(non_empty_cells, range.rows()
        .flat_map(|r| r.iter().filter(|&c| c != &DataType::Empty)).count());
}

// Check if the workbook has a vba project
if let Some(Ok(mut vba)) = workbook.vba_project() {
    let vba = vba.to_mut();
    let module1 = vba.get_module("Module 1").unwrap();
    println!("Module 1 code:");
    println!("{}", module1);
    for r in vba.get_references() {
        if r.is_missing() {
            println!("Reference {} is broken or not accessible", r.name);
        }
    }
}

// You can also get defined names definition (string representation only)
for name in workbook.defined_names() {
    println!("name: {}, formula: {}", name.0, name.1);
}

// Now get all formula!
let sheets = workbook.sheet_names().to_owned();
for s in sheets {
    println!("found {} formula in '{}'",
             workbook
                .worksheet_formula(&s)
                .expect("sheet not found")
                .expect("error while getting formula")
                .rows().flat_map(|r| r.iter().filter(|f| !f.is_empty()))
                .count(),
             s);
}

Features

  • dates: Add date related fn to DataType.

Others

Browse the examples directory.

Performance

While there is no official benchmark yet, my first tests show a significant boost compared to official C# libraries:

  • Reading cell values: at least 3 times faster
  • Reading vba code: calamine does not read all sheets when opening your workbook, this is not fair

Unsupported

Many (most) part of the specifications are not implemented, the focus has been put on reading cell values and vba code.

The main unsupported items are:

  • no support for writing excel files, this is a read-only library
  • no support for reading extra contents, such as formatting, excel parameter, encrypted components etc ...
  • no support for reading VB for opendocuments
  • dates: dates detection is not supported and will return DataType::Float. You can use the dates feature to get friendly conversions fn.

Credits

Thanks to xlsx-js developers! This library is by far the simplest open source implementation I could find and helps making sense out of official documentation.

Thanks also to all the contributors!

License

MIT

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].