
kermitt2 / grobid-quantities

License: Apache-2.0
GROBID extension for identifying and normalizing physical quantities.

Programming Languages

javascript - 184084 projects (#8 most used programming language)
java - 68154 projects (#9 most used programming language)
CSS - 56736 projects
HTML - 75241 projects
XSLT - 1337 projects
Dockerfile - 14818 projects
python - 139335 projects (#7 most used programming language)

Projects that are alternatives to, or similar to, grobid-quantities

fastai sequence tagging
Sequence tagging for NER with ULMFiT.
Stars: ✭ 21 (-60.38%)
Mutual labels:  crf
keras-crf-layer
Implementation of CRF layer in Keras.
Stars: ✭ 76 (+43.4%)
Mutual labels:  crf
crfs-rs
Pure Rust port of CRFsuite: a fast implementation of Conditional Random Fields (CRFs)
Stars: ✭ 22 (-58.49%)
Mutual labels:  crf
BiLSTM-CRF-NER-PyTorch
This repo contains a PyTorch implementation of a BiLSTM-CRF model for the named entity recognition task.
Stars: ✭ 109 (+105.66%)
Mutual labels:  crf
Hierarchical-Word-Sense-Disambiguation-using-WordNet-Senses
Word Sense Disambiguation using Word Specific models, All word models and Hierarchical models in Tensorflow
Stars: ✭ 33 (-37.74%)
Mutual labels:  crf
crf4j
A complete Java port of CRF++ (crfpp).
Stars: ✭ 30 (-43.4%)
Mutual labels:  crf
crfsuite-rs
Rust binding to crfsuite
Stars: ✭ 19 (-64.15%)
Mutual labels:  crf
giantgo-render
A fast form generator based on Vue 3 and Element Plus.
Stars: ✭ 28 (-47.17%)
Mutual labels:  crf
CRFasRNNLayer
Conditional Random Fields as Recurrent Neural Networks (Tensorflow)
Stars: ✭ 76 (+43.4%)
Mutual labels:  crf
Legal-Entity-Recognition
A Dataset of German Legal Documents for Named Entity Recognition
Stars: ✭ 98 (+84.91%)
Mutual labels:  crf
deepseg
Chinese word segmentation in tensorflow 2.x
Stars: ✭ 23 (-56.6%)
Mutual labels:  crf
korean ner tagging challenge
KU_NERDY, Dongyub Lee and Heuiseok Lim (gold prize at the 2017 Korean Language Information Processing System Competition) - Conference on Hangul and Korean Language Information Processing.
Stars: ✭ 30 (-43.4%)
Mutual labels:  crf
jcrfsuite
Java interface for CRFsuite: http://www.chokkan.org/software/crfsuite/
Stars: ✭ 44 (-16.98%)
Mutual labels:  crf
Gumbel-CRF
Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs
Stars: ✭ 51 (-3.77%)
Mutual labels:  crf
StatNLP-Framework
C++-based implementation of the StatNLP framework.
Stars: ✭ 17 (-67.92%)
Mutual labels:  crf
mahjong
Open-source Chinese word segmentation toolkit: Chinese segmentation web API, Lucene Chinese analyzer, and mixed Chinese-English segmentation.
Stars: ✭ 40 (-24.53%)
Mutual labels:  crf
crf-seg
crf-seg: a Chinese word segmentation tool for production environments, with customizable corpora and models, a clean architecture, and good segmentation results. Written in Java.
Stars: ✭ 13 (-75.47%)
Mutual labels:  crf
lstm-crf-tagging
No description or website provided.
Stars: ✭ 13 (-75.47%)
Mutual labels:  crf
Computer-Vision
Implementations of several computer vision problems.
Stars: ✭ 25 (-52.83%)
Mutual labels:  crf
CIP
Basic exercises in Chinese information processing.
Stars: ✭ 32 (-39.62%)
Mutual labels:  crf

grobid-quantities


Work in progress.

The goal of this GROBID module is to recognize, in textual documents, any expressions of measurements (e.g. pressure, temperature, etc.), to parse and normalize them, and finally to convert these measurements into SI units. We focus our work on technical and scientific articles (text, XML and PDF input) and patents (text and XML input).

GROBID Quantity Demo

As part of this task we support the recognition of the different value representations: numerical, alphabetic, exponential and date/time expressions.
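To make these representations concrete, here is a small illustrative parser for three of them (a rule-based sketch with made-up coverage; the actual module uses trained CRF models, not rules like these):

```python
import re

# Minimal alphabetic-number lexicon, for illustration only.
WORDS = {"zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}

def parse_value(token: str):
    """Return (representation kind, numeric value) for a value expression."""
    token = token.strip().lower()
    if token in WORDS:                                # alphabetic: "two"
        return "alphabetic", float(WORDS[token])
    m = re.fullmatch(r"([\d.]+)\s*[x×]\s*10\^?(-?\d+)", token)
    if m:                                             # exponential: "1.2 x 10^5"
        return "exponential", float(m.group(1)) * 10 ** int(m.group(2))
    try:                                              # numerical: "3.14"
        return "numeric", float(token)
    except ValueError:
        return "unknown", None
```

Date/time expressions are left out here, since they require a full calendar grammar rather than a one-line rule.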

Grobid Quantity Demo

Finally, we support the identification of the "quantified" substance related to the measurement, e.g. silicon nitride powder in

GROBID Quantity Demo

Like the other GROBID models, the module relies only on machine learning and uses linear-chain CRF. The normalisation of quantities is handled by the Java library Units of Measurement.
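What the normalisation step does can be pictured with a toy prefix/unit table (an illustration only, covering a few assumed units; the real conversion is delegated to the Units of Measurement Java library):

```python
# Toy SI normalisation: scale a value by its metric prefix and map a few
# common units to their SI base unit. Illustrative only.
PREFIXES = {"": 1.0, "k": 1e3, "c": 1e-2, "m": 1e-3, "M": 1e6}
BASE_UNITS = {"m": "m", "g": "kg", "s": "s"}  # gram normalises to kilogram

def to_si(value: float, unit: str):
    """Return (value, unit) normalised to SI base units, e.g. 10 km -> 10000 m."""
    prefix, base = ("", unit) if unit in BASE_UNITS else (unit[0], unit[1:])
    scale = PREFIXES[prefix]
    if base == "g":                 # the SI base unit of mass is the kilogram
        return value * scale / 1000, "kg"
    return value * scale, BASE_UNITS[base]
```

A real implementation must also handle compound units (e.g. kg·m/s²), which is exactly why the module relies on a dedicated library.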

Latest version

The latest released version of grobid-quantities is 0.7.0. The current development version is 0.7.1-SNAPSHOT.

Update from 0.6.0 to 0.7.0

In version 0.7.0 the models have been updated; it is therefore required to run ./gradlew copyModels to obtain proper results, especially for unit normalisation.

Documentation

You can find the latest documentation here.

Evaluation

The results (Precision, Recall, F-score) for all the models have been obtained using 10-fold cross-validation (average metrics over the 10 folds).
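As a reminder of what that aggregation means, the reported scores are plain means of the per-fold scores, roughly like this (fold scores below are made up):

```python
import statistics

def cross_validation_average(fold_scores):
    """Average per-fold (precision, recall, f1) tuples, as when reporting
    k-fold cross-validation results."""
    precisions, recalls, f1s = zip(*fold_scores)
    return (statistics.mean(precisions),
            statistics.mean(recalls),
            statistics.mean(f1s))
```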

BidLSTM + CRF

Evaluated on 28/11/2021 (using layout features, architecture BidLSTM_CRF_FEATURES).

Quantities

Labels Precision Recall F1-Score
<unitLeft> 95.17 96.67 95.91
<unitRight> 92.52 83.64 87.69
<valueAtomic> 81.74 89.21 85.30
<valueBase> 100.00 75.00 85.71
<valueLeast> 89.24 82.25 85.55
<valueList> 75.27 75.33 75.12
<valueMost> 89.02 81.56 85.10
<valueRange> 100.00 96.25 97.90
all (micro avg.) 87.23 89.00 88.10

Units

Labels Precision Recall F1-Score
<base> 98.26 98.52 98.39
<pow> 100.00 98.57 99.28
<prefix> 98.89 97.75 98.30
all (micro avg.) 98.51 98.39 98.45

Values

Labels Precision Recall F1-Score
<alpha> 99.41 99.55 99.48
<base> 96.67 100.00 98.00
<number> 99.55 98.68 99.11
<pow> 72.50 75.00 73.50
<time> 80.84 100.00 89.28
all (micro avg.) 98.49 98.66 98.57

CRF

Evaluated on 30/04/2020.

Quantities

Labels Precision Recall F1-Score
<unitLeft> 96.45 95.06 95.74
<unitRight> 88.96 68.65 75.43
<valueAtomic> 85.75 85.35 85.49
<valueBase> 73.06 66.43 68.92
<valueLeast> 85.68 79.03 82.07
<valueList> 68.38 53.31 58.94
<valueRange> 90.25 88.58 88.86
all (micro avg.) 88.96 85.40 87.14

Units

Updated on 10/02/2021.

Labels Precision Recall F1-Score
<base> 98.82 99.14 98.98
<pow> 97.62 98.56 98.08
<prefix> 99.50 98.76 99.13
all (micro avg.) 98.85 99.01 98.93

Values

Labels Precision Recall F1-Score
<alpha> 96.90 98.84 97.85
<base> 85.14 74.48 79.00
<number> 98.07 99.05 98.55
<pow> 80.05 76.33 77.54
<time> 73.07 86.82 79.26
all (micro avg.) 96.15 97.95 97.40

The current average results are computed using the micro average, which gives a more realistic picture by weighting labels according to their frequency. The paper "Automatic Identification and Normalisation of Physical Measurements in Scientific Literature", published in September 2019, reported average evaluation based on the macro average.
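The difference between the two averaging schemes can be illustrated with a generic computation over made-up per-label counts (this is not the project's evaluation code):

```python
def micro_macro_f1(per_label):
    """per_label: {label: (tp, fp, fn)}. Returns (micro_f1, macro_f1).
    Micro pools counts across labels, so frequent labels weigh more;
    macro averages each label's F1 with equal weight."""
    def f1(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    macro = sum(f1(*c) for c in per_label.values()) / len(per_label)
    tp = sum(c[0] for c in per_label.values())
    fp = sum(c[1] for c in per_label.values())
    fn = sum(c[2] for c in per_label.values())
    return f1(tp, fp, fn), macro
```

With a frequent label scoring 0.90 F1 and a rare label scoring 0.50, the macro average sits at 0.70 while the micro average stays close to the frequent label's score.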

Acknowledgement

This project has been created and developed by science-miner since 2015, with additional support from Inria, Paris (France), and the National Institute for Materials Science, Tsukuba (Japan).

How to cite

If you want to cite this work, please simply refer to the GitHub project, optionally with the Software Heritage project-level permanent identifier:

grobid-quantities (2015-2021) <https://github.com/kermitt2/grobid-quantities>, swh:1:dir:dbf9ee55889563779a09b16f9c451165ba62b6d7

Here's a BibTeX entry using the Software Heritage project-level permanent identifier:

@misc{grobid-quantities,
    title = {grobid-quantities},
    howpublished = {\url{https://github.com/kermitt2/grobid-quantities}},
    publisher = {GitHub},
    year = {2015--2021},
    archivePrefix = {swh},
    eprint = {1:dir:dbf9ee55889563779a09b16f9c451165ba62b6d7}
}

License

GROBID and grobid-quantities are distributed under Apache 2.0 license.

The documentation is distributed under CC-0 license and the annotated data under CC-BY license.

If you contribute to grobid-quantities, you agree to share your contribution following these licenses.

Contact: Patrice Lopez ([email protected]), Luca Foppiano ([email protected])

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].