ShuHuang / batterydatabase

Licence: MIT license
Tools for auto-generating the battery-materials database.


batterydatabase

Installation

First, install the public version of ChemDataExtractor (v1.3):

conda install -c chemdataextractor chemdataextractor

Download the necessary data files (machine learning models, dictionaries, etc.):

cde data download

Then install the dependencies of the bespoke battery version of ChemDataExtractor (chemdataextractor_batteries v1.5):

pip install -r requirements.txt

Usage

To extract raw data from text, provide the root directory of the papers, the output directory for the data records, the start and end indices of the papers to process, and the name of the output file.

For example, to extract the first paper in test/ and save the results to save/ as raw_data.json:

python extract.py --input_dir test/ --output_dir save/ --start 0 --end 1 --save_name raw_data
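To make the command-line options above concrete, here is a minimal sketch of how such arguments might be parsed and how the --start/--end window could select papers. It is not the repository's extract.py. It assumes that the papers are the files in --input_dir sorted by name, that --end is exclusive (consistent with --start 0 --end 1 selecting only the first paper), and that the output goes to <output_dir>/<save_name>.json; the helper names parse_args, select_papers and output_path are hypothetical.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the options shown in the usage example (hypothetical re-implementation)."""
    parser = argparse.ArgumentParser(description="Extract raw battery data from papers.")
    parser.add_argument("--input_dir", required=True, help="root folder of the papers")
    parser.add_argument("--output_dir", required=True, help="folder for the data records")
    parser.add_argument("--start", type=int, required=True, help="index of the first paper (inclusive)")
    parser.add_argument("--end", type=int, required=True, help="index after the last paper (assumed exclusive)")
    parser.add_argument("--save_name", required=True, help="output file name without extension")
    return parser.parse_args(argv)


def select_papers(paper_names, start, end):
    """Return the papers in the half-open window [start, end) after sorting by name.

    Sorting by file name is an assumption made for this sketch.
    """
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return sorted(paper_names)[start:end]


def output_path(output_dir, save_name):
    """Build <output_dir>/<save_name>.json, the assumed output location."""
    return Path(output_dir) / f"{save_name}.json"


if __name__ == "__main__":
    args = parse_args(["--input_dir", "test/", "--output_dir", "save/",
                       "--start", "0", "--end", "1", "--save_name", "raw_data"])
    papers = ["paper_b.html", "paper_a.xml", "paper_c.html"]  # hypothetical file names
    print(select_papers(papers, args.start, args.end))  # ['paper_a.xml']
    print(output_path(args.output_dir, args.save_name))
```

Under the half-open assumption, a large corpus could be split into non-overlapping windows (for example 0 to 1000, then 1000 to 2000) and processed by separate runs, each with its own --save_name.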

After the raw data is extracted, it needs to be cleaned and converted into a standard format. We provide the data cleaning code in dataclean.ipynb. The final data format can be .json, .csv or .db.
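The cleaning logic itself lives in dataclean.ipynb and is not reproduced here. As a rough illustration of the final step only, the sketch below removes exact duplicates from already-cleaned records and writes them to the three output formats using Python's standard library. The record fields (compound, property, value, unit, doi), the table name battery and the helper names are hypothetical and do not reflect the notebook's actual schema.

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema for a cleaned record; not the notebook's actual fields.
FIELDS = ["compound", "property", "value", "unit", "doi"]


def deduplicate(records):
    """Drop exact duplicate records while preserving first-seen order."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(record.get(field) for field in FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def export(records, out_dir, name):
    """Write records to <name>.json, <name>.csv and <name>.db inside out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Keep only the known fields so every format sees the same columns.
    rows = [{field: record.get(field) for field in FIELDS} for record in records]

    with open(out / f"{name}.json", "w", encoding="utf-8") as fh:
        json.dump(rows, fh, indent=2)

    with open(out / f"{name}.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

    conn = sqlite3.connect(out / f"{name}.db")
    try:
        with conn:  # commits the transaction if no exception is raised
            conn.execute("DROP TABLE IF EXISTS battery")
            conn.execute(
                "CREATE TABLE battery (compound TEXT, property TEXT, value REAL, unit TEXT, doi TEXT)"
            )
            conn.executemany(
                "INSERT INTO battery VALUES (:compound, :property, :value, :unit, :doi)",
                rows,
            )
    finally:
        conn.close()


if __name__ == "__main__":
    # Placeholder records, not real data; the second is a duplicate of the first.
    placeholder = {"compound": "compound_A", "property": "Capacity",
                   "value": 1.0, "unit": "mAh/g", "doi": "doi_placeholder"}
    records = deduplicate([placeholder, dict(placeholder)])
    with tempfile.TemporaryDirectory() as tmp:
        export(records, tmp, "battery_data")
        print(sorted(p.name for p in Path(tmp).iterdir()))
```

Writing all three formats from the same normalized rows keeps the JSON, CSV and SQLite outputs consistent with one another.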

Acknowledgements

This project was financially supported by the Science and Technology Facilities Council (STFC), the Royal Academy of Engineering (RCSRF1819\7\10) and Christ's College, Cambridge. The Argonne Leadership Computing Facility, which is a DOE Office of Science Facility, is also acknowledged for use of its research resources, under contract No. DE-AC02-06CH11357.

Citation

@article{huang2020database,
  title={A database of battery materials auto-generated using ChemDataExtractor},
  author={Huang, Shu and Cole, Jacqueline M},
  journal={Scientific Data},
  volume={7},
  number={1},
  pages={1--13},
  year={2020},
  publisher={Nature Publishing Group}
}
