CederGroupHub / text-mined-synthesis_public

Licence: other
Codes for text-mined solid-state reactions dataset

Programming Languages

python

Text-mined Synthesis

In our project on text-mining data from the literature, we have built up a large dataset of solid-state reactions. Here, we provide our auto-generated open-source dataset of 30,031 chemical reactions retrieved from 95,283 solid-state synthesis paragraphs: text-mined dataset. The data are collected using an automated extraction pipeline (see below) which converts unstructured scientific paragraphs describing inorganic materials synthesis into a so-called "codified recipe" of synthesis. The pipeline uses a variety of text-mining and NLP approaches to find information about target materials, starting compounds, synthesis steps, and conditions in the text, and to process them into a chemical equation.
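To make the idea of a "codified recipe" concrete, here is a minimal Python sketch of how the extracted pieces (target, starting compounds, steps with conditions, and the resulting equation) could be held together. The class names, field names, and the condition values are hypothetical illustrations, not the schema used by this repository.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Operation:
    """One synthesis step with its conditions (hypothetical layout)."""
    kind: str                                   # e.g. "mixing", "heating"
    conditions: Dict[str, str] = field(default_factory=dict)


@dataclass
class CodifiedRecipe:
    """Illustrative container for what is extracted from one paragraph."""
    target: str
    precursors: List[str]
    byproducts: List[str]
    operations: List[Operation]

    def reaction_string(self) -> str:
        # Join the extracted materials into a single equation string.
        left = " + ".join(self.precursors)
        right = " + ".join([self.target] + self.byproducts)
        return f"{left} -> {right}"


# Example values are made up for illustration only.
recipe = CodifiedRecipe(
    target="BaTiO3",
    precursors=["BaCO3", "TiO2"],
    byproducts=["CO2"],
    operations=[
        Operation("mixing"),
        Operation("heating", {"temperature": "1100 C", "time": "4 h"}),
    ],
)
print(recipe.reaction_string())
```

Keeping the steps as a list of small objects preserves their order, which matters for a synthesis procedure, while the equation is derived from the material fields rather than stored separately.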

Intro

This repo contains the code and modules built to create the solid-state reactions dataset. If you find the code and data useful, please cite our papers:

Dataset:

  • Kononova, O., Huo, H., He, T., Rong, Z., Botari, T., Sun, W., Tshitoyan, V. and Ceder, G., 2019. Text-mined dataset of inorganic materials synthesis recipes. Scientific Data 6: 203.

Paragraph classification:

  • Huo, H., Rong, Z., Kononova, O., Sun, W., Botari, T., He, T., Tshitoyan, V. and Ceder, G., 2019. Semi-supervised machine-learning classification of materials synthesis procedures. npj Computational Materials, 5(1), p.62.

Materials Entity Recognition (MER):

  • He, T., Sun, W., Huo, H., Kononova, O., Rong, Z., Tshitoyan, V., Botari, T. and Ceder, G., 2020. Similarity of Precursors in Solid-State Synthesis as Text-Mined from Scientific Literature. Chemistry of Materials, 32(18), pp.7861-7873.

Versions

  • [2020-07-13] Updated dataset: 31,782 solid-state reactions and 9,518 sol-gel precursor synthesis reactions. The updated data schema is given in dataset_typing.py.
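
Since the updated release mixes solid-state and sol-gel precursor reactions, a quick way to check the split is to count records by synthesis type. The sketch below assumes the data is a JSON list of records with a `synthesis_type` key; that key name and the inline sample are assumptions for illustration, so consult dataset_typing.py for the actual schema.

```python
import json
from collections import Counter

# Inline stand-in for a dataset file; key names and values are assumptions.
sample_json = """
[
  {"synthesis_type": "solid-state", "target": "BaTiO3"},
  {"synthesis_type": "solid-state", "target": "LiCoO2"},
  {"synthesis_type": "sol-gel", "target": "TiO2"}
]
"""


def count_by_type(records):
    """Count records per synthesis type, labelling missing values 'unknown'."""
    return Counter(r.get("synthesis_type", "unknown") for r in records)


records = json.loads(sample_json)
counts = count_by_type(records)
for kind, n in sorted(counts.items()):
    print(f"{kind}: {n}")
```

For a real file, `json.loads(sample_json)` would be replaced by `json.load` on an open file handle.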

Getting help

If you have questions about the project, please submit an issue or contact us ([email protected]). Thanks!