
olivettigroup / materials-synthesis-generative-models

License: MIT License
Public release of data and code for materials synthesis generation

Programming Languages

HTML
75241 projects
Python
139335 projects - #7 most used programming language
Jupyter Notebook
11667 projects

Projects that are alternatives of or similar to materials-synthesis-generative-models

robot-mind-meld
A little game powered by word vectors
Stars: ✭ 31 (-34.04%)
Mutual labels:  word-embeddings
gcWGAN
Guided Conditional Wasserstein GAN for De Novo Protein Design
Stars: ✭ 38 (-19.15%)
Mutual labels:  generative-model
QuestionClustering
A question classifier written in Python 3, implemented in the following video: https://youtu.be/qnlW1m6lPoY
Stars: ✭ 15 (-68.09%)
Mutual labels:  word-embeddings
datastories-semeval2017-task6
Deep-learning model presented in "DataStories at SemEval-2017 Task 6: Siamese LSTM with Attention for Humorous Text Comparison".
Stars: ✭ 20 (-57.45%)
Mutual labels:  word-embeddings
GrabNet
GrabNet: A Generative model to generate realistic 3D hands grasping unseen objects (ECCV2020)
Stars: ✭ 146 (+210.64%)
Mutual labels:  generative-model
SiameseCBOW
Implementation of Siamese CBOW using keras whose backend is tensorflow.
Stars: ✭ 14 (-70.21%)
Mutual labels:  word-embeddings
JoSH
[KDD 2020] Hierarchical Topic Mining via Joint Spherical Tree and Text Embedding
Stars: ✭ 55 (+17.02%)
Mutual labels:  word-embeddings
3DCSGNet
CSGNet for voxel based input
Stars: ✭ 34 (-27.66%)
Mutual labels:  generative-model
SIFRank
The code of our paper "SIFRank: A New Baseline for Unsupervised Keyphrase Extraction Based on Pre-trained Language Model"
Stars: ✭ 96 (+104.26%)
Mutual labels:  word-embeddings
word2vec-tsne
Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.
Stars: ✭ 59 (+25.53%)
Mutual labels:  word-embeddings
MidiTok
A convenient MIDI / symbolic music tokenizer for Deep Learning networks, with multiple strategies 🎶
Stars: ✭ 180 (+282.98%)
Mutual labels:  generative-model
py-msa-kdenlive
Python script to load a Kdenlive (OSS NLE video editor) project file, and conform the edit on video or numpy arrays.
Stars: ✭ 25 (-46.81%)
Mutual labels:  generative-model
tdmms
Two-dimensional materials manufacturing system
Stars: ✭ 17 (-63.83%)
Mutual labels:  materials-science
Generalization-Causality
Reading notes on a wide range of research topics, including domain generalization, domain adaptation, causality, robustness, prompts, optimization, and generative models
Stars: ✭ 482 (+925.53%)
Mutual labels:  generative-model
atomai
Deep and Machine Learning for Microscopy
Stars: ✭ 77 (+63.83%)
Mutual labels:  materials-science
style-vae
Implementation of VAE and Style-GAN Architecture Achieving State of the Art Reconstruction
Stars: ✭ 25 (-46.81%)
Mutual labels:  generative-model
data-resources-for-materials-science
A list of databases, datasets and books/handbooks where you can find materials properties for machine learning applications.
Stars: ✭ 81 (+72.34%)
Mutual labels:  materials-science
uf3
UF3: a python library for generating ultra-fast interatomic potentials
Stars: ✭ 19 (-59.57%)
Mutual labels:  materials-science
gans-in-action
Code repository for "GANs in Action" (Hanbit Media, 2020).
Stars: ✭ 29 (-38.3%)
Mutual labels:  generative-model
thermo pw
Thermo_pw is a driver of quantum-ESPRESSO routines for the automatic computation of ab-initio material properties.
Stars: ✭ 34 (-27.66%)
Mutual labels:  materials-science

Generative models and NLP resources for materials synthesis

Public release of data and code for materials synthesis generation, along with NLP resources for materials science. 🎉

This code and data are a companion to the paper "Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks."

Demo 🐍

demo.ipynb (or demo.html) contains a Python demo showcasing the fine-tuned word embeddings introduced in this paper. The demo also provides an example of building and inspecting the autoencoder models.

Annotated NER Data 📝

data/ner_annotations.json contains tokenized and labelled NER data for 235 synthesis recipes. Each annotated recipe is marked by a "split" key, which may be "train", "test", or "dev"; there are also five papers (used internally for inter-annotator agreement) marked with a "metrics" split. These splits are merely suggestions (they were computed randomly), so we encourage others to use whatever splits of the data they deem appropriate. The file should be usable as-is for training NER models: each annotated document contains equal-length arrays of tokens and their respective labels.
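For reference, here is a minimal sketch of reading these annotations in Python. The top-level list structure and the "tokens"/"labels" key names are assumptions about the JSON schema; only the "split" key and the equal-length token/label arrays are documented above.

```python
import json

# Minimal sketch, assuming the file is a JSON list of annotated documents and
# that the token/label arrays are stored under "tokens" and "labels" keys
# (these key names are assumptions; only "split" is documented above).
with open("data/ner_annotations.json") as f:
    annotated_docs = json.load(f)

train_docs = [doc for doc in annotated_docs if doc.get("split") == "train"]

for doc in train_docs[:1]:
    # Tokens and labels are equal-length arrays, so they can be zipped directly.
    for token, label in zip(doc["tokens"], doc["labels"]):
        print(f"{token}\t{label}")
```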

data/brat/ contains raw annotation files in the BRAT annotation format. You can load these into your own instance of BRAT and modify the annotations however you like! These files contain event/relation annotations as well (e.g., "heat" acts on "titania").

NLP Resource Downloads 💽

Along with this work, we also open-source two pre-trained word embedding models, FastText and ELMo, each trained on our internal database of over 2.5 million materials science articles, as well as our pre-trained model for classifying the paragraphs of materials science articles.

The FastText model follows the gensim Python library and can be loaded as a KeyedVectors object; please see the gensim documentation for more details. Note that our FastText model is trained on lowercase text only.
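A minimal loading sketch with gensim is shown below; the filename is a placeholder for the released embedding file, and it assumes the vectors were saved in gensim's native KeyedVectors format.

```python
from gensim.models import KeyedVectors

# Minimal sketch, assuming the released FastText vectors are in gensim's
# native KeyedVectors format; the filename below is a placeholder.
wv = KeyedVectors.load("fasttext_embeddings.model")

# The model was trained on lowercase text only, so lowercase queries first.
query = "TiO2".lower()
print(wv.most_similar(query, topn=5))
```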

The ELMo model follows the weights/options file layout used in the allenai/bilm-tf public GitHub repository. You can load the embeddings as described in that repository's README (or use the code in this repo, at models/token_classifier.py), simply swapping in our weight and options files. We found that the default vocab.txt works fine, so there is no need to replace it. As per the recommendations of the ELMo authors, we do not perform lowercase normalization for ELMo, so you can compute word vectors for text as-is.
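A minimal sketch of computing ELMo vectors with the bilm-tf API is shown below. It assumes TensorFlow 1.x (which allenai/bilm-tf requires); the weight and options filenames are placeholders for the released files.

```python
import tensorflow as tf  # TensorFlow 1.x, as required by allenai/bilm-tf
from bilm import Batcher, BidirectionalLanguageModel, weight_layers

# Placeholder paths for the released files; the default vocab.txt can be kept.
vocab_file = "vocab.txt"
options_file = "elmo_options.json"   # placeholder filename
weight_file = "elmo_weights.hdf5"    # placeholder filename

batcher = Batcher(vocab_file, 50)  # 50 = max characters per token
character_ids = tf.placeholder("int32", shape=(None, None, 50))

bilm = BidirectionalLanguageModel(options_file, weight_file)
embedding_ops = bilm(character_ids)
elmo_input = weight_layers("input", embedding_ops, l2_coef=0.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # No lowercase normalization for ELMo; tokens are passed "as-is".
    sentences = [["TiO2", "was", "heated", "to", "500", "C", "."]]
    ids = batcher.batch_sentences(sentences)
    vectors = sess.run(elmo_input["weighted_op"],
                       feed_dict={character_ids: ids})
    print(vectors.shape)  # (n_sentences, n_tokens, embedding_dim)
```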

Links to the trained models/weights are as follows:

Neural Network Models/Data 🧠

models/action_generator.py contains the architecture for the conditional variational autoencoder (CVAE) used for synthesis action generation.

models/material_generator.py contains the architecture for the CVAE used for precursor generation.
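For orientation, here is a generic conditional VAE sketch in Keras. It is not the repository's actual architecture (layer sizes, inputs, and conditioning scheme are placeholders); it only illustrates the encode-sample-decode pattern that both generator files implement.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class ToyCVAE(keras.Model):
    """Generic conditional VAE sketch; NOT the architecture in this repo."""

    def __init__(self, input_dim, cond_dim, latent_dim):
        super().__init__()
        self.enc_hidden = layers.Dense(64, activation="relu")
        self.z_mean = layers.Dense(latent_dim)
        self.z_log_var = layers.Dense(latent_dim)
        self.dec_hidden = layers.Dense(64, activation="relu")
        self.dec_out = layers.Dense(input_dim, activation="sigmoid")

    def call(self, inputs):
        x, c = inputs
        # Encoder: infer a latent distribution conditioned on (x, c).
        h = self.enc_hidden(tf.concat([x, c], axis=-1))
        z_mean, z_log_var = self.z_mean(h), self.z_log_var(h)
        # Reparameterization trick: z = mean + sigma * epsilon.
        eps = tf.random.normal(tf.shape(z_mean))
        z = z_mean + tf.exp(0.5 * z_log_var) * eps
        # KL divergence between q(z | x, c) and the standard normal prior.
        kl = -0.5 * tf.reduce_sum(
            1.0 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1)
        self.add_loss(tf.reduce_mean(kl))
        # Decoder: reconstruct x from the latent code and the same condition.
        return self.dec_out(self.dec_hidden(tf.concat([z, c], axis=-1)))

# Reconstruction loss comes from compile(); the KL term is added in call().
cvae = ToyCVAE(input_dim=100, cond_dim=10, latent_dim=8)
cvae.compile(optimizer="adam", loss="binary_crossentropy")
# cvae.fit([x_train, cond_train], x_train, epochs=10, batch_size=32)
```

At generation time, one samples z from the prior and decodes it together with a chosen condition to propose new outputs.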

models/token_classifier.py contains the architecture for the NER model. The methods used for loading a pretrained ELMo model (via TensorFlow) are also provided here.

models/paragraph_classifier.py contains the architecture and code used for the paragraph classifier model.

data/unsynth_recipes_w_citations.json collects the synthesis recipes suggested by the CVAE model for screening unsynthesized ABO3-perovskite compounds; it also contains the CVAE-suggested nearest-neighbor literature.

Citing 📚

If you use this work (e.g., the NER model, the generative models, the pre-trained embeddings), please cite the following work(s) as appropriate:

Kim, E., Jensen, Z., Grootel, A.V., Huang, K., Staib, M., Mysore, S., Chang, H.S., Strubell, E., McCallum, A., Jegelka, S. and Olivetti, E., 2020. Inorganic Materials Synthesis Planning with Literature-Trained Neural Networks. Journal of Chemical Information and Modeling.