
seyonechithrananda / Bert Loves Chemistry

License: MIT
bert-loves-chemistry: a repository of HuggingFace models applied to chemical SMILES data for drug design, chemical modelling, etc.

Projects that are alternatives to or similar to Bert Loves Chemistry

Solution Accelerator Many Models
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Ossdc Visionbasedacc
Discuss requirements and develop code for #1-mvp-vbacc MVP (see also this channel on ossdc.org Slack)
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Satimg
Satellite data processing experiments
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Ec2 Spot Workshops
Collection of workshops to demonstrate best practices in using Amazon EC2 Spot Instances. https://aws.amazon.com/ec2/spot/
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Practical Ml W Python
Source code for 'Practical Machine Learning with Python' by Dipanjan Sarkar, Raghav Bali, and Tushar Sharma
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Sharing isl python
An Introduction to Statistical Learning with Applications in Python
Stars: ✭ 105 (+1.94%)
Mutual labels:  jupyter-notebook
Nlp essentials
Essential and fundamental aspects of Natural Language Processing, with hands-on examples and case studies
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Tensorflow2.0 Examples
🙄 Difficult algorithm, Simple code.
Stars: ✭ 1,397 (+1256.31%)
Mutual labels:  jupyter-notebook
Yabox
Yet another black-box optimization library for Python
Stars: ✭ 103 (+0%)
Mutual labels:  jupyter-notebook
Python Fundamentals
Introductory Python Series for UC Berkeley's D-Lab
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Gen Quickstart
Docker file for building Gen and Jupyter notebooks for tutorials and case studies
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Pose Interpreter Networks
Real-Time Object Pose Estimation with Pose Interpreter Networks (IROS 2018)
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Circle Line Analytics
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Intro To Deep Learning
A collection of materials to help you learn about deep learning
Stars: ✭ 103 (+0%)
Mutual labels:  jupyter-notebook
Keras Hello World
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Face Id With Medical Masks
Face ID recognition with medical masks
Stars: ✭ 103 (+0%)
Mutual labels:  jupyter-notebook
Dmm
Deep Markov Models
Stars: ✭ 103 (+0%)
Mutual labels:  jupyter-notebook
Cenpy
Explore and download data from Census APIs
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook
Team Learning
Mainly showcases Datawhale's team learning plans.
Stars: ✭ 1,397 (+1256.31%)
Mutual labels:  jupyter-notebook
Partia Computing Michaelmas
Activities and exercises for the Part IA computing course in Michaelmas Term
Stars: ✭ 104 (+0.97%)
Mutual labels:  jupyter-notebook

ChemBERTa

ChemBERTa: A collection of BERT-like models applied to chemical SMILES data for drug design, chemical modelling, and property prediction. To be presented at Baylearn and the Royal Society of Chemistry's Chemical Science Symposium.

Tutorial
Arxiv Paper
Poster
Abstract
BibTex

License: MIT

Right now the notebooks are all for the RoBERTa model (a variant of BERT) trained on the task of masked-language modelling (MLM). Training was run for 10 epochs, until the loss converged to around 0.26 on the ZINC 250k dataset. Model weights for ChemBERTa pre-trained on various datasets (ZINC 100k, ZINC 250k, PubChem 100k, PubChem 250k, PubChem 1M, PubChem 10M) are available via HuggingFace. We expect to continue to release larger models pre-trained on even larger subsets of ZINC, ChEMBL, and PubChem in the near future.
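As a quick illustration of how the pre-trained tokenizer handles SMILES input, the snippet below (a minimal sketch, not taken from the repository's notebooks) loads the seyonec/ChemBERTa-zinc-base-v1 checkpoint referenced in the Example section and tokenizes the SMILES string for aspirin; the exact token split depends on the learned BPE vocabulary:

from transformers import AutoTokenizer

# Load the pre-trained SMILES tokenizer from HuggingFace.
tokenizer = AutoTokenizer.from_pretrained("seyonec/ChemBERTa-zinc-base-v1")

# Tokenize the SMILES string for aspirin into subword tokens.
print(tokenizer.tokenize("CC(=O)Oc1ccccc1C(=O)O"))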

This library is currently primarily a set of notebooks covering our pre-training and fine-tuning setup; it will be updated soon with model implementation and attention-visualization code, likely after the arXiv publication. Stay tuned!

I hope this is of use to developers, students, and researchers exploring the use of transformers and the attention mechanism for chemistry!

Citing Our Work

Please cite ChemBERTa's arXiv paper if you have used these models, notebooks, or examples in any way. The link to the BibTeX is available here.

Example

You can load the tokenizer + model for MLM prediction tasks using the following code:

from transformers import AutoModelWithLMHead, AutoTokenizer, pipeline

# Any model weights from the link above will work here.
# (On recent versions of transformers, AutoModelForMaskedLM is the
# preferred replacement for the deprecated AutoModelWithLMHead.)
model = AutoModelWithLMHead.from_pretrained("seyonec/ChemBERTa-zinc-base-v1")
tokenizer = AutoTokenizer.from_pretrained("seyonec/ChemBERTa-zinc-base-v1")

fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
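
The pipeline can then rank candidate completions for a masked SMILES string. The snippet below is a minimal sketch: the masked aspirin string is an illustrative input of our choosing, and <mask> is the mask token used by RoBERTa-style tokenizers:

# Aspirin with one token masked (illustrative input).
results = fill_mask("CC(=O)Oc1ccccc1<mask>(=O)O")
for prediction in results:
    # Each prediction is a dict containing the filled-in sequence and its score.
    print(prediction["sequence"], prediction["score"])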

The abstract for this method is detailed here. We expect to release the full paper on arXiv in late August.

Todo:

  • [ ] Official DeepChem implementation of ChemBERTa using the model API (in progress)
  • [x] Open-source the attention-visualization suite used in the paper (after formal publication, beginning of September).
  • [x] Release larger pre-trained models and support for a wider array of property prediction tasks (BBBP, etc.) - see HuggingFace, and the fine-tuning sketch after this list
  • [x] Finish writing the notebook to train the model
  • [x] Finish the notebook to preload and run predictions on a single molecule → test that HuggingFace works
  • [x] Train the RoBERTa model until convergence
  • [x] Upload weights onto HuggingFace
  • [x] Create a tutorial using the evaluation + fine-tuning notebook.
  • [x] Create documentation + writing, visualizations for the notebook.
  • [x] Set up PR into DeepChem
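
For the property prediction tasks mentioned above (e.g. BBBP), the snippet below is a minimal sketch, not the repository's fine-tuning notebook: it is a generic transformers sequence-classification setup, shown only to illustrate wiring a classification head onto the pre-trained checkpoint. The training loop and dataset loading are omitted, and num_labels=2 assumes a binary task.

from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("seyonec/ChemBERTa-zinc-base-v1")

# Attach a randomly initialized classification head to the pre-trained encoder.
# num_labels=2 assumes a binary task such as BBBP (permeable vs. not permeable).
model = AutoModelForSequenceClassification.from_pretrained(
    "seyonec/ChemBERTa-zinc-base-v1", num_labels=2
)

# Tokenize a batch of SMILES strings and run a forward pass.
inputs = tokenizer(["CC(=O)Oc1ccccc1C(=O)O"], return_tensors="pt", padding=True)
logits = model(**inputs).logits  # shape: (batch_size, num_labels)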