
J-SNACKKB / FLIP

License: AFL-3.0
A collection of tasks to probe the effectiveness of protein sequence representations in modeling aspects of protein design

Programming Languages

Jupyter Notebook
11667 projects
Python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to FLIP

deepblast
Neural Networks for Protein Sequence Alignment
Stars: ✭ 29 (-17.14%)
Mutual labels:  protein, protein-sequences
gcWGAN
Guided Conditional Wasserstein GAN for De Novo Protein Design
Stars: ✭ 38 (+8.57%)
Mutual labels:  protein, protein-design
lightdock
Protein-protein, protein-peptide and protein-DNA docking framework based on the GSO algorithm
Stars: ✭ 110 (+214.29%)
Mutual labels:  protein, protein-design
tape-neurips2019
Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. (DEPRECATED)
Stars: ✭ 117 (+234.29%)
Mutual labels:  protein-sequences
RamaNet
Performs de novo protein design using machine learning and PyRosetta to generate novel protein structures
Stars: ✭ 41 (+17.14%)
Mutual labels:  protein-design
reprieve
A library for evaluating representations.
Stars: ✭ 68 (+94.29%)
Mutual labels:  representation-learning
pair2vec
pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference
Stars: ✭ 62 (+77.14%)
Mutual labels:  representation-learning
PCC-pytorch
A pytorch implementation of the paper "Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control"
Stars: ✭ 57 (+62.86%)
Mutual labels:  representation-learning
ParametricUMAP paper
Parametric UMAP embeddings for representation and semisupervised learning. From the paper "Parametric UMAP: learning embeddings with deep neural networks for representation and semi-supervised learning" (Sainburg, McInnes, Gentner, 2020).
Stars: ✭ 132 (+277.14%)
Mutual labels:  representation-learning
REGAL
Representation learning-based graph alignment based on implicit matrix factorization and structural embeddings
Stars: ✭ 78 (+122.86%)
Mutual labels:  representation-learning
MTL-AQA
What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]
Stars: ✭ 38 (+8.57%)
Mutual labels:  representation-learning
cgdms
Differentiable molecular simulation of proteins with a coarse-grained potential
Stars: ✭ 44 (+25.71%)
Mutual labels:  protein
GLOM-TensorFlow
An attempt at the implementation of GLOM, Geoffrey Hinton's paper for emergent part-whole hierarchies from data
Stars: ✭ 32 (-8.57%)
Mutual labels:  representation-learning
Learning-From-Rules
Implementation of experiments in paper "Learning from Rules Generalizing Labeled Exemplars" to appear in ICLR2020 (https://openreview.net/forum?id=SkeuexBtDr)
Stars: ✭ 46 (+31.43%)
Mutual labels:  representation-learning
pia
📚 🔬 PIA - Protein Inference Algorithms
Stars: ✭ 19 (-45.71%)
Mutual labels:  protein
FUSION
PyTorch code for NeurIPSW 2020 paper (4th Workshop on Meta-Learning) "Few-Shot Unsupervised Continual Learning through Meta-Examples"
Stars: ✭ 18 (-48.57%)
Mutual labels:  representation-learning
TCE
This repository contains the code implementation used in the paper Temporally Coherent Embeddings for Self-Supervised Video Representation Learning (TCE).
Stars: ✭ 51 (+45.71%)
Mutual labels:  representation-learning
causal-ml
Must-read papers and resources related to causal inference and machine (deep) learning
Stars: ✭ 387 (+1005.71%)
Mutual labels:  representation-learning
EVE
Official repository for the paper "Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning". Joint collaboration between the Marks lab and the OATML group.
Stars: ✭ 37 (+5.71%)
Mutual labels:  protein
M-NMF
An implementation of "Community Preserving Network Embedding" (AAAI 2017)
Stars: ✭ 119 (+240%)
Mutual labels:  representation-learning

Bio-Benchmarks for Protein Engineering

This repository accompanies the paper submitted to the 2021 NeurIPS Benchmark track.

Folder breakdown

  1. collect_splits contains notebooks to process RAW datasets collected from various sources.
  2. splits contains all splits, a brief description of their processing, and the logic behind the train/test splits.
  3. baselines contains the code used to compute baselines.

A .gitignored folder called data contains the RAW data used to produce all splits. Because of its size, this folder cannot be hosted on GitHub; it can instead be downloaded from http://data.bioembeddings.com/public/FLIP
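Since the data folder must be fetched separately, a small helper like the one below can assemble download URLs under the public endpoint above. The relative file path used in the example is hypothetical; check the server listing for the actual file names.

```python
from urllib.parse import urljoin
import urllib.request

# Public endpoint stated in this README; file paths beneath it are assumptions.
FLIP_DATA_URL = "http://data.bioembeddings.com/public/FLIP/"

def raw_data_url(relative_path: str) -> str:
    """Build the full download URL for a file under the public data endpoint."""
    return urljoin(FLIP_DATA_URL, relative_path)

def download(relative_path: str, destination: str) -> None:
    """Fetch one RAW data file to a local path (requires network access)."""
    urllib.request.urlretrieve(raw_data_url(relative_path), destination)

# Hypothetical example path, for illustration only:
print(raw_data_url("some_dataset/raw_data.csv"))
```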

Find out more about the splits

The goal of the splits in this repository is to assess how well machine learning models operating on protein sequence inputs can capture different dimensions relevant to protein design. The main place to learn about the splits is the splits folder. Each dataset ships as a zip file containing one or more "splits", where each split is a different train/test partition motivated by biological or statistical intuition.
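As a sketch of how one such split file might be consumed, the snippet below partitions rows of a CSV into train and test sets. The column name "set" and its "train"/"test" values are assumptions about the file layout made for illustration, not something this README specifies.

```python
import csv
import io

def split_train_test(csv_text: str, set_column: str = "set"):
    """Partition rows of a split CSV into train and test lists.

    The column name 'set' with values 'train'/'test' is an assumed
    layout; adjust to match the actual files in the splits folder.
    """
    train, test = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        (train if row[set_column] == "train" else test).append(row)
    return train, test

# Tiny made-up example with the assumed columns:
example = "sequence,target,set\nMKV,1.2,train\nMKA,0.7,test\n"
train, test = split_train_test(example)
print(len(train), len(test))  # prints: 1 1
```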

Split semaphore

Each split is associated with a semaphore that indicates how it may be used:

  • 🟢: active splits that can be used to evaluate the accuracy of your machine learning models
  • 🟠: splits that should not be used for performance comparisons, either because they may overestimate performance or because other active splits have similar discriminative ability
  • 🔴: obsolete splits that should not be used. Please do not report performance on these.
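When iterating over many splits programmatically, the semaphore convention above can be encoded as a simple filter. The string statuses below ("green"/"orange"/"red") are illustrative stand-ins for the colored markers.

```python
# Illustrative encoding of the semaphore convention; the string keys are
# stand-ins for the colored markers used in this README.
SEMAPHORE_POLICY = {
    "green": "active: evaluate model accuracy",
    "orange": "caution: not for performance comparisons",
    "red": "obsolete: do not use or report",
}

def reportable_splits(splits: dict) -> list:
    """Keep only split names whose semaphore allows reporting performance."""
    return [name for name, status in splits.items() if status == "green"]

example = {"split_a": "green", "split_b": "orange", "split_c": "red"}
print(reportable_splits(example))  # prints: ['split_a']
```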