20n / act

License: GPL-3.0
Computational synthetic biology: Predicting DNA edits for bioengineering

20n/act: An open source platform for bioengineering

20n/act is a data aggregation and prediction system for bioengineering. For a target molecule, 20n/act predicts DNA insertions that modify a cell (usually a microbe such as E. coli or S. cerevisiae) so that the modified cell makes the target molecule by fermentation from sugar. We call these target molecules (chemicals) the bioreachables. The system predicted/invented the first bio-route to acetaminophen (Tylenol, APAP). Read more in our blog post. The technical details of the APAP work can be found in patent applications on E. coli and yeast fermentation.

Getting started

Live preview

See predicted DNA for 11 sample molecules at Bioreachables Preview (login: public, password: preview). Due to limitations we can only make a preview version available. If you'd like the full version, please contact us.

Building the project

Check out the repo, then follow the run instructions to create the database and prediction corpus. If you'd rather get a pre-packaged DB without creating it yourself, please contact us. The codebase is public to further the state of the art in automating biological engineering and synthetic biology. Some modules are specific to microbes, but most of the predictive stack deals with host-agnostic enzymatic biochemistry.

Components of 20n/act

Predictor stack

Answers "what DNA do I insert if I want to make my chemical?"

| # | Module | Function | Code |
|---|--------|----------|------|
| 1 | Installer | Integrates heterogeneous raw data | Code: com.act.reachables.initdb; Run: Instructions |
| 2 | Reaction operator (RO) inference | Mines rules of enzymatic catalysis | Code: biointerpretation module |
| 2 | Structure Activity Relationship (SAR) inference | Mines substrate specificities | Code: biointerpretation module |
| 3 | Biointerpretation | Mechanistic validation of enzymatic transforms (using ROs) | Code: com.act.biointerpretation.BiointerpretationDriver; Run: Instructions |
| 4 | Reachables computation | Exhaustively enumerates all biosynthesizable chemicals | Code: com.act.reachables.reachables; Code: com.act.reachables.postprocess_reachables; Run: Instructions |
| 5 | Cascades computation | Exhaustively enumerates all enzymatic routes from metabolic natives to bioreachable target | Code: com.act.reachables.cascades; Run: Instructions |
| 6 | DNA designer | Computes protein & DNA design (coli specific) for each non-natural enzymatic path | Code: org.twentyn.proteintodna.ProteinToDNADriver; Run: Instructions |
| 7 | Application miner | Mines chemical applications using web searches [Bing] | Code: act.installer.bing.BingSearcher; Run: Instructions |
| 8 | Enzymatic biochemistry NLP | Text -> Chemical tokens -> Biologically feasible reactions using ROs | Code: act.shared.TextToRxns; Frontend: TextToRxnsUI |
| 9 | Patent search | Chemical -> Patents | Code: act.installer.reachablesexplorer.PatentFinder; Run: Instructions |
| 10 | Bioreachables wiki | Aggregates reachables, cascades, use cases, protein and DNA designs into a user friendly wiki interface | Documentation |
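
The reachables and cascades steps (modules 4 and 5) can be pictured with a small toy model. The sketch below is not the code in com.act.reachables; it assumes reactions are plain (name, substrates, products) triples, and the chemical and reaction names (A, B, r1, ...) are made up for illustration. Reachables are computed by expanding from the native metabolites until nothing new appears, and cascades by walking backwards from a target to the natives.

```python
from typing import FrozenSet, List, Set, Tuple

# (name, substrates, products); every identifier below is hypothetical toy data.
Reaction = Tuple[str, FrozenSet[str], FrozenSet[str]]


def compute_reachables(natives: Set[str], reactions: List[Reaction]) -> Set[str]:
    """Expand the set of known chemicals until no reaction adds anything new."""
    reachable = set(natives)
    changed = True
    while changed:
        changed = False
        for _name, substrates, products in reactions:
            # A reaction can fire only when every substrate is already reachable.
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def enumerate_cascades(
    target: str,
    natives: Set[str],
    reactions: List[Reaction],
    _visiting: FrozenSet[str] = frozenset(),
) -> List[List[str]]:
    """List every acyclic route (ordered reaction names) from natives to target."""
    if target in natives:
        return [[]]  # already available: one empty route
    if target in _visiting:
        return []  # this chemical is already being expanded: break the cycle
    routes: List[List[str]] = []
    for name, substrates, products in reactions:
        if target not in products:
            continue
        # Pick one sub-route per substrate and merge them, skipping repeated steps.
        partial: List[List[str]] = [[]]
        for substrate in sorted(substrates):
            sub_routes = enumerate_cascades(
                substrate, natives, reactions, _visiting | {target}
            )
            partial = [
                p + [step for step in r if step not in p]
                for p in partial
                for r in sub_routes
            ]
        routes.extend(p + [name] for p in partial)
    return routes


NATIVES = {"A", "B"}
REACTIONS: List[Reaction] = [
    ("r1", frozenset({"A"}), frozenset({"C"})),
    ("r2", frozenset({"B", "C"}), frozenset({"D"})),
    ("r3", frozenset({"A"}), frozenset({"D"})),
    ("r4", frozenset({"X"}), frozenset({"Y"})),  # X is never produced
]

if __name__ == "__main__":
    print(sorted(compute_reachables(NATIVES, REACTIONS)))
    print(enumerate_cascades("D", NATIVES, REACTIONS))
```

In this toy network D is reachable by two routes (r1 then r2, or r3 alone), while Y is not reachable because nothing produces X. A real system would also have to bound the search, since exhaustive route enumeration grows quickly with network size.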

Analytics

Answers "Is my bio-engineered cell doing what I want it to?"

| # | Module | Function | Code |
|---|--------|----------|------|
| 1 | LCMS: Untargeted metabolomics | Deep-learnt signal processing to identify all chemical [side]effects of DNA engineering on cell | Code: DeepLearningLcmsPeak; Code: com.act.lcms.UntargetedMetabolomics |
| 2 | LCMS: Comparative visualization | Visualizing traces side-by-side from untargeted evaluation of over and underexpressed peaks | Doc: LCMSDataVisualisation |
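
To make "over and underexpressed peaks" concrete, here is a minimal sketch of a comparison between an engineered sample and a control. It deliberately swaps the deep-learnt peak detection for a much simpler technique: peaks are matched by m/z and retention time within fixed tolerances and then ranked by log2 fold change. The Peak type, the tolerance values, the threshold, and the sample numbers are all assumptions for illustration and are not taken from com.act.lcms.

```python
import math
from typing import Dict, List, NamedTuple, Optional


class Peak(NamedTuple):
    mz: float         # mass-to-charge ratio
    rt: float         # retention time in seconds
    intensity: float  # integrated intensity, assumed to be > 0


def find_match(
    peak: Peak, candidates: List[Peak], mz_tol: float = 0.01, rt_tol: float = 5.0
) -> Optional[Peak]:
    """Return the candidate closest in m/z that lies within both tolerances."""
    close = [
        c
        for c in candidates
        if abs(c.mz - peak.mz) <= mz_tol and abs(c.rt - peak.rt) <= rt_tol
    ]
    return min(close, key=lambda c: abs(c.mz - peak.mz), default=None)


def classify_peaks(
    engineered: List[Peak], control: List[Peak], min_log2_fold: float = 1.0
) -> Dict[str, List[Peak]]:
    """Split engineered-sample peaks into over-expressed, under-expressed and new."""
    results: Dict[str, List[Peak]] = {"over": [], "under": [], "new": []}
    for peak in engineered:
        match = find_match(peak, control)
        if match is None:
            results["new"].append(peak)  # no counterpart in the control sample
            continue
        log2_fold = math.log2(peak.intensity / match.intensity)
        if log2_fold >= min_log2_fold:
            results["over"].append(peak)
        elif log2_fold <= -min_log2_fold:
            results["under"].append(peak)
        # Peaks with a smaller change are treated as unchanged and dropped.
    return results


# Hypothetical peak lists, not real measurements.
CONTROL = [
    Peak(152.070, 120.0, 1000.0),
    Peak(180.060, 300.0, 8000.0),
    Peak(90.030, 60.0, 500.0),
]
ENGINEERED = [
    Peak(152.071, 121.0, 9000.0),
    Peak(180.062, 299.0, 1500.0),
    Peak(90.030, 60.5, 520.0),
    Peak(110.040, 200.0, 700.0),
]

if __name__ == "__main__":
    for label, peaks in classify_peaks(ENGINEERED, CONTROL).items():
        print(label, [p.mz for p in peaks])
```

Fixed tolerances are the weak point of this approach: retention times drift between runs, which is one reason a learned peak model may be preferable in practice.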

Unit economics of bioproduction

Answers "Can I use bio-production to make this chemical at scale?"

| # | Module | Function | Code |
|---|--------|----------|------|
| 1 | Cost model: Manufacturing unit economics for large scale production | It backcalculates cell efficiency (yield, titers, productivity) objectives based on given COGS ($ per ton) of target chemical. From cell efficiency objectives it guesstimates the R&D investment (money and time) and ROI expectations | Code: act.installer.bing.CostModel; Code (viz server): costModelUI; Source model: XLS |
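
The back-calculation idea can be illustrated with a deliberately simplified model. This is not the model in act.installer.bing.CostModel or the XLS source; it assumes COGS per ton is just sugar cost divided by yield plus a fermenter operating cost that scales inversely with productivity, and every price and parameter name below is hypothetical. R&D investment and ROI are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostAssumptions:
    sugar_price_per_ton: float         # $ per ton of sugar (hypothetical)
    fermenter_cost_per_m3_hour: float  # $ per m^3 of broth per hour (hypothetical)
    batch_hours: float                 # fermentation time per batch
    max_yield_g_per_g: float           # assumed ceiling on g product per g sugar


@dataclass
class CellObjectives:
    min_yield_g_per_g: float
    productivity_g_per_l_h: float
    titer_g_per_l: float


def fermentation_cost_per_ton(productivity_g_per_l_h: float, a: CostAssumptions) -> float:
    # 1 g/L/h equals 1 kg/m^3/h, so cost per kg is ($/m^3/h) / (kg/m^3/h).
    return a.fermenter_cost_per_m3_hour / productivity_g_per_l_h * 1000.0


def forward_cogs(yield_g_per_g: float, productivity_g_per_l_h: float, a: CostAssumptions) -> float:
    """COGS ($ per ton of product) under the simplified two-term model."""
    sugar_cost = a.sugar_price_per_ton / yield_g_per_g
    return sugar_cost + fermentation_cost_per_ton(productivity_g_per_l_h, a)


def backcalculate(
    target_cogs_per_ton: float, productivity_g_per_l_h: float, a: CostAssumptions
) -> Optional[CellObjectives]:
    """Return the cell objectives needed to hit the target COGS, or None if infeasible."""
    sugar_budget = target_cogs_per_ton - fermentation_cost_per_ton(productivity_g_per_l_h, a)
    if sugar_budget <= 0:
        return None  # fermentation alone already exceeds the target
    min_yield = a.sugar_price_per_ton / sugar_budget
    if min_yield > a.max_yield_g_per_g:
        return None  # would need more product per gram of sugar than assumed possible
    # Volumetric productivity is titer divided by time, so titer = productivity * hours.
    titer = productivity_g_per_l_h * a.batch_hours
    return CellObjectives(min_yield, productivity_g_per_l_h, titer)


ASSUMPTIONS = CostAssumptions(
    sugar_price_per_ton=300.0,
    fermenter_cost_per_m3_hour=0.5,
    batch_hours=48.0,
    max_yield_g_per_g=0.5,
)

if __name__ == "__main__":
    print(backcalculate(1000.0, 2.0, ASSUMPTIONS))
    print(backcalculate(800.0, 2.0, ASSUMPTIONS))
```

With these made-up numbers, a $1000 per ton target at 2 g/L/h leaves $750 per ton for sugar, which requires a yield of at least 0.4 g/g; tightening the target to $800 per ton pushes the required yield above the assumed ceiling, so the function reports the target as infeasible.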

License and Contributing

Code licensed under the GNU General Public License v3.0. If an alternative license is desired, please contact 20n.
