
SmartDataAnalytics / BioKEEN

Licence: MIT License
A computational library for learning and evaluating biological knowledge graph embeddings

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives to or similar to BioKEEN

CATT
An ultra-sensitive and precise tool for characterizing T cell CDR3 sequences in TCR-seq and RNA-seq data.
Stars: ✭ 17 (-58.54%)
Mutual labels:  bioinformatics
full spectrum bioinformatics
An open-access bioinformatics text
Stars: ✭ 26 (-36.59%)
Mutual labels:  bioinformatics
GenomicDataCommons
Provide R access to the NCI Genomic Data Commons portal.
Stars: ✭ 64 (+56.1%)
Mutual labels:  bioinformatics
orfipy
Fast and flexible ORF finder
Stars: ✭ 27 (-34.15%)
Mutual labels:  bioinformatics
verifiable-data
Open Source Decentralized Identifiers and Verifiable Credentials Infrastructure and Tooling
Stars: ✭ 18 (-56.1%)
Mutual labels:  linked-data
ngstools
My own tools code for NGS data analysis (Next Generation Sequencing)
Stars: ✭ 28 (-31.71%)
Mutual labels:  bioinformatics
biskit
A Python platform for Structural Bioinformatics
Stars: ✭ 47 (+14.63%)
Mutual labels:  bioinformatics
rdf-ldp
A suite of LDP software and middleware for RDF.rb & Rack
Stars: ✭ 14 (-65.85%)
Mutual labels:  linked-data
OpenGene.jl
(No maintenance) OpenGene, core libraries for NGS data analysis and bioinformatics in Julia
Stars: ✭ 60 (+46.34%)
Mutual labels:  bioinformatics
StackedDAE
Stacked Denoising AutoEncoder based on TensorFlow
Stars: ✭ 23 (-43.9%)
Mutual labels:  bioinformatics
awesome-ontology
A curated list of ontology things
Stars: ✭ 73 (+78.05%)
Mutual labels:  linked-data
adversarial-relation-classification
Unsupervised domain adaptation method for relation extraction
Stars: ✭ 18 (-56.1%)
Mutual labels:  bioinformatics
SumStatsRehab
GWAS summary statistics files QC tool
Stars: ✭ 19 (-53.66%)
Mutual labels:  bioinformatics
bystro
Bystro genetic analysis (annotation, filtering, statistics)
Stars: ✭ 31 (-24.39%)
Mutual labels:  bioinformatics
plasmidtron
Assembling the cause of phenotypes and genotypes from NGS data
Stars: ✭ 27 (-34.15%)
Mutual labels:  bioinformatics
sparql-micro-service
SPARQL micro-services: A lightweight approach to query Web APIs with SPARQL
Stars: ✭ 22 (-46.34%)
Mutual labels:  linked-data
referenceseeker
Rapid determination of appropriate reference genomes.
Stars: ✭ 65 (+58.54%)
Mutual labels:  bioinformatics
flexidot
Highly customizable, ambiguity-aware dotplots for visual sequence analyses
Stars: ✭ 73 (+78.05%)
Mutual labels:  bioinformatics
ccs
CCS: Generate Highly Accurate Single-Molecule Consensus Reads (HiFi Reads)
Stars: ✭ 79 (+92.68%)
Mutual labels:  bioinformatics
bio tools
Useful bioinformatic scripts
Stars: ✭ 35 (-14.63%)
Mutual labels:  bioinformatics

BioKEEN

BioKEEN (Biological KnowlEdge EmbeddiNgs) is a package for training and evaluating biological knowledge graph embeddings built on PyKEEN.
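In a knowledge graph embedding, each entity and relation is mapped to a vector, and a model-specific scoring function rates how plausible a triple (head, relation, tail) is. As a minimal illustration of that idea, the sketch below implements the scoring function of TransE, a classic KGE model, in plain Python. It is not BioKEEN's or PyKEEN's implementation, and the entity names and vector values are made up.

```python
import math
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """TransE plausibility score: -||head + relation - tail||_2 (higher is more plausible)."""
    if not len(head) == len(relation) == len(tail):
        raise ValueError("all three embeddings must have the same dimension")
    # TransE treats a relation as a translation: for a true triple, head + relation
    # should land close to tail, so a small Euclidean distance means a high score.
    distance = math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))
    return -distance


# Toy 2-dimensional embeddings with made-up values, for illustration only.
gene_a = [0.0, 1.0]
interacts_with = [1.0, 0.0]
gene_b = [1.0, 1.0]
gene_c = [3.0, -2.0]

# (gene_a, interacts_with, gene_b) fits the translation exactly; gene_c does not.
print(transe_score(gene_a, interacts_with, gene_b))
print(transe_score(gene_a, interacts_with, gene_c))
```

In practice the vectors are learned during training rather than set by hand, and evaluation typically checks how highly true triples are ranked against corrupted ones.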

Because we use PyKEEN as the underlying software package, implementations of 10 knowledge graph embedding models are currently available in BioKEEN. Furthermore, BioKEEN can be run either in training mode, in which users provide their own hyper-parameter values, or in hyper-parameter optimization mode, which searches a set of user-defined values for suitable hyper-parameters.
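To illustrate what "finding suitable values from a set of user-defined values" means, here is a minimal random-search sketch. The hyper-parameter names and candidate values are hypothetical (BioKEEN's actual configuration keys may differ), `dummy_evaluate` is a stand-in for training a model and computing a validation metric, and random search is used only as one simple strategy; this README does not state which search strategy BioKEEN applies.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical hyper-parameter names and user-defined candidate values.
search_space: Dict[str, List] = {
    "embedding_dim": [50, 100, 200],
    "learning_rate": [0.001, 0.01, 0.1],
    "margin_loss": [0.5, 1.0, 2.0],
}


def random_search(
    space: Dict[str, List],
    evaluate: Callable[[Dict], float],
    n_trials: int,
    seed: int = 0,
) -> Tuple[Dict, float]:
    """Sample n_trials configurations from space and return the best one (highest score)."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyper-parameter from the user-defined candidates.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def dummy_evaluate(config: Dict) -> float:
    # Stand-in for "train a model with this config and compute a validation metric".
    return -abs(config["embedding_dim"] - 100) - abs(config["learning_rate"] - 0.01)


best, score = random_search(search_space, dummy_evaluate, n_trials=20)
print(best, score)
```

In training mode, by contrast, you would skip the search and pass a single fixed configuration straight to training.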

Through the integration of the Bio2BEL software [2], numerous biomedical databases are directly accessible within BioKEEN.

BioKEEN can also be used without programming experience through its interactive command-line interface, which can be started from a terminal with the command “biokeen”.

Share Your Experimental Artifacts

You can share your trained KGE models along with the other experimental artifacts through the KEEN-Model-Zoo.

Tutorials

A brief tutorial on how to get started with BioKEEN is available here.


Further tutorials can be found in the notebooks directory and in our documentation.

Citation

If you find BioKEEN useful in your work, please consider citing:

[1]Ali, M., et al. (2019). BioKEEN: A library for learning and evaluating biological knowledge graph embeddings. Bioinformatics, btz117.

Note: ComPath has been updated; for this reason, we have uploaded the version of the dataset that we used for our experiments: dataset

Installation

To install BioKEEN, Python 3.6+ is required, and we recommend installing it on Linux or Mac OS systems. Please run the following command:

$ pip install git+https://github.com/SmartDataAnalytics/BioKEEN.git

Alternatively, it can be installed from source for development with:

$ git clone https://github.com/SmartDataAnalytics/BioKEEN.git biokeen
$ cd biokeen
$ pip install -e .

Contributing

Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.rst for more information on getting involved.

CLI Usage

To show BioKEEN's available commands, please run the following command:

biokeen

Starting the Training/HPO Pipeline - Set Up Your Experiment within 60 seconds

To configure an experiment via the CLI, please run the following command:

biokeen start

To start BioKEEN with an existing configuration file, please run the following command:

biokeen start -f /path/to/config.json

Starting the Prediction Pipeline

To make predictions based on a trained model, please run the following command:

biokeen predict -m /path/to/model/directory -d /path/to/data/directory

where the value of the argument -m is the directory containing the model. In particular, this directory must contain the following files:

  • configuration.json
  • entities_to_embeddings.json
  • relations_to_embeddings.json
  • trained_model.pkl

These files are created automatically after the model is trained (and evaluated) and are exported to your specified output directory.
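Before calling `biokeen predict`, it can help to verify that the model directory is complete. A minimal sketch, assuming only the four file names listed above; the helper names `missing_model_files` and `build_predict_command` are hypothetical and not part of BioKEEN.

```python
from pathlib import Path
from typing import List

# File names taken from the list of required model files.
REQUIRED_MODEL_FILES = (
    "configuration.json",
    "entities_to_embeddings.json",
    "relations_to_embeddings.json",
    "trained_model.pkl",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required files that are absent from model_dir (empty list if complete)."""
    directory = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (directory / name).is_file()]


def build_predict_command(model_dir: str, data_dir: str) -> List[str]:
    """Build the argv for `biokeen predict`, refusing to do so if model files are missing."""
    missing = missing_model_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    return ["biokeen", "predict", "-m", model_dir, "-d", data_dir]
```

The resulting list can be passed to `subprocess.run(..., check=True)`, which avoids shell quoting issues with paths that contain spaces.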

The value of the argument -d is the directory containing the data to which inference should be applied, and it must contain the following files:

  • entities.tsv
  • relations.tsv

where entities.tsv contains all entities of interest, and relations.tsv contains all relations of interest. Each file should consist of a single column listing the entities or relations, respectively. Based on these files, PyKEEN creates all triple permutations, computes the predictions for them, and saves them to predictions.tsv in the data directory.
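The data directory can also be prepared from Python. The sketch below writes the two single-column files and enumerates the candidate triples as described above. The entity and relation names are made up, and two points are assumptions: the files carry no header row, and "all triple permutations" means every (head, relation, tail) combination, including head == tail. BioKEEN's own handling may differ.

```python
import csv
import itertools
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def write_single_column_tsv(path: Path, values: Iterable[str]) -> None:
    """Write one value per row, with no header row (an assumption)."""
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for value in values:
            writer.writerow([value])


def read_single_column_tsv(path: Path) -> List[str]:
    """Read the first column of every non-empty row."""
    with path.open(newline="") as handle:
        return [row[0] for row in csv.reader(handle, delimiter="\t") if row]


def candidate_triples(entities: List[str], relations: List[str]) -> List[Tuple[str, str, str]]:
    """Every (head, relation, tail) combination, including head == tail (an assumption)."""
    return list(itertools.product(entities, relations, entities))


# Stand-in for the directory you pass to `biokeen predict -d`; names below are made up.
data_dir = Path(tempfile.mkdtemp())
write_single_column_tsv(data_dir / "entities.tsv", ["EGFR", "KRAS", "BRAF"])
write_single_column_tsv(data_dir / "relations.tsv", ["interacts_with"])

entities = read_single_column_tsv(data_dir / "entities.tsv")
relations = read_single_column_tsv(data_dir / "relations.tsv")
triples = candidate_triples(entities, relations)
print(len(triples))  # 3 heads x 1 relation x 3 tails = 9 candidate triples
```

Because the count grows as (number of entities)² × (number of relations), it is worth keeping entities.tsv restricted to the entities you actually care about.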

Summarize the Results of All Experiments

To summarize the results of all experiments, please run the following command:

biokeen summarize -d /path/to/experiments/directory -o /path/to/output/file.csv

Getting Bio2BEL Data

To download and structure the data from a Bio2BEL repository, run:

biokeen data get <name>

where <name> can be the name of any Bio2BEL repository, such as hippie or mirtarbase.
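To fetch several Bio2BEL repositories in one go, the CLI call can be scripted. A minimal sketch, assuming the `biokeen` executable is on your PATH; the helper names `data_get_commands` and `fetch_all` are hypothetical, and only the `biokeen data get <name>` command itself comes from this README.

```python
import shutil
import subprocess
from typing import List, Sequence


def data_get_commands(names: Sequence[str]) -> List[List[str]]:
    """Build one `biokeen data get <name>` argv per Bio2BEL repository name."""
    return [["biokeen", "data", "get", name] for name in names]


def fetch_all(names: Sequence[str]) -> None:
    """Run the commands one after another; stop at the first failure."""
    if shutil.which("biokeen") is None:
        raise RuntimeError("the biokeen CLI is not on PATH; install BioKEEN first")
    for command in data_get_commands(names):
        # check=True raises CalledProcessError if a download fails.
        subprocess.run(command, check=True)
```

For example, `fetch_all(["hippie", "mirtarbase"])` runs the two commands in sequence and stops with an error if either one fails.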

References

[2]Hoyt, C., et al. (2019). Integration of Structured Biological Data Sources using Biological Expression Language. bioRxiv, 631812.