Licence: MIT license
Pheno4j: a graph based HPO to NGS database

Programming languages: Java, Python, shell



Pheno4j: a graph based HPO to NGS database

Author: Sajid Mughal

Paper published: https://www.ncbi.nlm.nih.gov/pubmed/28633344

Presentation videos:

Purpose

Take genetic and phenotype data in JSON, VCF and CSV format and convert them into CSV files that represent Nodes and Relationships. These CSV files can then be used to populate Pheno4J using the Neo4j bulk CSV import tool.
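As a rough illustration of what "CSV files that represent Nodes and Relationships" can look like, the sketch below writes one node file and one relationship file using the header conventions of the Neo4j bulk import tool (`:ID`, `:LABEL`, `:START_ID`, `:END_ID`, `:TYPE`). The labels `GeneticVariant` and `Person` appear in the example queries of this README, but the column layout, the helper names and the relationship type `VARIANT_TO_PERSON` are assumptions made for illustration, not the files Pheno4J actually generates.

```python
import csv
import io


def nodes_csv(id_header, label, ids):
    """Render a node file: an ID column (with its ID space) and a label column."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_header, ":LABEL"])
    for node_id in sorted(set(ids)):  # each node is written once
        writer.writerow([node_id, label])
    return out.getvalue()


def relationships_csv(start_space, end_space, rel_type, pairs):
    """Render a relationship file linking IDs from two ID spaces."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([
        ":START_ID({})".format(start_space),
        ":END_ID({})".format(end_space),
        ":TYPE",
    ])
    for start, end in pairs:
        writer.writerow([start, end, rel_type])
    return out.getvalue()


# Hypothetical input: (variant, person) pairs as they might be read from a VCF.
calls = [("22-51171497-G-A", "person1"), ("22-51171497-G-A", "person2")]

variant_nodes = nodes_csv(
    "variantId:ID(GeneticVariant)", "GeneticVariant", [v for v, _ in calls]
)
# "VARIANT_TO_PERSON" is a made-up relationship type used only in this sketch.
variant_to_person = relationships_csv(
    "GeneticVariant", "Person", "VARIANT_TO_PERSON", calls
)
```

Keeping nodes and relationships in separate files, each with a typed header row, is what lets the bulk importer load them without running Cypher statements row by row.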

Public datasets

Only two publicly available datasets are required:

User specified datasets

Example datasets specified in config.properties:

  • VCF file which contains genotypes (example)
  • VEP JSON file (example)
  • Individuals with HPO terms as CSV file (example)

Pheno4J schema overview

Installation

Local Installation with Exemplar Data

The local version cannot handle a very large dataset efficiently, because it does not have access to the configuration for the page cache and JVM size. It should therefore only be used for testing.

Prerequisites

  • Java 1.8
  • Maven 3

Build Graph and Start up Neo4j on test data

Download the code, build the database, load the test data referenced in config.properties and start the server on port 7474:

git clone https://github.com/phenopolis/pheno4j.git
cd pheno4j
mvn clean compile -P build-graph,run-neo4j

Once the server is running, it can be queried either through the web interface at http://localhost:7474/ or by sending HTTP requests with curl from the command line (see next section).

Run Example Queries with curl

The curl HTTP queries return data in JSON format, so the responses can be parsed with jq.

For example, get count of variants shared between person1 and person2:

curl -H "Content-Type: application/json" -d '{
"query": "WITH [$p1,$p2] AS persons MATCH (p:Person)<-[]-(v:GeneticVariant) WHERE p.personId IN persons WITH v, count(*) as c, persons WHERE c = size(persons) RETURN count(v.variantId);",
"params":{"p1":"person1","p2":"person2"}
}' http://localhost:7474/db/data/cypher

Get ids of persons with variant 22-51171497-G-A:

curl -H "Content-Type: application/json" -d '{
"query": "MATCH (gv:GeneticVariant)-[]->(p:Person) WHERE gv.variantId =$var RETURN p.personId;",
"params":{"var":"22-51171497-G-A"}
}' http://localhost:7474/db/data/cypher
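The same requests can be sent from a script. The sketch below uses only the Python standard library to build the JSON payload shown in the curl examples and to turn a reply into a list of rows. It assumes the endpoint answers with a JSON object of the form `{"columns": [...], "data": [[...], ...]}`; the function names are illustrative and not part of Pheno4J.

```python
import json
from urllib import request

CYPHER_URL = "http://localhost:7474/db/data/cypher"  # endpoint used in the curl examples


def build_cypher_request(query, params, url=CYPHER_URL):
    """Build a POST request carrying the same JSON body as the curl examples."""
    body = json.dumps({"query": query, "params": params}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response_text):
    """Convert a reply (assumed shape: columns + data) into one dict per row."""
    reply = json.loads(response_text)
    return [dict(zip(reply["columns"], row)) for row in reply["data"]]


def run_query(query, params, url=CYPHER_URL):
    """Send the query to a running server and return the rows."""
    with request.urlopen(build_cypher_request(query, params, url)) as resp:
        return rows_as_dicts(resp.read().decode("utf-8"))
```

With the local server running, `run_query("MATCH (gv:GeneticVariant)-[]->(p:Person) WHERE gv.variantId = $var RETURN p.personId;", {"var": "22-51171497-G-A"})` would return one dict per matching person. Passing values through `params` rather than formatting them into the query string avoids quoting problems.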

More cypher queries are available here.

Running Pheno4J on your own data

Documentation here.

Server Installation

The server installation can scale to very large datasets as it allows configuration of the JVM size and page cache.

Prerequisites

Deploy code

Run the following in the checkout directory, which will generate a zip file, "graph-bundle.zip", in the target folder:

mvn clean package

Copy graph-bundle.zip to your target server and unzip it.

Update config file to reference your input data

In the conf folder of the extracted zip above, update config.properties to reference your input data.

Run the GraphDatabaseBuilder

This step takes all the input data and builds CSV files, which are then loaded into a Neo4j database using the Neo4j ImportTool. Constraints and Indexes are then created. In the lib folder of the extracted zip above, run the following:

java -cp "*:../conf/" com.graph.db.GraphDatabaseBuilder

Link the generated database above to your Neo4j installation

cd $NEO4J_HOME/data/databases
ln -s ${output.folder}/graph-db/data/databases/graph.db graph.db 

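${output.folder} is a placeholder for the value of output.folder defined in config.properties. Substitute the actual path when running the command, because ${output.folder} is not a valid shell variable name.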
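If you would rather not copy the path by hand, a small script can read it from the config file. This is a minimal sketch that assumes config.properties uses simple `key=value` lines; it ignores other features of Java properties files, such as `:` separators and line continuations. The function names are illustrative.

```python
import os


def read_property(path, key):
    """Return the value of `key` from a key=value properties file, or None."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith(("#", "!")):
                continue  # skip blank lines and comments
            name, sep, value = line.partition("=")
            if sep and name.strip() == key:
                return value.strip()
    return None


def generated_db_path(config_path):
    """Path of the generated database, i.e. the target of the symlink."""
    output_folder = read_property(config_path, "output.folder")
    if output_folder is None:
        raise KeyError("output.folder not found in " + config_path)
    return os.path.join(output_folder, "graph-db", "data", "databases", "graph.db")
```

The returned path is the first argument to give to `ln -s` in the command above.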

Update Neo4j config

Ideally, you should hold as much of the data in memory as possible (see here for more information). Set the value of dbms.memory.pagecache.size in $NEO4J_HOME/conf/neo4j.conf to the total size of the files matching $NEO4J_HOME/data/databases/graph.db/*store.db*
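The total size of the store files can be computed with a few lines of Python. The sketch below sums every file matching `*store.db*` in the database directory and formats a candidate line for neo4j.conf, rounded up to whole megabytes. The function names are illustrative, and the result is a starting point only: the machine still needs enough RAM left over for the JVM heap and the operating system.

```python
import glob
import math
import os


def store_size_bytes(graph_db_dir):
    """Sum the sizes of all files matching *store.db* in the database directory."""
    pattern = os.path.join(graph_db_dir, "*store.db*")
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def pagecache_setting(size_bytes):
    """Round up to whole megabytes and format a neo4j.conf line."""
    megabytes = max(1, math.ceil(size_bytes / (1024 * 1024)))
    return "dbms.memory.pagecache.size={}m".format(megabytes)
```

For example, `pagecache_setting(store_size_bytes(os.path.join(os.environ["NEO4J_HOME"], "data", "databases", "graph.db")))` yields the line to put in neo4j.conf, assuming `NEO4J_HOME` is set in the environment.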

Start Neo4j

cd $NEO4J_HOME/bin
./neo4j start

Run 'warmup' query

This query touches every node and relationship in the graph, so that all the data stored on disk is loaded into memory (see here for more information). It takes up to 10 minutes on our data.

MATCH (n)
OPTIONAL MATCH (n)-[r]->()
RETURN count(n.prop) + count(r.prop);

Additional Steps

If you would like to connect to your instance from your application tier to handle incoming database requests, you can change the password of the Neo4j instance with the command below. The port is the one set in dbms.connector.http.listen_address in $NEO4J_HOME/conf/neo4j.conf. The following command sets the password to 1:

curl -H "Content-Type: application/json" -X POST -d '{"password":"1"}' -u neo4j:neo4j http://{HOST}:{PORT}/user/neo4j/password
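For deployment scripts, the same request can be built in Python. This sketch mirrors the curl command: a JSON body with the new password and HTTP Basic authentication with the current credentials, which the curl example takes to be neo4j:neo4j. The function name and its parameters are illustrative. The block only builds the request; sending it is left to the caller.

```python
import base64
import json
from urllib import request


def build_password_request(host, port, new_password,
                           user="neo4j", current_password="neo4j"):
    """Build the POST request that the curl command above sends."""
    credentials = "{}:{}".format(user, current_password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return request.Request(
        "http://{}:{}/user/{}/password".format(host, port, user),
        data=json.dumps({"password": new_password}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,  # equivalent of curl -u user:password
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` performs the change. Choose a stronger password than the placeholder value used in the curl example.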

Example Cypher Queries

Examples can be found here.

Further reading

Additional Documentation
