dennybritz / Papergraph

AI/ML citation graph with postgres + graphql

papergraph

papergraph is a Rust library and binary for building and managing a citation graph from Semantic Scholar data, focused on AI/ML papers (for now). Data is stored in a postgres database, with a Hasura GraphQL backend (schema) on top for easy graph queries. It comes with Jupyter notebooks that show you how to analyze and visualize the data.

Live version at https://papergraph.dbz.dev

Thanks to @ArtirKel for the useful feedback and ideas.

Notebooks

The following notebooks work out of the box using a publicly available API endpoint for the data. You can run them locally, or in the cloud via Google Colab. Please read the caveats about the public endpoint below!

Use Cases

  • Finding landmark papers - Papers with a large number of citations may be considered landmark papers. The ideas in such papers often form the foundation for incremental improvements. Given some arbitrary paper you're interested in, you may want to know which landmark papers you should study for the required background knowledge.
  • Reference research - When writing a paper, you don't want to miss prior work. Looking through the citation graph for a related paper can help you find potentially interesting papers to read and cite.
  • Graph analysis - Run sophisticated graph algorithms on the dataset to gain insights.

Graph Example

IMPORTANT! Using the public endpoint

The database is publicly available at http://34.107.246.233/v1/graphql, so please be gentle with your queries! This is running on a small postgres server that I'm paying for, so please don't overload it with automated scripts. Be nice :) As long as you're running queries by hand through notebooks everything should be fine.

If you want to do lots of queries you should clone this repo and build the database yourself locally or in the cloud. Instructions for this are below. If you are running Kubernetes, you can also use the scripts in deploy/.
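For a single query by hand, outside a notebook, something like the following may be enough. This is a minimal sketch: `graphql_payload` and `run_query` are hypothetical helper names, the `PAPERGRAPH_ENDPOINT` variable is an assumption for switching to your own instance, and the example query uses only standard GraphQL introspection because the papergraph schema itself is not shown here.

```shell
#!/usr/bin/env bash
# Endpoint from the text above; set PAPERGRAPH_ENDPOINT (assumed name)
# to point at your own instance instead.
ENDPOINT="${PAPERGRAPH_ENDPOINT:-http://34.107.246.233/v1/graphql}"

# Wrap a GraphQL query string in a {"query": "..."} JSON body.
# Flattens newlines and escapes backslashes and double quotes;
# other control characters (e.g. tabs) are not handled.
graphql_payload() {
  local escaped
  escaped=$(printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"query":"%s"}' "$escaped"
}

# POST a single query to the endpoint and print the JSON response.
run_query() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    --data "$(graphql_payload "$1")" "$ENDPOINT"
}
```

After sourcing the snippet, `run_query '{ __schema { queryType { name } } }'` sends one small introspection request. Nothing here loops or retries, which keeps it in line with the request to go easy on the public server.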

Building the database from a postgresql snapshot

TODO. See this issue

Building the database from scratch

Requirements:

  • Docker

If you want to build the database from scratch, you must download the full S2 research corpus. The total compressed size is currently around 120GB.
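Given the size of the download, it can be worth checking free disk space first. The helper below is a sketch, not part of papergraph: `has_free_gb` is a hypothetical name, and it only covers the compressed corpus. The database needs further space on top, which is not quantified here.

```shell
#!/usr/bin/env bash
# Succeeds if the filesystem holding <dir> has at least <gb> GiB available.
# has_free_gb is a hypothetical helper, not a papergraph command.
has_free_gb() {
  local dir="$1" needed_gb="$2"
  local avail_kb
  # df -Pk prints a POSIX-format table in 1K blocks;
  # row 2, column 4 is the available space.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb . 120 || echo "Not enough free space for the corpus" >&2` checks the current directory before starting the sync.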

Clone the repo and download the corpus snapshot

git clone https://github.com/dennybritz/papergraph
cd papergraph
aws s3 sync --no-sign-request s3://ai2-s2-research-public/open-corpus/2020-04-10/ data/s2-research-corpus

Start up an empty postgres database server and create the schema

# Replace <password> and <host> with the credentials and hostname of your postgres container
export DATABASE_URL=postgres://papergraph:<password>@<host>:5432/papergraph
export RUST_LOG=info

# Run the postgres docker container
docker-compose up postgres

# Set up the database and run migrations
docker run --rm --network papergraph_default \
  -e DATABASE_URL \
  dennybritz/papergraph \
  diesel database setup

Now that we have a postgres server with the right database schema running, we need to insert the data:

# Assuming you downloaded the data into data/s2-research-corpus
# as shown in the AWS command above
DATA_PATH=data/s2-research-corpus/s2-corpus-017.gz

# Repeat this for all files you want to insert
# This will take a while. On my laptop, each file takes around 1min.
docker run --rm -it --network papergraph_default \
  -e DATABASE_URL -e RUST_LOG \
  -v "$(pwd)/${DATA_PATH}:/data/${DATA_PATH}" \
  dennybritz/papergraph \
  papergraph insert -d /data/${DATA_PATH}
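
The "repeat this for all files" step can be sketched as a loop. This assumes the corpus files follow the `s2-corpus-*.gz` naming seen in the example above and live under a path relative to the current directory. `insert_file` and `insert_all` are hypothetical helper names, and `-it` is dropped so the loop also runs without a terminal attached.

```shell
#!/usr/bin/env bash
# Insert one corpus file; same docker invocation as above, minus -it.
# Expects a path relative to the current directory.
insert_file() {
  local data_path="$1"
  docker run --rm --network papergraph_default \
    -e DATABASE_URL -e RUST_LOG \
    -v "$(pwd)/${data_path}:/data/${data_path}" \
    dennybritz/papergraph \
    papergraph insert -d "/data/${data_path}"
}

# Insert every s2-corpus-*.gz file in a directory, one at a time.
# Keeps going after a failure and returns non-zero if any file failed.
insert_all() {
  local dir="${1:-data/s2-research-corpus}"
  local file failed=0
  for file in "$dir"/s2-corpus-*.gz; do
    [ -e "$file" ] || continue   # the glob matched nothing
    echo "Inserting ${file}" >&2
    insert_file "$file" || { echo "Failed: ${file}" >&2; failed=1; }
  done
  return "$failed"
}
```

With `DATABASE_URL` and `RUST_LOG` exported as before, `insert_all data/s2-research-corpus` walks the whole directory. Files are processed sequentially, so expect the per-file time quoted above multiplied by the number of files.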

Now that we have seeded the database, we can also start Hasura to serve the GraphQL API. Stop the postgres docker process with ctrl+c and run

docker-compose up

You should now be able to access the API via http://localhost:8080.
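
The containers can take a moment to come up, so a small retry helper may be handy before sending the first query. This is a generic sketch: `wait_until` is a hypothetical name, and the `/healthz` path in the usage note is an assumption about Hasura's health endpoint that you should verify against your Hasura version.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Pause between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:8080/healthz && echo "API is up"` polls for up to about a minute before giving up.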

Freshness

papergraph is updated when new data snapshots become available. This typically happens once a month. This means it will not contain all the latest papers.

Misc

Generating postgres database dumps

pg_dump -h localhost -p 15432 -F tar -U papergraph papergraph > pg_dump.tar

Build docker image

docker build -t dennybritz/papergraph .

Export graphql schema

gq http://34.107.246.233/v1/graphql --introspect > hasura/schema.graphql  