Octavian-ai / English2cypher

Licence: unlicense
A model that translates English into Cypher queries, based on the CLEVR-graph dataset

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to English2cypher

Movies Python Bolt
Neo4j Movies Example application with Flask backend using the neo4j-python-driver
Stars: ✭ 197 (+264.81%)
Mutual labels:  graph, cypher
Cypher For Gremlin
Cypher for Gremlin adds Cypher support to any Gremlin graph database.
Stars: ✭ 267 (+394.44%)
Mutual labels:  graph, cypher
Ai Study
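A comprehensive collection of AI study materials, covering machine learning (ML) basics, deep learning (DL) basics, computer vision (CV), natural language processing (NLP), recommender systems, speech recognition, graph neural networks, and algorithm engineer interview questions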
Stars: ✭ 93 (+72.22%)
Mutual labels:  ai, graph
Redisgraph
A graph database as a Redis module
Stars: ✭ 1,292 (+2292.59%)
Mutual labels:  graph, cypher
Kglib
Grakn Knowledge Graph Library (ML R&D)
Stars: ✭ 405 (+650%)
Mutual labels:  ai, graph
Movies Javascript Bolt
Neo4j Movies Example with webpack-in-browser app using the neo4j-javascript-driver
Stars: ✭ 123 (+127.78%)
Mutual labels:  graph, cypher
Opennars
OpenNARS for Research 3.0+
Stars: ✭ 264 (+388.89%)
Mutual labels:  ai, graph
Mac Graph
The MacGraph network. An attempt to get MACnets running on graph knowledge
Stars: ✭ 113 (+109.26%)
Mutual labels:  ai, graph
Text summurization abstractive methods
Multiple implementations of abstractive text summarization, using Google Colab
Stars: ✭ 359 (+564.81%)
Mutual labels:  ai, seq2seq
Popoto
Visual query builder for Neo4j graph database
Stars: ✭ 318 (+488.89%)
Mutual labels:  graph, cypher
Neo4j
Graphs for Everyone
Stars: ✭ 9,582 (+17644.44%)
Mutual labels:  graph, cypher
Ingraph
Incremental view maintenance for openCypher graph queries.
Stars: ✭ 40 (-25.93%)
Mutual labels:  graph, cypher
Movies Java Bolt
Neo4j Movies Example application with SparkJava backend using the neo4j-java-driver
Stars: ✭ 66 (+22.22%)
Mutual labels:  graph, cypher
Neo4j 3d Force Graph
Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph
Stars: ✭ 159 (+194.44%)
Mutual labels:  graph, cypher
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (+461.11%)
Mutual labels:  graph, cypher
Opencypher
Specification of the Cypher property graph query language
Stars: ✭ 534 (+888.89%)
Mutual labels:  graph, cypher
Neo4j Helm
Helm Charts for running Neo4j on Kubernetes
Stars: ✭ 43 (-20.37%)
Mutual labels:  graph, cypher
Scaffold Eth
🏗 forkable Ethereum dev stack focused on fast product iterations
Stars: ✭ 1,017 (+1783.33%)
Mutual labels:  graph
Mxnet Seq2seq
Sequence to sequence learning with MXNET
Stars: ✭ 51 (-5.56%)
Mutual labels:  seq2seq
Holodeck Engine
High Fidelity Simulator for Reinforcement Learning and Robotics Research.
Stars: ✭ 48 (-11.11%)
Mutual labels:  ai

English to Cypher

A model that translates English into Cypher queries, based on the CLEVR-graph dataset.

This model seeks to transform sentences like:

How many stations are between Koof Lane and Gag Street?

and turn them into answers such as

3

by way of translating the English into equivalent Cypher statements:

MATCH ( var1 ) MATCH ( var2 ) MATCH tmp1 = shortestPath ( ( var1 ) - [ * ] - ( var2 ) ) WHERE var1.name = " Koof Lane " AND var2.name = " Gag Street " WITH nodes ( tmp1 ) AS var3 RETURN length ( var3 ) - 2
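
To make the arithmetic in that query concrete: shortestPath finds the shortest chain of nodes between the two stations, and subtracting 2 removes the two endpoints from the count. Below is a minimal Python sketch of the same idea using breadth-first search on an in-memory graph. It is an illustration only, not code from this project; the function name stations_between and the toy network (stations A, B, C, X, Y, Z, W) are made up for the example.

```python
from collections import deque


def stations_between(edges, start, end):
    """Count stations strictly between start and end on a shortest path."""
    # Build an undirected adjacency map, mirroring the undirected -[*]- pattern.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search: the first time we pop `end` we have a shortest path.
    queue = deque([(start, 1)])  # (station, number of nodes on the path so far)
    seen = {start}
    while queue:
        station, path_nodes = queue.popleft()
        if station == end:
            return path_nodes - 2  # drop the two endpoints, like "- 2" in the query
        for nxt in neighbours.get(station, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path_nodes + 1))
    return None  # no route between the two stations


# Hypothetical toy network; only the two endpoint names come from the example above.
toy_edges = [
    ("Koof Lane", "A"), ("A", "B"), ("B", "C"), ("C", "Gag Street"),
    ("Koof Lane", "X"), ("X", "Y"), ("Y", "Z"), ("Z", "W"), ("W", "Gag Street"),
]
print(stations_between(toy_edges, "Koof Lane", "Gag Street"))  # 3
```

The longer route through X, Y, Z and W is ignored because the search reaches Gag Street via the shorter route first.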

Running the code

Prerequisites

From the root directory, first install the prerequisites:

pipenv install
pipenv shell

All the command-line Python invocations assume pipenv shell has already been invoked (i.e. that you are in the virtual environment with the required Python modules).

Predictions

The most fun way to see this code in action is to fire up predict mode. You can invoke it with python -m e2c.predict.

You'll need a Neo4j database for the code to load and then query. Heads up: the code will delete everything in the database you provide, then upload its own graph. The easiest way to set one up is to run Docker, then use our script ./start-neo4j-database.sh to create a database with the extensions and authentication values that the code uses by default.

If you want to use a different database configuration, the arguments --neo-user, --neo-password and --neo-url let you specify how to connect to it.
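
As a rough illustration of how three such connection flags can be wired up, here is a small argparse sketch. Only the flag names come from this README. The default values shown (the standard Neo4j Bolt port, the neo4j user, and a placeholder password) are assumptions for the example and are not necessarily what this project or its start script uses.

```python
import argparse


def build_parser():
    """Build a parser for the three Neo4j connection flags (illustrative sketch)."""
    parser = argparse.ArgumentParser(description="Neo4j connection options (sketch)")
    # Flag names come from the README; every default below is a placeholder.
    parser.add_argument("--neo-url", default="bolt://localhost:7687")
    parser.add_argument("--neo-user", default="neo4j")
    parser.add_argument("--neo-password", default="change-me")
    return parser


# Simulate a command line that overrides the URL and user but not the password.
args = build_parser().parse_args(
    ["--neo-url", "bolt://db.example.com:7687", "--neo-user", "alice"]
)
print(args.neo_url, args.neo_user, args.neo_password)
```

argparse turns the dashes in each flag name into underscores, so --neo-url is read back as args.neo_url.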

The predict script will automatically download a trained model and its vocab if you do not have them locally.
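
The "download only if missing" behaviour can be sketched in a few lines of Python. This is not the project's actual download code: the helper name ensure_local, the file layout and the example.com URL are all hypothetical, and a stand-in fetch function is used so the sketch runs without network access.

```python
import os
import tempfile
import urllib.request


def ensure_local(path, url, fetch=urllib.request.urlretrieve):
    """Download url to path unless path already exists; return True if fetched."""
    if os.path.exists(path):
        return False  # already cached locally, nothing to do
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    fetch(url, path)
    return True


def fake_fetch(url, path):
    """Stand-in for a real download so the example works offline."""
    with open(path, "w") as handle:
        handle.write("placeholder for " + url)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model", "vocab.txt")  # hypothetical file name
    print(ensure_local(target, "https://example.com/vocab.txt", fake_fetch))  # True
    print(ensure_local(target, "https://example.com/vocab.txt", fake_fetch))  # False
```

The second call returns False because the file is already on disk, which is the behaviour you want when the script is run repeatedly.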

Here's the prediction program in action:

$ python -m e2c.predict
Example stations from graph:
> Draz Boulevard, Strov Boulevard, Swuct Hospital, Fak Boulevard, Frook Lane, Niwham, Dawbridge, Flip Bridge

Example lines from graph:
> Green Soosh, Green Fliv, Olive Huw, Purple Sweb, Blue Prooy, Blue Moss, Orange Hift, Pink Woog

Example questions:
> How clean is Fak Boulevard?
> How big is Dawbridge?
> What music plays at Swuct Hospital?
> What architectural style is Dawbridge?
> Does Flip Bridge have disabled access?
> Does Niwham have rail connections?
> How many architectural styles does Green Soosh pass through?
> How many music styles does Pink Woog pass through?
> How many sizes of station does Olive Huw pass through?
> How many stations playing classical does Blue Prooy pass through?
> How many clean stations does Green Fliv pass through?
> How many large stations does Olive Huw pass through?
> How many stations with disabled access does Green Soosh pass through?
> How many stations with rail connections does Blue Prooy pass through?
> Which lines is Niwham on?
> How many lines is Flip Bridge on?
> Are Dawbridge and Strov Boulevard on the same line?
> Which stations does Orange Hift pass through?

Ask a question: Does Niwham have rail connections?
Translation into cypher: 'MATCH (var1) WHERE var1.name="Niwham"  WITH 1 AS foo, var1.hasrail AS var2 RETURN var2'

Answer: None

Ask a question: Which stations does Orange Hift pass through?
Translation into cypher: 'MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Orange Hift"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  WITH 1 AS foo, var6.name AS var7  RETURN var7'

Answer: Blel Boulevard, Dent Hospital, Chih Way, Smad Road, Gusk Square, Hum Upon Thames, Ploongwich, Guz Way, Chot Estate, Tump Hospital

Ask a question: What architectural style is Dawbridge?
Translation into cypher: 'MATCH (var1) WHERE var1.name="Dawbridge"  WITH 1 AS foo, var1.architecture AS var2 RETURN var2'

Answer: modernist

Train

You can train the model yourself if you'd like. It takes about 1.5 hours on a recent NVIDIA GPU.

First, build the text input data:

python -m e2c.build_data

Then run the training:

python -m e2c.train

Training is quite slow without a GPU. If you don't happen to have an NVIDIA Titan under your desk, we've set this project up to run easily on FloydHub (and even give some nice stats):

sudo pip install -U floyd-cli
floyd login
./floyd-train.sh

During training you can see examples of its predictions in output/:

beam:
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Cred"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crind"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Rrend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crond"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "conerete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crand"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crund"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "crnece"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6) WHERE var6ocollectc]e() WHERE var6.architecture = "concrete"  WITH 1
guided:
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))
input: How many concrete stations are on the Red Crend line?
target:
- MATCH ()-[var1]-()  MATCH (var2:LINE) WHERE var2.name="Red Crend"  WITH 1 AS foo, var1, var2.id AS var3 WHERE var1.line_id = var3  MATCH (var4)-[var1]-() WHERE var4.architecture = "concrete"  WITH 1 AS foo, var4 AS var5 WITH DISTINCT var5 as var6, 1 AS foo  RETURN length(collect(var6))

In this structure, "target" is what the network should output (the ground truth), "beam" is an array of the network's predictions from beam search, and "guided" is a semi-prediction mode in which the network is given the target string and asked only to guess the next token.
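
One way to read such a record programmatically is to check where, if anywhere, the ground truth appears among the beam candidates. The sketch below assumes the record has already been loaded into a Python dict with the same four keys. The helper names normalise and score_example are hypothetical, and the queries are shortened, made-up stand-ins rather than real model output.

```python
def normalise(query):
    """Collapse runs of whitespace so spacing differences do not count as errors.

    Note: this also collapses repeated spaces inside quoted strings, which is
    acceptable for a rough comparison but not for exact query equivalence.
    """
    return " ".join(query.split())


def score_example(example):
    """Report whether the target appears in the beam, and at which position."""
    target = normalise(example["target"][0])
    beams = [normalise(b) for b in example.get("beam", [])]
    rank = next((i for i, b in enumerate(beams) if b == target), None)
    guided = [normalise(g) for g in example.get("guided", [])]
    return {
        "top1_correct": rank == 0,  # best beam equals the ground truth
        "beam_rank": rank,          # None if no beam candidate matches
        "guided_correct": bool(guided) and guided[0] == target,
    }


# Shortened, made-up record for illustration only.
example = {
    "input": "How many concrete stations are on the Red Crend line?",
    "target": ['MATCH (var2:LINE) WHERE var2.name="Red Crend"  RETURN var2'],
    "beam": [
        'MATCH (var2:LINE) WHERE var2.name="Red Cred" RETURN var2',
        'MATCH (var2:LINE) WHERE var2.name="Red Crend" RETURN var2',
    ],
    "guided": ['MATCH (var2:LINE) WHERE var2.name="Red Crend" RETURN var2'],
}
print(score_example(example))
# {'top1_correct': False, 'beam_rank': 1, 'guided_correct': True}
```

A record where guided is correct but the top beam is not suggests the network predicts individual tokens well but drifts when it has to build on its own earlier output.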

Acknowledgements

Thanks to Andrew Jefferson, Ashwath Salimath and Scott Dimond for their support, ideas and proof-reading.

Big shout out to the TensorFlow NMT tutorial which I've heavily based this code on, and to Google for sharing their research.
