opencypher / Morpheus

Licence: apache-2.0
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Morpheus: Cypher for Apache Spark


NOTE

This project is no longer actively maintained. If you want to know more, please reach out by creating an issue.


Morpheus extends Apache Spark™ with Cypher, the industry's most widely used property graph query language, which is defined and maintained by the openCypher project. It integrates many data sources, supports queries across multiple graphs, and lets you use your Spark cluster to run analytical graph queries. Queries can also return graphs, which makes it possible to build processing pipelines.

Note: this repository was formerly known as opencypher/cypher-for-apache-spark.

Intended audience

Morpheus allows you to develop complex processing pipelines orchestrated by a powerful and expressive high-level language. In addition to developers and big data integration specialists, Morpheus is also of practical use to data scientists: it offers tools that integrate disparate data sources into a single graph. From this graph, queries can extract subgraphs of interest into new result graphs, which can be conveniently exported for further processing.

Morpheus builds on the Spark SQL DataFrame API, integrates with standard Spark SQL processing, and also allows integration with GraphX. To learn more about this, please see our examples.

Current status: Pre-release

The functionality and APIs are stabilizing but surface changes (e.g. to the Cypher syntax and semantics for multiple graph processing and graph projections/construction) are still likely to occur. We invite you to try out the project, and we welcome feedback and contributions.

If you are interested in contributing to the project we would love to hear from you; email us at [email protected] or just raise a PR. Please note that this is an openCypher project and contributions can only be accepted if you’ve agreed to the openCypher Contributors Agreement (oCCA).

Morpheus Features

Morpheus is built on top of the Spark DataFrame API and uses features such as the Catalyst optimizer. The Spark representations are accessible and can be converted to representations that integrate with other Spark libraries.

Morpheus supports a subset of Cypher and is the first implementation of multiple graphs and graph query compositionality.

Morpheus currently supports importing graphs from Hive, Neo4j, relational database systems via JDBC, and files stored locally, in HDFS, or in S3. Morpheus also has a data source API that allows you to plug in custom data importers for external graphs.

Morpheus Roadmap

Morpheus is under rapid development and we are planning to offer support for:

  • a large subset of the Cypher language
  • new Cypher Multiple Graph features
  • injection of custom graph data sources

Supported Spark and Scala versions

As of Morpheus 0.3.0, the project has migrated to Scala 2.12 and the Spark 2.4 series. Spark officially supports Scala 2.12 as of Spark 2.4.1. However, only Spark 2.4.2 uses Scala 2.12 for its prebuilt convenience binaries, which means that in order to use Morpheus with a later Spark version, one needs to build it manually.

Get started with Morpheus

Morpheus is currently easiest to use with Scala. Below we explain how you can import a simple graph and run a Cypher query on it.

Building Morpheus

Morpheus is built using Gradle:

./gradlew build

Add the Morpheus dependency to your project

To use Morpheus, add the following dependency to your build:

Maven:

<dependency>
  <groupId>org.opencypher</groupId>
  <artifactId>morpheus-spark-cypher</artifactId>
  <version>0.4.2</version>
</dependency>

sbt:

libraryDependencies += "org.opencypher" % "morpheus-spark-cypher" % "0.4.2"

Remember to add fork in run := true to your build.sbt for Scala projects. This is not specific to Morpheus; it is a quirk of Spark execution, and forking helps prevent problems.
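As a rough sketch, a minimal build.sbt combining the dependency and the fork setting might look like the following. The project name and the exact Scala patch version are placeholders chosen for illustration; only the dependency coordinates and the fork setting come from the text above.

```scala
// build.sbt (illustrative sketch; name and Scala patch version are assumptions)
name := "morpheus-hello"

// Morpheus 0.3.0 and later target Scala 2.12, so a 2.12.x version is assumed here.
scalaVersion := "2.12.8"

libraryDependencies += "org.opencypher" % "morpheus-spark-cypher" % "0.4.2"

// Run the application in a separate JVM rather than inside sbt's own JVM.
// This is the slash syntax available from sbt 1.1; on older sbt versions
// the equivalent setting is written as: fork in run := true
run / fork := true
```

With forking enabled, the application gets its own JVM and classloader, so Spark does not share state with the sbt process.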

Hello Morpheus

Cypher is based on the property graph data model, comprising labelled nodes and typed relationships, with a relationship either connecting two nodes, or forming a self-loop on a single node. Both nodes and relationships are uniquely identified by an ID (Morpheus internally uses Array[Byte] to represent identifiers and auto-casts Long, String and Integer values), and contain a set of properties.
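To make the data model concrete, here is a small plain-Scala sketch of labelled nodes and typed relationships. The types Node and Relationship and the object PropertyGraphSketch are hypothetical names invented for illustration; they are not Morpheus classes, and the sketch uses Long identifiers for simplicity rather than the Array[Byte] representation Morpheus uses internally.

```scala
// Illustrative model of the property graph concepts; NOT part of the Morpheus API.
object PropertyGraphSketch {

  // A node has an ID, a set of labels, and a set of properties.
  final case class Node(id: Long, labels: Set[String], properties: Map[String, Any])

  // A relationship has an ID, exactly one type, a source and a target node ID,
  // and a set of properties.
  final case class Relationship(
    id: Long,
    source: Long,
    target: Long,
    relType: String,
    properties: Map[String, Any]
  )

  val nodes: Seq[Node] = Seq(
    Node(0L, Set("Person"), Map("name" -> "Alice", "age" -> 42L)),
    Node(1L, Set("Person"), Map("name" -> "Bob", "age" -> 23L))
  )

  val relationships: Seq[Relationship] = Seq(
    // Connects two distinct nodes.
    Relationship(0L, 0L, 1L, "KNOWS", Map("since" -> "23/01/1987")),
    // Forms a self-loop on a single node.
    Relationship(1L, 1L, 1L, "KNOWS", Map.empty)
  )

  // A relationship is a self-loop when its source and target are the same node.
  def isSelfLoop(r: Relationship): Boolean = r.source == r.target

  // Every relationship must start and end at a node that exists in the graph.
  def hasValidEndpoints(r: Relationship): Boolean = {
    val ids = nodes.map(_.id).toSet
    ids.contains(r.source) && ids.contains(r.target)
  }

  def main(args: Array[String]): Unit =
    relationships.foreach { r =>
      println(s"rel ${r.id}: selfLoop=${isSelfLoop(r)}, validEndpoints=${hasValidEndpoints(r)}")
    }
}
```

In Morpheus itself, the same information is carried by DataFrame columns rather than by case classes like these.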

The following example shows how to convert a social network represented by two DataFrames to a PropertyGraph. Once the property graph is constructed, it supports Cypher queries via its cypher method.

import org.apache.spark.sql.DataFrame
import org.opencypher.morpheus.api.MorpheusSession
import org.opencypher.morpheus.api.io.{MorpheusNodeTable, MorpheusRelationshipTable}
import org.opencypher.morpheus.util.App

/**
  * Demonstrates basic usage of the Morpheus API by loading an example graph from [[DataFrame]]s.
  */
object DataFrameInputExample extends App {
  // 1) Create Morpheus session and retrieve Spark session
  implicit val morpheus: MorpheusSession = MorpheusSession.local()
  val spark = morpheus.sparkSession

  import spark.sqlContext.implicits._

  // 2) Generate some DataFrames that we'd like to interpret as a property graph.
  val nodesDF = spark.createDataset(Seq(
    (0L, "Alice", 42L),
    (1L, "Bob", 23L),
    (2L, "Eve", 84L)
  )).toDF("id", "name", "age")
  val relsDF = spark.createDataset(Seq(
    (0L, 0L, 1L, "23/01/1987"),
    (1L, 1L, 2L, "12/12/2009")
  )).toDF("id", "source", "target", "since")

  // 3) Generate node and relationship tables that wrap the DataFrames. The mapping between graph elements and columns
  //    is derived using naming conventions for identifier columns.
  val personTable = MorpheusNodeTable(Set("Person"), nodesDF)
  val friendsTable = MorpheusRelationshipTable("KNOWS", relsDF)

  // 4) Create property graph from graph scans
  val graph = morpheus.readFrom(personTable, friendsTable)

  // 5) Execute Cypher query and print results
  val result = graph.cypher("MATCH (n:Person) RETURN n.name")

  // 6) Collect the results into a set of strings by selecting a specific column.
  //    This operation may be very expensive as it materializes results locally.
  val names: Set[String] = result.records.table.df.collect().map(_.getAs[String]("n_name")).toSet

  println(names)
}

The above program prints:

Set(Alice, Bob, Eve)

More examples, including multiple graph features, can be found in the examples module.

Run example Scala apps via command line

You can use Gradle to run a specific Scala application from the command line. For example, to run the DataFrameInputExample within the morpheus-examples module, call:

./gradlew morpheus-examples:runApp -PmainClass=org.opencypher.morpheus.examples.DataFrameInputExample

How to contribute

We would love to find out about any issues you encounter and are happy to accept contributions following a Contributors License Agreement (CLA) signature as per the process outlined in our contribution guidelines.

License

The project is licensed under the Apache Software License, Version 2.0, with an extended attribution notice as described in the license header.

Copyright

© Copyright 2016-2019 Neo4j, Inc.

Apache Spark™, Spark, and Apache are registered trademarks of the Apache Software Foundation.