gchq / Gaffer

Licence: apache-2.0

A large-scale entity and relation database supporting aggregation of properties

Programming Languages

java


javascript


Projects that are alternatives of or similar to Gaffer

Bigdata Notes

A getting-started guide to big data ⭐

Stars: ✭ 10,991 (+569.37%)

Mutual labels: spark, big-data, hadoop, hbase

Bigdata Playground

A complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL

Stars: ✭ 177 (-89.22%)

Mutual labels: big-data, hadoop, parquet, hbase

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-75.27%)

Mutual labels: spark, hadoop, parquet, hbase

bigdata-fun

A complete (distributed) BigData stack, running in containers

Stars: ✭ 14 (-99.15%)

Mutual labels: big-data, spark, hadoop, hbase

God Of Bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+265.9%)

Mutual labels: spark, hadoop, hbase

Janusgraph

JanusGraph: an open-source, distributed graph database

Stars: ✭ 4,277 (+160.48%)

Mutual labels: graph, graph-database, hbase

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+244.46%)

Mutual labels: spark, big-data, hadoop

Dockerfiles

50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu

Stars: ✭ 847 (-48.42%)

Mutual labels: spark, hadoop, hbase

Wedatasphere

WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (-77.34%)

Mutual labels: spark, hadoop, hbase

Bigdataguide

Big data learning: learn big data from scratch, including study videos and interview materials for each stage of learning

Stars: ✭ 817 (-50.24%)

Mutual labels: spark, hadoop, hbase

Bigdata Interview

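🎯 🌟[Big data interview questions] Big data interview questions collected from around the web, with a summary of my own answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks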

Stars: ✭ 857 (-47.81%)

Mutual labels: spark, hadoop, hbase

Iceberg

Iceberg is a table format for large, slow-moving tabular data

Stars: ✭ 393 (-76.07%)

Mutual labels: spark, hadoop, parquet

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-96.53%)

Mutual labels: spark, big-data, hadoop

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+1242.75%)

Mutual labels: spark, big-data, hadoop

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+132.22%)

Mutual labels: spark, big-data, hadoop

Szt Bigdata

Shenzhen Metro big data passenger flow analysis system 🚇🚄🌟

Stars: ✭ 826 (-49.7%)

Mutual labels: spark, hadoop, hbase

Weblogsanalysissystem

A big data platform for analyzing web access logs

Stars: ✭ 37 (-97.75%)

Mutual labels: spark, hadoop, hbase

Hadoop cookbook

Cookbook to install Hadoop 2.0+ using Chef

Stars: ✭ 82 (-95.01%)

Mutual labels: spark, hadoop, hbase

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-93.24%)

Mutual labels: big-data, spark, hadoop

Learning Spark

Learn Spark from scratch; big data learning

Stars: ✭ 37 (-97.75%)

Mutual labels: spark, hadoop, hbase


Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Gaffer

Gaffer is a graph database framework. It allows the storage of very large graphs containing rich properties on the nodes and edges. Several storage options are available, including Accumulo, HBase and Parquet.

It is designed to be as flexible, scalable and extensible as possible, allowing for rapid prototyping and transition to production systems.

Gaffer offers:

Rapid query across very large numbers of nodes and edges;
Continual ingest of data at very high data rates, and batch bulk ingest of data via MapReduce or Spark;
Storage of arbitrary Java objects on the nodes and edges;
Automatic, user-configurable in-database aggregation of rich statistical properties (e.g. counts, histograms, sketches) on the nodes and edges;
Versatile query-time summarisation, filtering and transformation of data;
Fine-grained data access controls;
Hooks to apply policy and compliance rules to queries;
Automated, rule-based removal of data (typically used to age-off old data);
Retrieval of graph data into Apache Spark for fast and flexible analysis;
A fully-featured REST API.
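
To make the aggregation feature above more concrete, here is a minimal, self-contained sketch of the idea: edges that share a source and destination are merged, and their count property is summed. This is not Gaffer's API. The class AggregationSketch, the Edge type and the aggregate method are hypothetical names chosen for illustration, and real Gaffer aggregation is configured through a schema rather than hand-written like this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical sketch of summing a "count" property
// across duplicate edges. None of these names come from Gaffer itself.
public class AggregationSketch {

    /** A hypothetical edge carrying a single numeric "count" property. */
    static final class Edge {
        final String source;
        final String destination;
        final long count;

        Edge(String source, String destination, long count) {
            this.source = source;
            this.destination = destination;
            this.count = count;
        }
    }

    /** Merges edges with the same source and destination by summing their counts. */
    static Map<String, Long> aggregate(List<Edge> edges) {
        // LinkedHashMap keeps first-seen order so the result is predictable.
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Edge e : edges) {
            String key = e.source + "->" + e.destination;
            // merge() stores the count for a new key, or sums it with the existing one.
            totals.merge(key, e.count, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Edge> edges = Arrays.asList(
                new Edge("A", "B", 1),
                new Edge("A", "B", 2),
                new Edge("B", "C", 5));
        // The two A->B edges collapse into one entry with a summed count.
        System.out.println(aggregate(edges));
    }
}
```

Other statistical properties mentioned above, such as histograms or sketches, would follow the same pattern with a different merge function in place of the sum.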

To get going with Gaffer, visit our getting started pages.

Gaffer is under active development. Version 1.0 of Gaffer was released in October 2017.

License

Gaffer is licensed under the Apache 2 license and is covered by Crown Copyright.

Getting Started

Try it out

We have a demo available to try, based on a small UK road-use dataset. See the example/road-traffic README to try it out.

Building and Deploying

To build Gaffer run mvn clean install -Pquick in the top-level directory. This will build all of Gaffer's core libraries and some examples of how to load and query data.

See our Store documentation page for a list of available Gaffer Stores to choose from and the relevant documentation for each.

Inclusion in other projects

Gaffer is hosted on Maven Central and can easily be incorporated into your own Maven projects.

To use Gaffer from the Java API, the only required dependencies are the Gaffer graph module and a store module for the specific database technology used to store the data, e.g. for the Accumulo store:

<dependency>
    <groupId>uk.gov.gchq.gaffer</groupId>
    <artifactId>graph</artifactId>
    <version>${gaffer.version}</version>
</dependency>
<dependency>
    <groupId>uk.gov.gchq.gaffer</groupId>
    <artifactId>accumulo-store</artifactId>
    <version>${gaffer.version}</version>
</dependency>

This will include all other mandatory dependencies. Other (optional) components can be added to your project as required.

Documentation

Our Javadoc can be found here.

We have some user guides in our docs.

Related repositories

The gaffer-tools repository contains useful tools to help work with Gaffer. These include:

jar-shader - Used to shade the version of Jackson to avoid incompatibility problems on CDH clusters;
mini-accumulo-cluster - Allows a mini Accumulo cluster to be spun up for testing purposes;
performance-testing - Methods of testing the performance of ingest and query operations against a graph;
python-shell - Allows operations against a graph to be executed from a Python shell;
random-element-generation - Code to generate large volumes of random graph data;
schema-builder - A (beta) visual tool for writing schemas for a graph;
slider - Code to deploy a Gaffer cluster to a YARN cluster using Apache Slider, including the ability to easily run Slider on an AWS EMR cluster;
ui - A basic graph visualisation tool.

Contributing

We welcome contributions to the project. Detailed information on our ways of working can be found here. In brief:

Sign the GCHQ Contributor Licence Agreement;
Push your changes to a fork;
Submit a pull request.

