Waterwheel

Waterwheel is a distributed append-only store designed for high-throughput data ingestion and real-time temporal range queries. Waterwheel supports over 1 million insertions per seconds and millisecond queries simultaneously.

Data & Query Model

Data tuples continuouly arrive at the system.

A tuple consists of

An indexing key;
A timstamp;
Any number of attributes;

We assume the timestamps of the tuples are roughly in increasing order.

A user query contains:

A key range constraint (e.g.,100 < key < 250)
A temporal range constraint (e.g., last 5 minitues, or 9pm to 12pm yesterday)
A user-defined predicate (optional, could be an arbitrary user-defined logic which determines if a tuple qulifies.)
Aggregator (optional, e.g., group-by, sum, avg, min, max)
Sortor (optional, sort on any combination of attributes in decreasing or increasing order)

Requirement

JDK 8 or higher;
maven;
Apache Storm (not needed in local mode);
HDFS (not needed in local mode);

Quick Start

1. Local mode

Running our system in local model is the easiest way to get a feeling of the system. Local model is typically used internally to debug the topology. If you want to use our system in production, we highly suggest you to run our system in cluster model to fully exploit the performance of the system.

To run our system in local mode, you should follow those steps:

Download the source codes

$ git clone https://github.com/ADSC-Cloud/Waterwheel

Compile the source codeF

$ mvn clean install -DskipTests

Launch the system

Run the following command to launch the system

$ mvn exec:java -pl topology -Dexec.mainClass=indexingTopology.topology.kingbase.KingBaseTopology -Dexec.args="-m submit --local -f conf/conf.yaml"

Open a new terminal and run the following command to ingest tuples to the system:

$ mvn exec:java -pl topology -Dexec.mainClass=indexingTopology.topology.kingbase.KingBaseTopology -Dexec.args="-m ingest -r 10000 --ingest-server-ip localhost"

Deploy Web-UI daemon

$ mvn tomcat7:run -pl web

Now you can get access to the web ui via http://localhost:8080.

It looks like this:

On the dashboard, you can see the instantenaous insertion throughput and the resource utilization, as well as a table showing system parameters.

Run queries

You can try to run some demo queries either on the web ui or via terminal commandline.

On the web ui, you can goto the query demo interface by clicking the Try Query bottom of the left of the dashboard. In the demo interface, you can specify the kay ranges and the temporal ranges and run the query by clicking the Query! bottom.

Or alternatively, you can open a new terminal and run the following command to issue generated queries:

$ mvn exec:java -pl topology -Dexec.mainClass=indexingTopology.topology.kingbase.KingBaseTopology -Dexec.args="-m query --query-server-ip localhost"

Use -h to print the detailed usage of the arguments.

2. Cluster model

Deploy Apache Storm and make sure that the nimbus and supervisors are running properly.
Setup HDFS
Download the source codes

$ git clone https://github.com/ADSC-Cloud/Waterwheel

Make sure that <scope>provided</scope> is uncommented in pom.xml file.
Change the configures in config/TopologyConfig accordingly.

set HDFSFlag = true to use HDFS as the storage system.

Create a folder for the system in HDFS and set dataDir in the config file properly. Make sure that the folder is writable.

Compile the source code

$ mvn clean install -DskipTests

Launch Apache Storm's nimbus and supervisors properly.
Submit the topology to Apache Storm

$ storm jar SOURCE_CODE_PATH/target/IndexingTopology-1.0-SNAPSHOT.jar indexingTopologyNormalDistributionTopology append-only-store

Publications

DITIR: Distributed Index for High Throughput Trajectory Insertion and Real-time Temporal Range Query. VLDB 2017 (demo). [pdf] [cite]
Waterwheel: Realtime Indexing and Temporal Range Query Processing over Massive Data Streams. To appear in ICDE 2018. [pdf] [cite]

Acknowledgement

This work was partially supported by SingAREN/AWS Cloud Credit for Research Program 2016.

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

ADSC-Cloud / Waterwheel

Programming Languages

Labels

Projects that are alternatives of or similar to Waterwheel