Bullet Core

Bullet is a streaming query engine that can be plugged into any single data stream using a stream processing framework such as Apache Storm, Spark, or Flink. It lets you run queries on this data stream, including hard queries such as Count Distinct and Top K.

Background

In Bullet, both the queries and the data flow through the system. There is no persistence layer: queries live only as long as their duration and operate only on in-memory data. As a result, queries in Bullet look forward in time, which is unusual among querying systems.
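
To make the forward-looking model concrete, here is a minimal sketch in plain Java. It is not Bullet Core's API: the class `ForwardQuery` and its methods `consume`, `isDone`, and `results` are hypothetical names chosen for illustration, and the result cap is an assumption added to keep memory bounded. The query only sees records that arrive while it is alive, keeps its matches in memory, and stops accepting data once its duration has elapsed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical illustration only; none of these names come from Bullet Core. */
class ForwardQuery {
    private final Predicate<Map<String, String>> filter;
    private final long expiresAtMillis;
    private final int maxResults;
    // Matches are held in memory only; nothing is written to storage.
    private final List<Map<String, String>> matches = new ArrayList<>();

    ForwardQuery(Predicate<Map<String, String>> filter,
                 long startMillis, long durationMillis, int maxResults) {
        this.filter = filter;
        this.expiresAtMillis = startMillis + durationMillis;
        this.maxResults = maxResults;
    }

    /** Offers a record that arrived at nowMillis; returns true if it was kept. */
    boolean consume(Map<String, String> record, long nowMillis) {
        if (isDone(nowMillis) || !filter.test(record)) {
            return false;
        }
        matches.add(record);
        return true;
    }

    /** A query is done once its duration has elapsed or its result cap is reached. */
    boolean isDone(long nowMillis) {
        return nowMillis >= expiresAtMillis || matches.size() >= maxResults;
    }

    /** Returns an immutable snapshot of the records matched so far. */
    List<Map<String, String>> results() {
        return List.copyOf(matches);
    }

    public static void main(String[] args) {
        // A query that lives for one second and looks for "click" events.
        ForwardQuery query =
            new ForwardQuery(r -> "click".equals(r.get("event")), 0L, 1_000L, 10);
        query.consume(Map.of("event", "click", "page", "home"), 100L);
        query.consume(Map.of("event", "view", "page", "home"), 200L);
        // This record arrives after the query has expired, so it is ignored.
        query.consume(Map.of("event", "click", "page", "mail"), 1_500L);
        System.out.println(query.results().size()); // prints 1
    }
}
```

Timestamps are passed in explicitly rather than read from the system clock so that the behavior is easy to follow and to test. A real stream processor would supply its own notion of time.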

We initially created Bullet at Yahoo as a simple, distributed, grep-like tool for finding events in a click stream of high-volume user interaction data (1 million events per second). In particular, we use it to validate the instrumentation that generates these events: we interact with the pages ourselves, find our own events in the data stream, and check them for the proper key/value pairs. There was nothing as lightweight and cheap as Bullet for this task. There are many other use cases for Bullet, and how you use it depends on your data stream. For example, if you put Bullet on performance metric data, your queries might mostly compute the 99th percentile of some latency metric.

This project is the core library for Bullet, which lets us implement Bullet on any JVM-based stream processor. See Bullet Storm, which uses it to implement Bullet on Storm, and Bullet Spark, which does so on Spark Streaming. This code originally lived inside the Bullet Storm code base, up to Bullet Storm version 0.4.3.

Install

Bullet Core is a library written in Java, published to Bintray and mirrored to JCenter. It is meant to be used to implement Bullet on different stream processors or to implement a Bullet PubSub. To see the available versions and set up your project for your package manager (Maven, Gradle, etc.), see here.

Usage

Once you have added a dependency for Bullet Core, use our abstractions for the PubSub, Parsing, Querying, Windowing, Partitioning, and Sketching as you need to. In particular, see how we abstract running a Bullet Query. You can also look at our reference implementations in Storm and Spark to get a better idea.

Documentation

All documentation is available on GitHub Pages here.

Contributing

All contributions are welcome! Feel free to submit PRs for bug fixes, improvements, or anything else you like. Submit issues and ask questions using GitHub issues as usual, and we will classify them accordingly. See Contributing for a more in-depth policy. We just ask that you respect our Code of Conduct while you're here.

License

Code licensed under the Apache 2 license. See the LICENSE for terms.
