
divolte / Divolte Collector

Licence: apache-2.0

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives to or similar to Divolte Collector

Open Bank Mark
A bank simulation application using mainly Clojure, which can be used to end-to-end test and show some graphs.
Stars: ✭ 81 (-69.32%)
Mutual labels:  kafka, analytics, avro
Storagetapper
StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (-12.12%)
Mutual labels:  kafka, avro, hdfs
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+94.32%)
Mutual labels:  kafka, analytics, hdfs
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+523.86%)
Mutual labels:  kafka, avro
Logisland
Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time-series analysis. A large set of valuable, ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-63.26%)
Mutual labels:  kafka, analytics
Bigdata Notes
A beginner's guide to big data (大数据入门指南) ⭐
Stars: ✭ 10,991 (+4063.26%)
Mutual labels:  kafka, hdfs
Camus
Mirror of Linkedin's Camus
Stars: ✭ 81 (-69.32%)
Mutual labels:  kafka, hdfs
Samsara
Samsara is a real-time analytics platform
Stars: ✭ 132 (-50%)
Mutual labels:  kafka, analytics
Slimmessagebus
Lightweight message bus interface for .NET (pub/sub and request-response) with transport plugins for popular message brokers.
Stars: ✭ 120 (-54.55%)
Mutual labels:  kafka, avro
Kafka Connect Mongodb
**Unofficial / Community** Kafka Connect MongoDB Sink Connector - Find the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector
Stars: ✭ 137 (-48.11%)
Mutual labels:  kafka, avro
Mongo Kafka
MongoDB Kafka Connector
Stars: ✭ 166 (-37.12%)
Mutual labels:  kafka, avro
Repository
A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-65.15%)
Mutual labels:  kafka, hdfs
Kaufmann ex
Kafka backed service library.
Stars: ✭ 86 (-67.42%)
Mutual labels:  kafka, avro
Schema Registry
A CLI and Go client for Kafka Schema Registry
Stars: ✭ 105 (-60.23%)
Mutual labels:  kafka, avro
Dcos Commons
DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Stars: ✭ 162 (-38.64%)
Mutual labels:  kafka, hdfs
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-32.95%)
Mutual labels:  kafka, avro
Abris
Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (-50.76%)
Mutual labels:  kafka, avro
Examples
Demo applications and code examples for Confluent Platform and Apache Kafka
Stars: ✭ 571 (+116.29%)
Mutual labels:  kafka, avro
Flume Canal Source
Flume NG Canal source
Stars: ✭ 56 (-78.79%)
Mutual labels:  kafka, hdfs
Kop
Kafka-on-Pulsar - A protocol handler that brings native Kafka protocol to Apache Pulsar
Stars: ✭ 159 (-39.77%)
Mutual labels:  kafka, pubsub


Divolte Collector

Scalable clickstream collection for Hadoop and Kafka

Divolte Collector is a scalable and performant server for collecting clickstream data in HDFS and on Kafka topics. It uses a JavaScript tag on the client side to gather user interaction data, similar to many other web tracking solutions. Divolte Collector can be used as the foundation to build anything from basic web analytics dashboarding to real-time recommender engines or banner optimization systems.
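
For illustration, here is a minimal sketch of the single-tag integration described under Features below (track.example.com is a placeholder for your own collector host; the collector serves the tag itself, by default at /divolte.js):

<script src="//track.example.com/divolte.js" defer async></script>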

https://divolte.io


Online documentation and downloads

You can find the latest downloads and documentation on our project website. There is a series of examples for working with collected data in Spark, Hive / Impala, and Kafka in this repository: https://github.com/divolte/divolte-examples.

Features

  • Single tag site integration: Including Divolte Collector is an HTML one-liner. Just load the JavaScript at the end of your document body.
  • Built for Hadoop and Kafka, with experimental support for Google Cloud Storage: All collected data is written directly to HDFS, GCS or Kafka. No ETL or intermediate storage.
  • Structured data collection: All data is captured in Apache Avro records using your own schema definition. Divolte Collector does not enforce a particular structure on your data.
  • User agent parsing: It's not just a string. Add rich user-agent information to your click event records on the fly.
  • ip2geo lookup: Attach geo-coordinates to requests on the fly. (This requires a third-party database; a free version is available.)
  • Fast: Handle many thousands of requests per second on a single node. Scale out as you need.
  • Custom events: Just like any web analytics solution, you can log any event. Supply custom parameters in your page or JavaScript and map them onto your Avro schema; see the sketch after this list.
  • Integrate with anything: Work with anything that understands Avro and HDFS, GCS or Kafka. Hive, Impala, Spark, Spark Streaming, Storm, etc. No log file parsing is required.
  • Open source: Divolte Collector is hosted on GitHub and released under the Apache License, Version 2.0.
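
As a sketch of the custom events feature above, a page can signal an event with custom parameters through the divolte.signal() call exposed by the JavaScript tag. The event name and parameters below are hypothetical; they only end up in your records once you map them onto fields of your own Avro schema in the collector's mapping configuration:

<script>
  // Hypothetical example: signal an 'addToBasket' event with two custom parameters.
  divolte.signal('addToBasket', { productId: 'sku-123', price: 19.95 });
</script>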

Building Prerequisites

In order to build the Divolte Collector you need to have the following installed:

  • Java 8 SDK (or newer). We build and test with Oracle's SDK; other variants should work. (Let us know!)
  • Sphinx 1.2.x (or newer). This is only required for building the user documentation.

Building

To build the Divolte Collector server itself:

% ./gradlew zip

or

% ./gradlew tarball

This will build everything and produce a distribution archive under the build/distributions/ directory.

To build the User Guide:

% ./gradlew userdoc

This will build the documentation and place it under the build/userdoc/html/ directory.

Testing

Unit tests can be executed with:

% ./gradlew test

By default this will skip browser-based integration tests. Currently browser-based testing is supported using:

ChromeDriver

ChromeDriver must be installed locally. Under OS X this can be installed via Homebrew:

% brew install chromedriver

Tests can then be executed:

% SELENIUM_DRIVER=chrome CHROME_DRIVER=$(which chromedriver) ./gradlew test

Safari WebDriver

Safari (from version 10) has native WebDriver support. To set this up:

  1. Enable the Develop menu: Preferences | Advanced | Show Develop menu in menu bar.
  2. In the Develop menu, enable Allow Remote Automation.
  3. The first time only, execute safaridriver -p 0 from the command line and authorise the driver to connect to Safari.

Tests can then be executed:

% SELENIUM_DRIVER=safari ./gradlew test

PhantomJS

PhantomJS must be installed locally. Under OS X this can be installed via Homebrew:

% brew install phantomjs

Tests can then be executed:

% SELENIUM_DRIVER=phantom ./gradlew test

SauceLabs

If you have a SauceLabs account, you can test against a wide variety of browsers. Once you have a username and API key, and Sauce Connect is running, tests can be executed:

% export SAUCE_USERNAME=<username>
% export SAUCE_ACCESS_KEY=<api key>
% SELENIUM_DRIVER=sauce ./gradlew test

These tests can take quite some time to execute, and not all of them currently succeed.

License

The Divolte Collector is licensed under the terms of the Apache License, Version 2.0.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].