
apache / Samza Hello Samza

Licence: apache-2.0
Mirror of Apache Samza

Programming Languages

java
scala


Projects that are alternatives of or similar to Samza Hello Samza

Spark Website
Apache Spark Website
Stars: ✭ 75 (-24.24%)
Mutual labels:  big-data
Smart Array To Tree
Convert large amounts of data array to tree fastly
Stars: ✭ 91 (-8.08%)
Mutual labels:  big-data
Streamx
kafka-connect-s3 : Ingest data from Kafka to Object Stores(s3)
Stars: ✭ 96 (-3.03%)
Mutual labels:  big-data
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-20.2%)
Mutual labels:  big-data
Dataengineeringproject
Example end to end data engineering project.
Stars: ✭ 82 (-17.17%)
Mutual labels:  big-data
Hazelcast Python Client
Hazelcast IMDG Python Client
Stars: ✭ 92 (-7.07%)
Mutual labels:  big-data
Labs
Research on distributed system
Stars: ✭ 73 (-26.26%)
Mutual labels:  big-data
Kudu
Mirror of Apache Kudu
Stars: ✭ 1,360 (+1273.74%)
Mutual labels:  big-data
Parquet Mr
Apache Parquet
Stars: ✭ 1,278 (+1190.91%)
Mutual labels:  big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1251.52%)
Mutual labels:  big-data
Iotdb
Apache IoTDB
Stars: ✭ 1,221 (+1133.33%)
Mutual labels:  big-data
Panoptes
A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-19.19%)
Mutual labels:  big-data
Reef
Mirror of Apache REEF
Stars: ✭ 92 (-7.07%)
Mutual labels:  big-data
Attic Predictionio Template Recommender
PredictionIO Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 78 (-21.21%)
Mutual labels:  big-data
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-2.02%)
Mutual labels:  big-data
Cookbook
The Data Engineering Cookbook
Stars: ✭ 9,829 (+9828.28%)
Mutual labels:  big-data
Bitcoin Value Predictor
[NOT MAINTAINED] Predicting Bit coin price using Time series analysis and sentiment analysis of tweets on bitcoin
Stars: ✭ 91 (-8.08%)
Mutual labels:  big-data
Bigdata Notes
大数据入门指南 ⭐
Stars: ✭ 10,991 (+11002.02%)
Mutual labels:  big-data
Orc
An ORC file format reader and writer for Go.
Stars: ✭ 97 (-2.02%)
Mutual labels:  big-data
Treeviz
Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-4.04%)
Mutual labels:  big-data

hello-samza

Hello Samza is a starter project for Apache Samza jobs.

About

Hello Samza is developed as part of the Apache Samza project. Please direct questions, improvements and bug fixes there. Questions about Hello Samza are welcome on the dev list, and the Samza JIRA has a hello-samza component for filing tickets.

Instructions

The Hello Samza project contains example Samza applications written with both the high-level API and the low-level API. The following instructions show how to install the binaries and run the applications in a local YARN cluster. See also Hello Samza and Hello Samza High Level API for more information.

1. Get the Code

Check out the hello-samza project:

git clone https://gitbox.apache.org/repos/asf/samza-hello-samza.git hello-samza
cd hello-samza

To build hello-samza with the latest Samza master, you can switch to the latest branch:

git checkout latest

This project contains everything you'll need to run your first Samza application.

2. Start a Grid

A Samza grid usually comprises three different systems: YARN, Kafka, and ZooKeeper. The hello-samza project comes with a script called "grid" to help you set up these systems. Start by running:

./bin/grid bootstrap

This command will download, install, and start ZooKeeper, Kafka, and YARN. It will also check out the latest version of Samza and build it. All package files will be put in a sub-directory called "deploy" inside hello-samza's root folder.

If the script reports that JAVA_HOME is not set, set it to the path where Java is installed on your system.

Once the grid command completes, you can verify that YARN is up and running by going to http://localhost:8088. This is the YARN UI.
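Besides opening the YARN UI in a browser, you can check that the grid's services are listening by probing their ports. The Java sketch below is not part of hello-samza: it simply attempts a TCP connection to each port and reports UP or DOWN, and it also prints JAVA_HOME because the grid script depends on it. Ports 8088 (YARN UI) and 9092 (Kafka) appear in this README; 2181 is ZooKeeper's conventional client port and is an assumption about how the grid script configures it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch: reports whether the local grid services accept TCP connections.
 * 8088 and 9092 come from the hello-samza README; 2181 (ZooKeeper) is an assumed default.
 */
public class GridCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // The grid script needs JAVA_HOME; show what the current environment has.
        String javaHome = System.getenv("JAVA_HOME");
        System.out.println("JAVA_HOME = " + (javaHome == null ? "(not set)" : javaHome));

        Map<String, Integer> services = new LinkedHashMap<>();
        services.put("YARN ResourceManager UI", 8088);
        services.put("Kafka broker", 9092);
        services.put("ZooKeeper (assumed port)", 2181);

        for (Map.Entry<String, Integer> e : services.entrySet()) {
            boolean up = isOpen("localhost", e.getValue(), 1000);
            System.out.printf("%-30s port %-5d %s%n", e.getKey(), e.getValue(), up ? "UP" : "DOWN");
        }
    }
}
```

A successful connection only shows that something is listening on the port, not that the service behind it is healthy; the YARN UI remains the better check for YARN itself.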

3. Build a Samza Application Package

Before you can run a Samza application, you need to build a package for it. This package is what YARN uses to deploy your apps on the grid. Use the following command in the hello-samza project to build and deploy the example applications:

./bin/deploy.sh

4. Run a Samza Application

After you've built your Samza package, you can start the example applications on the grid.

- High-level API Examples

Package samza.examples.cookbook contains various examples of high-level API operator usage, such as map, partitionBy, window and join. Each example is a runnable Samza application, with the steps to run it described in its class Javadoc, e.g. PageViewAdClickJoiner.
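The cookbook examples use Samza's high-level API; the plain-Java sketch below does not use Samza at all. It only illustrates why a join is typically preceded by partitionBy: when both streams are re-keyed and partitioned with the same function, every page view and ad click for a given key lands in the same partition, so the join can be done locally. PageView, AdClick, the pageId join key and the partition count are hypothetical, and unlike a streaming join this version works on finite lists and keeps no time-bounded state. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java illustration (not the Samza API) of partition-then-join.
 * All record types and the pageId key are hypothetical.
 */
public class JoinSketch {
    record PageView(String pageId, String userId) {}
    record AdClick(String pageId, String adId) {}
    record Joined(String pageId, String userId, String adId) {}

    /** The same partitioning function must be applied to both streams. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    static List<Joined> join(List<PageView> views, List<AdClick> clicks, int numPartitions) {
        // "partitionBy": bucket page views by pageId, one map per partition.
        List<Map<String, List<PageView>>> viewPartitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            viewPartitions.add(new HashMap<>());
        }
        for (PageView v : views) {
            viewPartitions.get(partitionFor(v.pageId(), numPartitions))
                    .computeIfAbsent(v.pageId(), k -> new ArrayList<>())
                    .add(v);
        }

        // Join: each click only looks in the single partition that owns its key.
        List<Joined> out = new ArrayList<>();
        for (AdClick c : clicks) {
            Map<String, List<PageView>> partition =
                    viewPartitions.get(partitionFor(c.pageId(), numPartitions));
            for (PageView v : partition.getOrDefault(c.pageId(), List.of())) {
                out.add(new Joined(c.pageId(), v.userId(), c.adId()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<PageView> views = List.of(new PageView("home", "alice"), new PageView("news", "bob"));
        List<AdClick> clicks = List.of(new AdClick("news", "ad-7"), new AdClick("sports", "ad-9"));
        System.out.println(join(views, clicks, 4));
    }
}
```

Running main prints `[Joined[pageId=news, userId=bob, adId=ad-7]]`: the click on "sports" has no matching page view and is dropped, as in an inner join.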

Package samza.examples.wikipedia.application contains a small Samza application which consumes the real-time feeds from Wikipedia, extracts the metadata of the events, and calculates statistics of all edits in a 10-second window. You can start the app on the grid using the run-app.sh script:

./deploy/samza/bin/run-app.sh --config-path=$PWD/deploy/samza/config/wikipedia-application.properties

Once the job has started, you can tail the wikipedia-stats Kafka topic with:

./deploy/kafka/bin/kafka-console-consumer.sh  --bootstrap-server localhost:9092 --topic wikipedia-stats

A code walkthrough of this application can be found here.
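To make "statistics of all edits in a 10-second window" concrete, here is a plain-Java sketch of a tumbling-window aggregation that does not use Samza. The Edit record, its fields and the three statistics are hypothetical stand-ins rather than the fields the Wikipedia application actually emits, and the sketch buckets a finite list by timestamp, whereas a streaming window emits each result as its window closes. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Plain-Java sketch of 10-second tumbling-window statistics.
 * Edit and WindowStats are hypothetical types used only for illustration.
 */
public class WindowStatsSketch {
    static final long WINDOW_MS = 10_000;

    record Edit(long timestampMs, String title, int byteDiff) {}
    record WindowStats(long windowStartMs, int edits, int netByteDiff, int uniqueTitles) {}

    static List<WindowStats> tumble(List<Edit> edits) {
        // Each edit belongs to the window whose start is its timestamp rounded down to 10 s.
        Map<Long, List<Edit>> byWindow = new TreeMap<>(); // TreeMap keeps windows in time order
        for (Edit e : edits) {
            long start = Math.floorDiv(e.timestampMs(), WINDOW_MS) * WINDOW_MS;
            byWindow.computeIfAbsent(start, k -> new ArrayList<>()).add(e);
        }

        List<WindowStats> result = new ArrayList<>();
        for (Map.Entry<Long, List<Edit>> w : byWindow.entrySet()) {
            int netBytes = 0;
            Set<String> titles = new HashSet<>();
            for (Edit e : w.getValue()) {
                netBytes += e.byteDiff();
                titles.add(e.title());
            }
            result.add(new WindowStats(w.getKey(), w.getValue().size(), netBytes, titles.size()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Edit> edits = List.of(
                new Edit(1_000, "Apache_Kafka", 120),
                new Edit(4_500, "Apache_Kafka", -30),
                new Edit(9_999, "Apache_Samza", 15),
                new Edit(10_000, "YARN", 42));
        tumble(edits).forEach(System.out::println);
    }
}
```

With the sample input, the first three edits fall in the window starting at 0 ms (3 edits, net byte diff 105, 2 unique titles) and the last one falls in the window starting at 10000 ms.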

- Low-level API Examples

Package samza.examples.wikipedia.task contains the low-level API Samza code for the Wikipedia example. To run it, start the following three jobs:

deploy/samza/bin/run-app.sh --config-path=$PWD/deploy/samza/config/wikipedia-feed.properties
deploy/samza/bin/run-app.sh --config-path=$PWD/deploy/samza/config/wikipedia-parser.properties
deploy/samza/bin/run-app.sh --config-path=$PWD/deploy/samza/config/wikipedia-stats.properties

Once the jobs are started, you can use the same kafka-console-consumer.sh command as in the high-level API Wikipedia example to check out the output of the statistics.
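The low-level version splits the pipeline into three jobs (feed, parser, stats), each presumably consuming the previous job's output. The plain-Java sketch below mimics that chain in a single process. Its Task interface loosely follows the shape of Samza's low-level API, where a StreamTask's process method is called once per message and a WindowableTask's window method is called periodically, but the interface itself, the "title|user|byteDiff" input format and the statistics are all hypothetical; in the real example the stages are separate jobs connected through Kafka topics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a feed -> parser -> stats chain. Task is a hypothetical
 * stand-in, not Samza's interface, and the line format is assumed.
 */
public class LowLevelChainSketch {

    /** One call per message, plus an optional periodic flush. */
    interface Task {
        void process(String message, List<String> out);

        default void window(List<String> out) {}
    }

    /** Parser stage: turns "title|user|byteDiff" into "title,byteDiff", dropping malformed lines. */
    static class ParserTask implements Task {
        public void process(String message, List<String> out) {
            String[] parts = message.split("\\|");
            if (parts.length != 3) return; // does not match the assumed format
            try {
                int diff = Integer.parseInt(parts[2].trim()); // accepts a leading '+' or '-'
                out.add(parts[0].trim() + "," + diff);
            } catch (NumberFormatException e) {
                // non-numeric byte diff: skip the line
            }
        }
    }

    /** Stats stage: accumulates state in process() and emits, then resets, it in window(). */
    static class StatsTask implements Task {
        private int edits = 0;
        private int netBytes = 0;
        private final Map<String, Integer> perTitle = new HashMap<>();

        public void process(String message, List<String> out) {
            int comma = message.lastIndexOf(',');
            if (comma < 0) return;
            edits++;
            netBytes += Integer.parseInt(message.substring(comma + 1));
            perTitle.merge(message.substring(0, comma), 1, Integer::sum);
        }

        public void window(List<String> out) {
            out.add("edits=" + edits + " netBytes=" + netBytes + " titles=" + perTitle.size());
            edits = 0;
            netBytes = 0;
            perTitle.clear();
        }
    }

    /** Feeds every message to the task, then triggers one window flush. */
    static List<String> run(Task task, List<String> input) {
        List<String> out = new ArrayList<>();
        for (String m : input) {
            task.process(m, out);
        }
        task.window(out);
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for what the feed job would produce.
        List<String> feed = List.of("Apache_Samza|alice|+120", "Apache_Kafka|bob|-30", "garbage line");
        List<String> parsed = run(new ParserTask(), feed);
        List<String> stats = run(new StatsTask(), parsed);
        System.out.println(parsed);
        System.out.println(stats);
    }
}
```

Keeping per-window state in the task and clearing it in window() is what lets the stats stage report one summary per interval instead of one per message.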

5. Run All the Examples as Integration Tests

Each example above is also run against a few messages as an integration test, using the TestRunner API. You can find all the test samples in src/test/java. To run them, use:

mvn clean package

To run a single example as a test, use:

mvn test -Dtest=<ClassName>

Contribution

To start contributing to Hello Samza, first read Rules and Contributor Corner. Note that the Hello Samza git repository does not accept git pull requests.
