
wanghan0501 / AdRealTimeAnalysis

Licence: other
Real-time ad traffic analysis project, Sichuan University / 拓思艾诺

Programming languages: Java, Scala

Projects that are alternatives to or similar to AdRealTimeAnalysis

bigdatatutorial
bigdatatutorial
Stars: ✭ 34 (+54.55%)
Mutual labels:  spark-streaming
open-stream-processing-benchmark
This repository contains the code base for the Open Stream Processing Benchmark.
Stars: ✭ 37 (+68.18%)
Mutual labels:  spark-streaming
xxhadoop
Data Analysis Using Hadoop/Spark/Storm/ElasticSearch/MachineLearning etc. This is My Daily Notes/Code/Demo. Don't fork, just star!
Stars: ✭ 37 (+68.18%)
Mutual labels:  spark-streaming
fdp-modelserver
An umbrella project for multiple implementations of model serving
Stars: ✭ 47 (+113.64%)
Mutual labels:  spark-streaming
Tweet-Analysis-With-Kafka-and-Spark
A real time analytics dashboard to analyze the trending hashtags and @ mentions at any location using kafka and spark streaming.
Stars: ✭ 18 (-18.18%)
Mutual labels:  spark-streaming
interview-refresh-java-bigdata
a one-stop repo to lookup for code snippets of core java concepts, sql, data structures as well as big data. It also consists of interview questions asked in real-life.
Stars: ✭ 25 (+13.64%)
Mutual labels:  spark-streaming
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+881.82%)
Mutual labels:  spark-streaming
cassandra.realtime
Different ways to process data into Cassandra in realtime with technologies such as Kafka, Spark, Akka, Flink
Stars: ✭ 25 (+13.64%)
Mutual labels:  spark-streaming
T-Watch
Real Time Twitter Sentiment Analysis Product
Stars: ✭ 20 (-9.09%)
Mutual labels:  spark-streaming
BigInsights-on-Apache-Hadoop
Example projects for 'BigInsights for Apache Hadoop' on IBM Bluemix
Stars: ✭ 21 (-4.55%)
Mutual labels:  spark-streaming
Real-time-log-analysis-system
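🐧 A real-time log processing and analysis system based on Spark Streaming + Flume + Kafka + HBase (available as a console version and as a Web UI visualization version built with Spring Boot, ECharts, etc.)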
Stars: ✭ 31 (+40.91%)
Mutual labels:  spark-streaming
kafka-twitter-spark-streaming
Counting Tweets Per User in Real-Time
Stars: ✭ 38 (+72.73%)
Mutual labels:  spark-streaming
qs-hadoop
Learning the big data ecosystem
Stars: ✭ 18 (-18.18%)
Mutual labels:  spark-streaming
ExDeMon
A general purpose metrics monitor implemented with Apache Spark. Kafka source, Elastic sink, aggregate metrics, different analysis, notifications, actions, live configuration update, missing metrics, ...
Stars: ✭ 19 (-13.64%)
Mutual labels:  spark-streaming
wasp
WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-13.64%)
Mutual labels:  spark-streaming
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+1022.73%)
Mutual labels:  spark-streaming
seatunnel-example
seatunnel plugin developing examples.
Stars: ✭ 27 (+22.73%)
Mutual labels:  spark-streaming
spark-utils
Basic framework utilities to quickly start writing production ready Apache Spark applications
Stars: ✭ 25 (+13.64%)
Mutual labels:  spark-streaming
architect big data solutions with spark
code, labs and lectures for the course
Stars: ✭ 40 (+81.82%)
Mutual labels:  spark-streaming
bitnami-docker-spark
Bitnami Docker Image for Apache Spark
Stars: ✭ 239 (+986.36%)
Mutual labels:  spark-streaming

AdRealTimeAnalysis


Requirements

  1. Implement a real-time, dynamic blacklist: blacklist any user who clicks a given ad more than 100 times in one day.

  2. Filter out illegitimate ad-click traffic based on the blacklist.

  3. Compute real-time daily click counts for each ad in each city of each province (based on requirement 2).

  4. Compute the daily top 3 most popular ads in each province (based on requirement 2).

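  5. Track each ad's click trend over the last hour, i.e. the number of clicks on each ad in each minute of the most recent hour (based on requirement 2).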

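  6. Compute in real time the daily click count for each ad in each city of each province (based on requirement 2), and update the results in MySQL.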
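To make requirement 5 concrete, here is a minimal plain-Java sketch (no Spark) of the per-minute trend. It assumes a hypothetical Click record with a millisecond timestamp, since the README does not give the log format, and it buckets clicks by ad and by UTC minute; the time zone is also an assumption.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ClickTrend {
    /** Hypothetical click record; the README does not specify the log format. */
    record Click(long timestampMillis, String province, String city, long userId, long adId) {}

    // Minute buckets such as "197001010001"; using UTC here is an assumption.
    private static final DateTimeFormatter MINUTE =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);
    private static final long ONE_HOUR_MILLIS = 60L * 60 * 1000;

    /**
     * Per-minute click counts for each ad within the hour ending at nowMillis,
     * i.e. timestamps in (nowMillis - 1h, nowMillis].
     * Keys have the form "adId_yyyyMMddHHmm"; the TreeMap keeps them in string order.
     */
    static Map<String, Long> perMinuteLastHour(List<Click> clicks, long nowMillis) {
        long windowStart = nowMillis - ONE_HOUR_MILLIS;
        Map<String, Long> counts = new TreeMap<>();
        for (Click c : clicks) {
            long t = c.timestampMillis();
            if (t <= windowStart || t > nowMillis) {
                continue; // outside the one-hour window
            }
            String minute = MINUTE.format(Instant.ofEpochMilli(t));
            counts.merge(c.adId() + "_" + minute, 1L, Long::sum);
        }
        return counts;
    }
}
```

In a Spark Streaming job, the same "adId_minute" key could be counted with reduceByKeyAndWindow using a 60-minute window duration; the slide interval is a design choice the README does not state.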

Implementation approach

  1. In real time, compute for each batch how many times each user clicked each ad on each day.

  2. Write the daily per-user, per-ad click counts to MySQL using a high-performance approach, updating existing rows.

  3. Use filter to select the users who clicked a given ad more than 100 times in a day, and write these blacklisted users to MySQL.

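  4. Use the transform operation to process each batch RDD: for every batch, dynamically load the blacklist from MySQL into an RDD, join it with the batch RDD, and filter out the ad clicks made by blacklisted users.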

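  5. Use updateStateByKey to compute, in real time, the daily click count for each ad in each city of each province, and update the results in MySQL in real time.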

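  6. Use transform together with Spark SQL to compute the daily top 3 ads in each province. Starting from the daily per-province, per-city, per-ad click counts, first aggregate them into daily per-province, per-ad click counts. Then start an asynchronous child thread that uses Spark SQL to convert the RDD into a DataFrame and register it as a temporary table. Finally, use a Spark SQL window function to find the top 3 ads in each province, and update the results in MySQL.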
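The steps above can be sketched in plain Java, without Spark, to show the data flow in a single process. This is a sketch of the logic under stated assumptions, not the project's Spark implementation: the Click, UserAdDay, and RegionAdDay records are hypothetical (the README does not give the log format), in-memory maps stand in for the MySQL tables, days are computed in UTC, and step 6 replaces the Spark SQL window function with an in-memory sort.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Single-process sketch of the steps; in-memory maps stand in for MySQL tables. */
final class AdPipelineSketch {
    // Hypothetical record shapes; the README does not define the log format.
    record Click(long timestampMillis, String province, String city, long userId, long adId) {}
    record UserAdDay(String day, long userId, long adId) {}
    record RegionAdDay(String day, String province, String city, long adId) {}

    static final long BLACKLIST_THRESHOLD = 100; // "more than 100 clicks on an ad in a day"
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC); // UTC is an assumption

    final Map<UserAdDay, Long> userAdCounts = new HashMap<>();
    final Set<Long> blacklist = new HashSet<>();
    final Map<RegionAdDay, Long> regionCounts = new HashMap<>();

    static String day(Click c) {
        return DAY.format(Instant.ofEpochMilli(c.timestampMillis()));
    }

    /** Handles one micro-batch (steps 1 to 5). */
    void processBatch(List<Click> batch) {
        // Step 1: clicks per (day, user, ad) within this batch.
        Map<UserAdDay, Long> batchCounts = batch.stream().collect(Collectors.groupingBy(
                c -> new UserAdDay(day(c), c.userId(), c.adId()), Collectors.counting()));
        // Step 2: add the batch counts to the stored daily totals (an upsert in MySQL).
        batchCounts.forEach((k, v) -> userAdCounts.merge(k, v, Long::sum));
        // Step 3: any user above the threshold for some ad on some day is blacklisted.
        userAdCounts.forEach((k, v) -> {
            if (v > BLACKLIST_THRESHOLD) {
                blacklist.add(k.userId());
            }
        });
        // Step 4: drop clicks from blacklisted users (the README joins with a blacklist RDD).
        List<Click> valid = batch.stream()
                .filter(c -> !blacklist.contains(c.userId()))
                .toList();
        // Step 5: running totals per (day, province, city, ad), as updateStateByKey would keep.
        for (Click c : valid) {
            regionCounts.merge(new RegionAdDay(day(c), c.province(), c.city(), c.adId()), 1L, Long::sum);
        }
    }

    /** Step 6: top 3 ads per province for one day; ties go to the smaller ad id. */
    Map<String, List<Long>> top3PerProvince(String day) {
        // Roll city-level counts up to (province, ad).
        Map<String, Map<Long, Long>> byProvince = new HashMap<>();
        regionCounts.forEach((k, v) -> {
            if (k.day().equals(day)) {
                byProvince.computeIfAbsent(k.province(), p -> new HashMap<>())
                        .merge(k.adId(), v, Long::sum);
            }
        });
        Comparator<Map.Entry<Long, Long>> byClicksDesc =
                Map.Entry.<Long, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Long>comparingByKey());
        Map<String, List<Long>> result = new HashMap<>();
        byProvince.forEach((province, ads) -> result.put(province, ads.entrySet().stream()
                .sorted(byClicksDesc)
                .limit(3)
                .map(Map.Entry::getKey)
                .toList()));
        return result;
    }
}
```

This sketch updates the blacklist before filtering the same batch, so the clicks that push a user past the threshold are also dropped; that ordering is a choice made here. In Spark SQL, step 6 is commonly expressed with ROW_NUMBER() OVER (PARTITION BY province ORDER BY click_count DESC) and a filter on rank <= 3; the README does not say which window function it uses, and the column names are hypothetical.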

Other

For a concrete demo of the full flow, from front-end display to back-end data interaction, see WiFiProbeAnalysis.
