wanghan0501 / Usersessionbehaviorofflineanalysis

Licence: other
Sichuan University 拓思爱诺 project for offline analysis of user session behavior data

Programming Languages

scala


UserSessionBehaviorOfflineAnalysis

Sichuan University 拓思爱诺 project for offline analysis of user session behavior data

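Requirement 1

Given a taskid, query the task's details from the database, including starttime, taskParam, and so on. Connections must be obtained through a database connection pool. Exactly one pool exists for the whole lifetime of the program, and it holds a fixed number of connections (JDBC + MySQL).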
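
A minimal sketch of such a pool, using only the JDK and the Scala standard library. It is not the project's actual code: the names `FixedPool` and `ConnectionPool`, the pool size of 5, and the JDBC URL and credentials are all placeholders chosen for illustration. The pool creates every resource once, up front, and a Scala `object` gives the "only one pool per program" guarantee.

```scala
import java.sql.{Connection, DriverManager}
import java.util.concurrent.{ArrayBlockingQueue, TimeUnit}

// A fixed-size pool: all resources are created once and then reused.
final class FixedPool[A <: AnyRef](size: Int, create: () => A) {
  private val idle = new ArrayBlockingQueue[A](size)
  (1 to size).foreach(_ => idle.put(create()))

  // Wait up to timeoutMs for a free resource; None if the pool stays exhausted.
  def borrow(timeoutMs: Long): Option[A] =
    Option(idle.poll(timeoutMs, TimeUnit.MILLISECONDS))

  // Hand a borrowed resource back; offer never blocks, even if the pool is full.
  def giveBack(resource: A): Unit = { idle.offer(resource); () }

  // Borrow, run the body, and always return the resource afterwards.
  def withResource[B](timeoutMs: Long)(body: A => B): Option[B] =
    borrow(timeoutMs).map { r => try body(r) finally giveBack(r) }
}

// Hypothetical singleton holder: an `object` is initialised once per JVM,
// and `lazy val` delays opening connections until the pool is first used.
object ConnectionPool {
  lazy val pool: FixedPool[Connection] = new FixedPool[Connection](
    5, // assumed pool size
    () => DriverManager.getConnection(
      "jdbc:mysql://localhost:3306/example_db", "user", "password") // placeholders
  )
}
```

A task lookup would then run inside `ConnectionPool.pool.withResource(...)`, using a `PreparedStatement` with the taskid bound as a parameter, so the connection always goes back to the pool.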

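Requirement 2

Aggregate the data at session granularity within a specified date range. Each element of the resulting pair RDD must be <k:String,v:String>, where k=sessionid and v has the following format:

sessionid=value|searchword=value|clickcaterory=value|age=value|professional=value|city=value|sex=value

Uses Spark RDD + SQL.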
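
The aggregation logic can be sketched on plain Scala collections, without a Spark dependency. This is an illustration under assumptions, not the project's implementation: the `Action` and `User` case classes and their fields are hypothetical, multiple search words or categories within one session are joined with commas, and sessions whose user is unknown are dropped. The key `clickcaterory` is spelled as in the format above. In a Spark job the same grouping and formatting would be applied to RDDs rather than to a `Seq`.

```scala
object SessionAggregation {
  // Hypothetical input shapes; the real tables may have more columns.
  final case class Action(sessionId: String, userId: Long,
                          searchWord: Option[String], clickCategory: Option[String])
  final case class User(age: Int, professional: String, city: String, sex: String)

  // Returns sessionid -> "sessionid=..|searchword=..|clickcaterory=..|age=..|..."
  def aggregate(actions: Seq[Action], users: Map[Long, User]): Map[String, String] = {
    val pairs = for {
      (sid, rows) <- actions.groupBy(_.sessionId).toSeq
      user        <- users.get(rows.head.userId).toSeq // skip sessions with unknown users
    } yield {
      // Assumption: distinct values, comma-separated, in order of first appearance.
      val words = rows.flatMap(_.searchWord).distinct.mkString(",")
      val cats  = rows.flatMap(_.clickCategory).distinct.mkString(",")
      sid -> (s"sessionid=$sid|searchword=$words|clickcaterory=$cats|" +
        s"age=${user.age}|professional=${user.professional}|city=${user.city}|sex=${user.sex}")
    }
    pairs.toMap
  }
}
```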

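Requirement 3

Filter the data by the user's query conditions, which may include one or more of: age range, profession (multi-select), city (multi-select), search word (multi-select), and clicked category (multi-select).

Note: the session time range is mandatory. Elements of the returned RDD have the same format as above. Uses Spark RDD + SQL.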
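
One way to express this filter is as a predicate over the aggregated value string, sketched below in plain Scala. The `Criteria` class and its field names are hypothetical. The sketch assumes that an empty set means "no constraint on this field", that multi-valued fields are comma-separated, and that the mandatory time range has already been applied when the sessions were loaded, since the aggregated string carries no timestamp.

```scala
object SessionFilter {
  // Hypothetical query conditions; empty set / None means "not constrained".
  final case class Criteria(
    ageRange: Option[(Int, Int)] = None,
    professionals: Set[String] = Set.empty,
    cities: Set[String] = Set.empty,
    searchWords: Set[String] = Set.empty,
    clickCategories: Set[String] = Set.empty)

  // Split "k1=v1|k2=v2" into a map; limit 2 keeps empty values such as "searchword=".
  def parse(line: String): Map[String, String] =
    line.split("\\|").iterator.map(_.split("=", 2)).collect {
      case Array(k, v) => k -> v
    }.toMap

  private def single(allowed: Set[String], value: String): Boolean =
    allowed.isEmpty || allowed.contains(value)

  // A multi-valued field matches if any of its values was selected.
  private def anyOf(allowed: Set[String], values: String): Boolean =
    allowed.isEmpty || values.split(",").exists(allowed.contains)

  def matches(c: Criteria)(line: String): Boolean = {
    val f = parse(line).withDefaultValue("")
    val ageOk = c.ageRange.forall { case (lo, hi) =>
      scala.util.Try(f("age").toInt).toOption.exists(a => a >= lo && a <= hi)
    }
    ageOk && single(c.professionals, f("professional")) && single(c.cities, f("city")) &&
      anyOf(c.searchWords, f("searchword")) && anyOf(c.clickCategories, f("clickcaterory"))
  }
}
```

Because `matches(criteria)` is an ordinary `String => Boolean` function, it could be passed to a `filter` over the values of the pair RDD from Requirement 2, which keeps the element format unchanged.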

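Requirement 4

Implement a custom accumulator that computes several aggregate statistics at once. The statistics include session counts by visit duration (1-3 s, 4-6 s, 7-9 s, 10-30 s, 30-60 s) and by visit step length (1-3 pages, 4-6 pages, and so on).

Note: the business logic is fairly complex. Using multiple broadcast variables for it would make the program very complicated and hard to extend and maintain (Spark Accumulator).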
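
The idea of a single accumulator that holds every counter can be sketched without Spark as a small class with `add`, `merge`, and `value`, which mirrors the shape of Spark's `AccumulatorV2`. Everything here is illustrative: the class name, the bucket labels, the decision to count exactly 30 seconds in the 10-30 s bucket, and the `other_*` catch-all buckets are assumptions. Only the step buckets named above (1-3 and 4-6 pages) are defined.

```scala
object SessionStats {
  // Map a visit duration in seconds to a bucket label (labels are hypothetical).
  def durationBucket(seconds: Long): String =
    if (seconds >= 1 && seconds <= 3) "1s_3s"
    else if (seconds >= 4 && seconds <= 6) "4s_6s"
    else if (seconds >= 7 && seconds <= 9) "7s_9s"
    else if (seconds >= 10 && seconds <= 30) "10s_30s" // assumption: 30 s lands here
    else if (seconds > 30 && seconds <= 60) "30s_60s"
    else "other_duration"

  // Map a visit step length (pages viewed) to a bucket label.
  def stepBucket(pages: Int): String =
    if (pages >= 1 && pages <= 3) "1_3_pages"
    else if (pages >= 4 && pages <= 6) "4_6_pages"
    else "other_steps" // further step buckets are not specified in the requirement

  // One accumulator for all counters, keyed by bucket label.
  final class SessionStatAccumulator {
    private val counts =
      scala.collection.mutable.Map.empty[String, Long].withDefaultValue(0L)

    // Count one session in the given bucket.
    def add(key: String): Unit = counts(key) += 1

    // Fold another partial result into this one, as a driver would do per task.
    def merge(other: SessionStatAccumulator): Unit =
      other.value.foreach { case (k, v) => counts(k) += v }

    def value: Map[String, Long] = counts.toMap
  }
}
```

In a real job this class would extend `org.apache.spark.util.AccumulatorV2` and also implement `isZero`, `copy`, and `reset`. Adding a new statistic then only means adding a bucket function, not registering another shared variable.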

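Requirement 5

For the sessions that pass the filter, rank the categories by their click, order, and payment counts in descending order, and take the top 10 hot categories.

Priority: clicks, then orders, then payments. Secondary sort (Spark).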
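
The secondary sort can be sketched with a tuple ordering: tuples compare field by field, so sorting on `(clicks, orders, pays)` in reverse gives exactly the click, order, payment priority. The `CategoryCount` class and the `top` helper are hypothetical names, and the sketch works on a local `Seq` rather than an RDD.

```scala
object TopCategories {
  // Hypothetical per-category totals computed from the filtered sessions.
  final case class CategoryCount(categoryId: String, clicks: Long, orders: Long, pays: Long)

  // Descending by clicks; ties broken by orders, then by payments.
  def top(counts: Seq[CategoryCount], n: Int = 10): Seq[CategoryCount] =
    counts
      .sortBy(c => (c.clicks, c.orders, c.pays))(Ordering[(Long, Long, Long)].reverse)
      .take(n)
}
```

For example, with click counts tied at 10, a category with 2 orders ranks above one with 1 order, and among those with 2 orders the one with more payments comes first. The same ordering could be supplied to an RDD sort in a Spark job.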

Other

The link below is a concrete demo of the full flow from front-end display to back-end data interaction:

WiFiProbeAnalysis
