huangyueranbbc / Spark_ALS

License: MIT
Recommendation algorithm implementations based on spark-ml, spark-mllib, and spark-streaming

Spark-ALS

Introduction

ALS stands for alternating least squares. ALS-WR stands for alternating least squares with weighted-λ regularization. Both are commonly used in recommender systems based on matrix factorization. For example, the matrix of user ratings for items is factored into two matrices: one holds each user's preferences for the latent features of items, and the other holds the latent features of each item. Multiplying the two factor matrices back together also produces values for the ratings that were missing from the original matrix, and those predicted ratings can be used to recommend items to users.
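To make the factorization concrete, the objective that ALS-WR minimizes is usually written as below. The notation is introduced here for illustration and does not come from the repository: Ω is the set of observed (user, item) pairs, r_ui is an observed rating, u_u and v_i are the latent factor vectors of user u and item i, n_u and n_i count the ratings of that user and that item, and λ is the regularization weight.

```latex
\min_{U,\,V}\;
\sum_{(u,i)\in\Omega} \bigl( r_{ui} - \mathbf{u}_u^{\top}\mathbf{v}_i \bigr)^2
\;+\; \lambda \Bigl( \sum_{u} n_u \,\lVert \mathbf{u}_u \rVert^2
                   + \sum_{i} n_i \,\lVert \mathbf{v}_i \rVert^2 \Bigr)
```

A missing rating is then predicted as the dot product of the corresponding user and item vectors.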

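In short, the ALS-WR algorithm works as follows:
(Data format: userId, itemId, rating, timestamp)
1 For each userId, randomly initialize N (for example 10) factor values; these values determine that user's weights.
2 For each itemId, likewise randomly initialize N (for example 10) factor values.
3 Hold the user factors fixed and solve for the itemFactors matrix from the userFactors matrix and the rating matrix. Informally, [Item Factors Matrix] = [User Factors Matrix]^-1 * [Rating Matrix]. The factor matrices are generally not square, so ^-1 here stands for a regularized least-squares solve over the observed ratings, not a true matrix inverse.
4 Hold the item factors fixed and solve for the userFactors matrix from the itemFactors matrix and the rating matrix in the same way. Informally, [User Factors Matrix] = [Item Factors Matrix]^-1 * [Rating Matrix].
5 Repeat steps 3 and 4 until userFactors and itemFactors settle to stable values.
6 To predict a rating, multiply a user's factor vector by an item's factor vector. For a given itemId: userFactors * (factors of itemId) = rating value. For a given userId: itemFactors * (factors of userId) = rating value.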
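The steps above can be sketched in plain Java without Spark. This is a minimal, single-machine illustration and not the repository's implementation: the class name `AlsSketch`, the dense `double[][]` rating matrix with `Double.NaN` for missing entries, and all parameter names are assumptions made for the example. Each update solves the regularized normal equations for one row, which is the least-squares solve that steps 3 and 4 refer to.

```java
import java.util.Random;

/** Dense, single-machine ALS-WR sketch; Double.NaN marks a missing rating. */
class AlsSketch {

    /** Solves the k x k system a * x = b (Gaussian elimination, partial pivoting). */
    static double[] solve(double[][] a, double[] b) {
        int k = b.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] rowTmp = a[col]; a[col] = a[pivot]; a[pivot] = rowTmp;
            double valTmp = b[col]; b[col] = b[pivot]; b[pivot] = valTmp;
            for (int r = col + 1; r < k; r++) {
                double f = a[r][col] / a[col][col];
                b[r] -= f * b[col];
                for (int c = col; c < k; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] x = new double[k];
        for (int r = k - 1; r >= 0; r--) {
            double s = b[r];
            for (int c = r + 1; c < k; c++) s -= a[r][c] * x[c];
            x[r] = s / a[r][r];
        }
        return x;
    }

    /** Re-solves each row of target with fixed held constant (steps 3 and 4). */
    static void update(double[][] target, double[][] fixed, double[][] ratings,
                       boolean targetIsItems, double lambda) {
        int k = fixed[0].length;
        for (int t = 0; t < target.length; t++) {
            double[][] a = new double[k][k];
            double[] b = new double[k];
            int n = 0; // number of observed ratings for this row
            for (int f = 0; f < fixed.length; f++) {
                // ratings is always indexed [user][item]
                double r = targetIsItems ? ratings[f][t] : ratings[t][f];
                if (Double.isNaN(r)) continue;
                n++;
                for (int i = 0; i < k; i++) {
                    b[i] += r * fixed[f][i];
                    for (int j = 0; j < k; j++) a[i][j] += fixed[f][i] * fixed[f][j];
                }
            }
            if (n == 0) continue; // nothing observed: keep the current factors
            for (int i = 0; i < k; i++) a[i][i] += lambda * n; // weighted-lambda term
            target[t] = solve(a, b);
        }
    }

    /** Step 6: predicted rating = dot product of a user row and an item row. */
    static double predict(double[] user, double[] item) {
        double s = 0;
        for (int i = 0; i < user.length; i++) s += user[i] * item[i];
        return s;
    }

    /** Root-mean-square error over the observed ratings only. */
    static double rmse(double[][] users, double[][] items, double[][] ratings) {
        double sum = 0;
        int n = 0;
        for (int u = 0; u < users.length; u++) {
            for (int i = 0; i < items.length; i++) {
                if (Double.isNaN(ratings[u][i])) continue;
                double e = ratings[u][i] - predict(users[u], items[i]);
                sum += e * e;
                n++;
            }
        }
        return Math.sqrt(sum / n);
    }

    /** Steps 1-5; returns {userFactors, itemFactors}. */
    static double[][][] train(double[][] ratings, int k, int iterations, double lambda, long seed) {
        Random rnd = new Random(seed);
        double[][] users = new double[ratings.length][k];
        double[][] items = new double[ratings[0].length][k];
        for (double[] row : users) for (int i = 0; i < k; i++) row[i] = rnd.nextDouble();
        for (double[] row : items) for (int i = 0; i < k; i++) row[i] = rnd.nextDouble();
        for (int it = 0; it < iterations; it++) {
            update(items, users, ratings, true, lambda);  // step 3: users fixed
            update(users, items, ratings, false, lambda); // step 4: items fixed
        }
        return new double[][][] {users, items};
    }
}
```

Calling `AlsSketch.train(ratings, 10, 20, 0.01, 42L)` returns the two factor matrices, and `AlsSketch.predict(users[u], items[i])` gives the estimated rating for any user and item, including pairs that were missing. A dense matrix is only practical for small data; a distributed implementation such as the one in Spark works on sparse rating records instead.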

SparkALSByStreaming.java
A real-time recommendation system demo built on Hadoop, Flume, Kafka, spark-streaming, logback, and a shopping-mall system.
Format of the data set collected by the mall system (user ID, item ID, user-behavior score, timestamp):
UserID,ItemId,Rating,TimeStamp
53,1286513,9,1508221762
53,1172348420,9,1508221762
53,1179495514,12,1508221762
53,1184890730,3,1508221762
53,1210793742,159,1508221762
53,1215837445,9,1508221762
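A record in this format can be parsed as in the sketch below. The class `RatingEvent` and its field types are hypothetical and chosen for illustration; the repository may represent records differently. The item ID is kept as a `long` because the sample IDs are already close to the `int` limit, and the timestamp is assumed to be epoch seconds based on the sample values.

```java
/** One line of the mall-system data set: UserID,ItemId,Rating,TimeStamp. */
final class RatingEvent {
    final int userId;
    final long itemId;    // long: sample IDs such as 1215837445 are near Integer.MAX_VALUE
    final double rating;  // user-behavior score
    final long timestamp; // assumed to be epoch seconds

    RatingEvent(int userId, long itemId, double rating, long timestamp) {
        this.userId = userId;
        this.itemId = itemId;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    /** Parses one comma-separated line; throws IllegalArgumentException on malformed input. */
    static RatingEvent parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "expected 4 fields but got " + parts.length + ": " + line);
        }
        // NumberFormatException is a subclass of IllegalArgumentException,
        // so non-numeric fields surface as the same exception type.
        return new RatingEvent(
            Integer.parseInt(parts[0].trim()),
            Long.parseLong(parts[1].trim()),
            Double.parseDouble(parts[2].trim()),
            Long.parseLong(parts[3].trim()));
    }
}
```

Note that Spark MLlib's rating type uses `int` IDs for both users and products, so item IDs that exceed the `int` range would need to be remapped before training.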

HDFS and Kafka commands:

hdfs dfs -mkdir -p /spark-als/model

hdfs dfs -mkdir -p /flume/logs

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic RECOMMEND_TOPIC

kafka-console-producer.sh --broker-list 192.168.0.193:9092 --topic RECOMMEND_TOPIC < /data/streaming_sample_movielens_ratings.txt
