Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → jadianes → Spark Movie Lens

jadianes / Spark Movie Lens

Licence: other

An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset

Programming Languages

python

139335 projects - #7 most used programming language

Labels

jupyter-notebook flask spark big-data bigdata

Projects that are alternatives of or similar to Spark Movie Lens

Big Data Engineering Coursera Yandex

Big Data for Data Engineers Coursera Specialization from Yandex

Stars: ✭ 71 (-90.47%)

Mutual labels: jupyter-notebook, spark, big-data, bigdata

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+79.6%)

Mutual labels: jupyter-notebook, spark, big-data, bigdata

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark

Stars: ✭ 986 (+32.35%)

Mutual labels: jupyter-notebook, spark, bigdata

Cortx

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (-42.82%)

Mutual labels: jupyter-notebook, big-data, bigdata

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-71.14%)

Mutual labels: spark, big-data, bigdata

leaflet heatmap

简单的可视化湖州通话数据假设数据量很大，没法用浏览器直接绘制热力图，把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后，再使用Apache Spark绘制热力图，然后用leafletjs加载OpenStreetMap图层和热力图图层，以达到良好的交互效果。现在使用Apache Spark实现绘制，可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法，并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .

Stars: ✭ 13 (-98.26%)

Mutual labels: big-data, spark, bigdata

Bigdata Notes

大数据入门指南 ⭐

Stars: ✭ 10,991 (+1375.3%)

Mutual labels: spark, big-data, bigdata

Spark R Notebooks

R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 109 (-85.37%)

Mutual labels: jupyter-notebook, big-data, bigdata

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (-79.87%)

Mutual labels: jupyter-notebook, spark, big-data

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+659.19%)

Mutual labels: jupyter-notebook, spark, big-data

Enterprise gateway

A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.

Stars: ✭ 412 (-44.7%)

Mutual labels: jupyter-notebook, spark

Opendata.cern.ch

Source code for the CERN Open Data portal

Stars: ✭ 411 (-44.83%)

Mutual labels: big-data, flask

Big data architect skills

一个大数据架构师应该掌握的技能

Stars: ✭ 400 (-46.31%)

Mutual labels: spark, bigdata

Pytorch classification

利用pytorch实现图像分类的一个完整的代码，训练，预测，TTA，模型融合，模型部署，cnn提取特征，svm或者随机森林等进行分类，模型蒸馏，一个完整的代码

Stars: ✭ 395 (-46.98%)

Mutual labels: jupyter-notebook, flask

Agile data code 2

Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition

Stars: ✭ 413 (-44.56%)

Mutual labels: jupyter-notebook, spark

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (-43.62%)

Mutual labels: spark, big-data

God Of Bigdata

专注大数据学习面试，大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+706.44%)

Mutual labels: spark, bigdata

Bigdataie

大数据博客、笔试题、教程、项目、面经的整理

Stars: ✭ 445 (-40.27%)

Mutual labels: spark, bigdata

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+411.81%)

Mutual labels: spark, big-data

Circosjs

d3 library to build circular graphs

Stars: ✭ 436 (-41.48%)

Mutual labels: big-data, bigdata

View All Similar Projects ➔

A scalable on-line movie recommender using Spark and Flask

This Apache Spark tutorial will guide you step-by-step into how to use the MovieLens dataset to build a movie recommender using collaborative filtering with Spark's Alternating Least Saqures implementation. It is organised in two parts. The first one is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system.

This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in the CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX, that is also publicly available since 2014 at Spark Summit. Starting from there, I've added with minor modifications to use a larger dataset, then code about how to store and reload the model for later use, and finally a web service using Flask.

In any case, the use of this algorithm with this dataset is not new (you can Google about it), and this is because we put the emphasis on ending up with a usable model in an on-line environment, and how to use it in different situations. But I truly got inspired by solving the exercise proposed in that course, and I highly recommend you to take it. There you will learn not just ALS but many other Spark algorithms.

It is the second part of the tutorial the one that explains how to use Python/Flask for building a web-service on top of Spark models. By doing so, you will be able to develop a complete on-line movie recommendation service.

Part I: Building the recommender

Part II: Building and running the web service

Quick start

The file server/server.py starts a CherryPy server running a Flask app.py to start a RESTful web server wrapping a Spark-based engine.py context. Through its API we can perform on-line movie recommendations.

Please, refer the the second notebook for detailed instructions on how to run and use the service.

Contributing

Contributions are welcome! For bug reports or requests please submit an issue.

Contact

Feel free to contact me to discuss any issues, questions, or comments.

Twitter: @ja_dianes
GitHub: jadianes
LinkedIn: jadianes
Website: jadianes.me

License

This repository contains a variety of content; some developed by Jose A. Dianes, and some from third-parties. The third-party content is distributed under the license provided by those parties.

The content developed by Jose A. Dianes is distributed under the following license:

Copyright 2016 Jose A Dianes

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 745

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (10) 🔗