wsuen / Pygotham2018_graphmining

Large-scale Graph Mining with Spark

Large Scale Graph Mining with Spark

PyGotham 2018 talk. See also my tutorial on Medium.

Getting started

This repo includes a Dockerfile for running a Jupyter notebook with PySpark.

Running the notebook

Make sure you have Docker installed.
Run make build to create your Docker image. This may take a while.
Run make run_notebook_volume. This starts a Docker container with a volume containing the notebooks and the sample dataset.
Go to 127.0.0.1:8888 to see the notebook server. You may need to enter an authentication token, which is printed in your terminal output.
Open work/notebooks/Graphframes_demo.

Stopping Jupyter notebook

Find the running container with docker ps.
Kill the container with docker kill <container_id>.

About the sample dataset

I also included a small sample dataset that I created from the Common Crawl September 2017 dataset. The data, stored as a Parquet file under notebooks/data/outlinks_pq, has the following columns:

parent: full URL of the parent node, the HTML page I pulled links from.
parentTLD: top-level domain of the parent.
childTLD: top-level domain of the child.
child: full URL of the child node, the link found on the parent web page.
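
To make the column layout concrete, here is a minimal pure-Python sketch of how rows in this format could be collapsed into a domain-level edge list. The sample rows and the helper name domain_edges are made up for illustration; they are not taken from the dataset, and the real Parquet file would normally be read with Spark rather than built by hand.

```python
from collections import Counter

# Hypothetical rows in the parent / parentTLD / childTLD / child layout.
# The URLs are invented for illustration and do not come from the sample dataset.
rows = [
    {"parent": "http://a.example/index.html", "parentTLD": "a.example",
     "childTLD": "b.example", "child": "http://b.example/page1"},
    {"parent": "http://a.example/about.html", "parentTLD": "a.example",
     "childTLD": "b.example", "child": "http://b.example/page2"},
    {"parent": "http://a.example/index.html", "parentTLD": "a.example",
     "childTLD": "a.example", "child": "http://a.example/about.html"},
    {"parent": "http://b.example/page1", "parentTLD": "b.example",
     "childTLD": "c.example", "child": "http://c.example/"},
]


def domain_edges(rows):
    """Count links between distinct domains, skipping links within one domain."""
    counts = Counter()
    for row in rows:
        src, dst = row["parentTLD"], row["childTLD"]
        if src != dst:
            counts[(src, dst)] += 1
    return counts


edges = domain_edges(rows)
for (src, dst), n in sorted(edges.items()):
    print(f"{src} -> {dst}: {n}")
```

Dropping same-domain links is a design choice of this sketch, on the assumption that links between different sites are the interesting ones for a web graph; keep them if you want to study internal link structure.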

Hopefully this will jumpstart your exploration of web graphs, LPA, PageRank, and other cool features!
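
If LPA (label propagation) is new to you, the toy sketch below shows the core idea in plain Python: every node starts with its own label and repeatedly adopts the most common label among its neighbors. This is an illustration only, not the GraphFrames implementation. The function name, the example edges, and the deterministic tie-breaking are my assumptions; the algorithm described by Raghavan et al. visits nodes in random order and breaks ties at random.

```python
from collections import Counter


def label_propagation(edges, max_iter=10):
    """Toy label propagation over an undirected graph given as (src, dst) pairs."""
    # Build an undirected adjacency map.
    neighbors = {}
    for src, dst in edges:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)

    # Every node starts in its own community.
    labels = {node: node for node in neighbors}

    for _ in range(max_iter):
        changed = False
        # Assumption: fixed node order and smallest-label tie-breaking keep the
        # sketch deterministic; the published algorithm randomizes both.
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical graph: two triangles with no links between them.
edges = [
    ("a.example", "b.example"), ("b.example", "c.example"), ("a.example", "c.example"),
    ("d.example", "e.example"), ("e.example", "f.example"), ("d.example", "f.example"),
]
labels = label_propagation(edges)
for node in sorted(labels):
    print(node, "->", labels[node])
```

On this input each triangle ends up sharing a single label, which is the behavior you would expect from a community detection method on two disconnected cliques.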

References

Adamic, Lada A., and Natalie Glance. "The political blogosphere and the 2004 US election: divided they blog." Proceedings of the 3rd international workshop on Link discovery. ACM, 2005.

Common Crawl dataset (September 2017).

Farine, Damien R., et al. "Both nearest neighbours and long-term affiliates predict individual locations during collective movement in wild baboons." Scientific reports 6 (2016): 27704.

Fortunato, Santo. "Community detection in graphs." Physics reports 486.3-5 (2010): 75-174.

Girvan, Michelle, and Mark EJ Newman. "Community structure in social and biological networks." Proceedings of the national academy of sciences 99.12 (2002): 7821-7826.

Leskovec, Jure, Anand Rajaraman, and Jeffrey David Ullman. Mining of massive datasets. Cambridge University Press, 2014.

Raghavan, Usha Nandini, Réka Albert, and Soundar Kumara. "Near linear time algorithm to detect community structures in large-scale networks." Physical review E 76.3 (2007): 036106.

Zachary karate club network dataset -- KONECT, April 2017.

Additional Resources

Spark

I like Learning Spark by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia.
Also High Performance Spark by Holden Karau and Rachel Warren.

GraphFrames

Spark GraphFrames documentation.
Databricks blog post about GraphFrames.
