SSQ / Coursera Uw Machine Learning Clustering Retrieval

Licence: mit

Programming Languages

python

Labels

mapreduce tf-idf

Projects that are alternatives of or similar to Coursera Uw Machine Learning Clustering Retrieval

Textmining

Python text mining system (Research of Text Mining System)

Stars: ✭ 268 (+972%)

Mutual labels: tf-idf

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+88092%)

Mutual labels: mapreduce

Corral

🐎 A serverless MapReduce framework written for AWS Lambda

Stars: ✭ 648 (+2492%)

Mutual labels: mapreduce

2018 Machinelearning Lectures Esa

Machine Learning Lectures at the European Space Agency (ESA) in 2018

Stars: ✭ 280 (+1020%)

Mutual labels: tf-idf

Nlp

Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang

Stars: ✭ 304 (+1116%)

Mutual labels: tf-idf

Bigslice

A serverless cluster computing system for the Go programming language

Stars: ✭ 469 (+1776%)

Mutual labels: mapreduce

Guitar

A Simple and Efficient Distributed Multidimensional BI Analysis Engine.

Stars: ✭ 86 (+244%)

Mutual labels: mapreduce

Yandex Big Data Engineering

Stars: ✭ 17 (-32%)

Mutual labels: mapreduce

Cascading

Cascading is a feature rich API for defining and executing complex and fault tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+1172%)

Mutual labels: mapreduce

Cdap

An open source framework for building data analytic applications.

Stars: ✭ 509 (+1936%)

Mutual labels: mapreduce

Redisson

Redisson - Redis Java client with features of In-Memory Data Grid. Over 50 Redis based Java objects and services: Set, Multimap, SortedSet, Map, List, Queue, Deque, Semaphore, Lock, AtomicLong, Map Reduce, Publish / Subscribe, Bloom filter, Spring Cache, Tomcat, Scheduler, JCache API, Hibernate, MyBatis, RPC, local cache ...

Stars: ✭ 17,972 (+71788%)

Mutual labels: mapreduce

Polyfuzz

Fuzzy string matching, grouping, and evaluation.

Stars: ✭ 292 (+1068%)

Mutual labels: tf-idf

Bigdata

💎🔥 Big data study notes

Stars: ✭ 488 (+1852%)

Mutual labels: mapreduce

Tdigest

t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark

Stars: ✭ 274 (+996%)

Mutual labels: mapreduce

Distributed Computing

distributed_computing includes mapreduce, kvstore, etc.

Stars: ✭ 654 (+2516%)

Mutual labels: mapreduce

data-algorithms-with-spark

O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

Stars: ✭ 34 (+36%)

Mutual labels: mapreduce

Bdp Dataplatform

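Big data ecosystem solution data platform: a big data solution built on big data, data platforms, microservices, machine learning, an e-commerce mall, automated operations, DevOps, a container deployment platform, data platform collection, data platform storage, data platform computing, data platform development, and data platform application building.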

Stars: ✭ 456 (+1724%)

Mutual labels: mapreduce

Mobius

C# and F# language binding and extensions to Apache Spark

Stars: ✭ 929 (+3616%)

Mutual labels: mapreduce

Nlp In Practice

Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.

Stars: ✭ 790 (+3060%)

Mutual labels: tf-idf

Moviebox

Machine learning movie recommending system

Stars: ✭ 504 (+1916%)

Mutual labels: tf-idf

Coursera UW Machine Learning Clustering & Retrieval

The course can be found on Coursera.

Notebooks for quick search can be found on my blog, SSQ.

Videos are on Bilibili (where I posted them).

Week 1 Intro
Week 2 Nearest Neighbor Search: Retrieving Documents
- Implement nearest neighbor search for retrieval tasks
- Contrast document representations (e.g., raw word counts, tf-idf,…)
  - Emphasize important words using tf-idf
- Contrast methods for measuring similarity between two documents
  - Euclidean vs. weighted Euclidean
  - Cosine similarity vs. similarity via unnormalized inner product
- Describe complexity of brute force search
- Implement KD-trees for nearest neighbor search
- Implement LSH for approximate nearest neighbor search
- Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for a given dataset
- [x] Choosing features and metrics for nearest neighbor search
- [x] Implementing Locality Sensitive Hashing from scratch
Week 3 Clustering with k-means
- Describe potential applications of clustering
- Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
- Determine whether a task is supervised or unsupervised
- Cluster documents using k-means
- Interpret k-means as a coordinate descent algorithm
- Define data parallel problems
- Explain Map and Reduce steps of MapReduce framework
- Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
- [x] Clustering text data with k-means
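
The Week 3 topics above can be sketched as k-means written in a MapReduce style: the map step assigns each point to its closest centroid, and the reduce step averages the points that share a cluster key. This is a single-process toy under stated assumptions, not a real MapReduce job; `map_step`, `reduce_step`, `kmeans`, and the sample points are hypothetical names and data chosen for the example.

```python
from collections import defaultdict

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def map_step(point, centroids):
    """Map: emit (index of the closest centroid, (point, 1))."""
    closest = min(range(len(centroids)),
                  key=lambda k: squared_distance(point, centroids[k]))
    return closest, (point, 1)

def reduce_step(values):
    """Reduce: sum the points and counts for one cluster, then average."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + x for t, x in zip(total, point)]
        count += n
    return [t / count for t in total]

def kmeans(points, centroids, max_iter=100):
    centroids = [list(c) for c in centroids]
    for _ in range(max_iter):
        # Shuffle phase: group the mapped values by cluster key.
        groups = defaultdict(list)
        for point in points:
            key, value = map_step(point, centroids)
            groups[key].append(value)
        # A cluster that received no points keeps its previous centroid.
        new_centroids = [
            reduce_step(groups[k]) if k in groups else centroids[k]
            for k in range(len(centroids))
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids

# Hypothetical 2-D data with two well-separated groups.
points = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]]
final_centroids = kmeans(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
print(final_centroids)
```

The two alternating steps are also what makes k-means a coordinate descent algorithm: assignments are optimized with centroids held fixed, then centroids with assignments held fixed.
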
Week 4 Mixture Models: Model-Based Clustering
- Interpret a probabilistic model-based approach to clustering using mixture models
- Describe model parameters
- Motivate the utility of soft assignments and describe what they represent
- Discuss issues related to how the number of parameters grows with the number of dimensions
  - Interpret diagonal covariance versions of mixtures of Gaussians
- Compare and contrast mixtures of Gaussians and k-means
- Implement an EM algorithm for inferring soft assignments and cluster parameters
  - Determine an initialization strategy
  - Implement a variant that helps avoid overfitting issues
- [x] Implementing EM for Gaussian mixtures
- [x] Clustering text data with Gaussian mixtures
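
To make the Week 4 objectives concrete, here is a minimal sketch of EM for a one-dimensional mixture of Gaussians. It is an illustration under assumptions, not the assignment solution: the function `em_gmm_1d`, the fixed iteration count, the hand-picked initialization, and the variance floor `min_var` (one simple guard against a component collapsing onto a single point) are all choices made for this example.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=50, min_var=1e-3):
    # Copy the inputs so the caller's initial parameters are not modified.
    means, variances, weights = list(means), list(variances), list(weights)
    n, k = len(data), len(means)
    resp = []
    for _ in range(n_iter):
        # E-step: soft assignments resp[i][j] = P(cluster j | x_i).
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate each cluster from its soft counts.
        for j in range(k):
            soft_count = sum(r[j] for r in resp)
            weights[j] = soft_count / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / soft_count
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / soft_count
            variances[j] = max(var, min_var)  # floor keeps the variance from shrinking to zero
    return means, variances, weights, resp

# Hypothetical data drawn around 1.0 and 5.0.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print(means, weights)
```

Unlike k-means, each point contributes to every cluster in proportion to its responsibility, which is what the soft assignments represent.
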
Week 5 Latent Dirichlet Allocation: Mixed Membership Modeling
- Compare and contrast clustering and mixed membership models
- Describe a document clustering model for the bag-of-words document representation
- Interpret the components of the LDA mixed membership model
- Analyze a learned LDA model
  - Topics in the corpus
  - Topics per document
- Describe Gibbs sampling steps at a high level
- Utilize Gibbs sampling output to form predictions or estimate model parameters
- Implement collapsed Gibbs sampling for LDA
- [x] Modeling text topics with Latent Dirichlet Allocation
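
The collapsed Gibbs sampling step listed above can be sketched in a few lines. This is a simplified illustration, not the course's implementation: `collapsed_gibbs_lda`, the symmetric hyperparameters `alpha` and `beta`, the iteration count, and the integer-encoded toy documents are assumptions. Each token's topic is resampled from a distribution proportional to (topic count in the document + alpha) × (word count in the topic + beta) / (total words in the topic + V × beta), with the token's own current assignment removed from the counts first.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
                        n_iter=200, seed=0):
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens per (document, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assignments = []
    # Random initialization of every token's topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from all counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Unnormalized conditional probability of each topic.
                probs = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=probs)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Hypothetical corpus: each document is a list of word ids in [0, vocab_size).
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4], [0, 1, 3]]
doc_topic, topic_word = collapsed_gibbs_lda(docs, n_topics=2, vocab_size=5)
print(doc_topic)
```

Normalizing a row of `doc_topic` (plus `alpha`) gives an estimate of the topics per document, and normalizing a row of `topic_word` (plus `beta`) gives an estimate of a topic's word distribution.
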
Week 6 Hierarchical Clustering & Closing Remarks
- Bonus content: Hierarchical clustering
  - Divisive clustering
  - Agglomerative clustering
    - The dendrogram for agglomerative clustering
    - Agglomerative clustering details
- Hidden Markov models (HMMs): Another notion of “clustering”
- [x] Modeling text data with a hierarchy of clusters
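
For the agglomerative clustering topic above, a minimal sketch with single linkage is shown below. It is one possible variant, assumed for illustration: the function name `agglomerative_single_linkage`, the Euclidean metric, and the sample points are not from the course material. The recorded merge distances are the heights a dendrogram would plot.

```python
import math

def agglomerative_single_linkage(points, n_clusters):
    # Start with every point in its own cluster (clusters hold point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # (members of a, members of b, linkage distance) per merge
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters, merges

# Hypothetical 2-D points: two tight pairs and one distant outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
clusters, merges = agglomerative_single_linkage(points, n_clusters=2)
print(clusters)
```

Stopping at `n_clusters` corresponds to cutting the dendrogram at a chosen height. Divisive clustering works in the opposite direction, starting from one cluster and splitting recursively.
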

Stars: ✭ 25
