This project is to convert PortraitFCN+ (by Xiaoyong Shen) from Matlab to Tensorflow, then refine the outputs from it (converted to a trimap) using KNN and ResNet, supervised by Richard Berwick.

Stars: ✭ 61 (-16.44%)

Mutual labels: knn

lshensemble

LSH index for approximate set containment search

Stars: ✭ 48 (-34.25%)

Mutual labels: approximate-nearest-neighbor-search

elasticsearch-approximate-nearest-neighbor

Plugin to integrate approximate nearest neighbor (ANN) search with Elasticsearch

Stars: ✭ 53 (-27.4%)

Mutual labels: approximate-nearest-neighbor-search

Annoy

Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk

Stars: ✭ 9,262 (+12587.67%)

Mutual labels: approximate-nearest-neighbor-search

keras-knn

Code for the blog post Nearest Neighbors with Keras and CoreML

Stars: ✭ 25 (-65.75%)

Mutual labels: knn

paccmann kinase binding residues

Comparison of active site and full kinase sequences for drug-target affinity prediction and molecular generation. Full paper: https://pubs.acs.org/doi/10.1021/acs.jcim.1c00889

Stars: ✭ 29 (-60.27%)

Mutual labels: knn

adventures-with-ann

All the code for a series of Medium articles on Approximate Nearest Neighbors

Stars: ✭ 40 (-45.21%)

Mutual labels: approximate-nearest-neighbor-search

scikit-hubness

A Python package for hubness analysis and high-dimensional data mining

Stars: ✭ 41 (-43.84%)

Mutual labels: approximate-nearest-neighbor-search

product-quantization

🙃Implementation of vector quantization algorithms, codes for Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.

Stars: ✭ 40 (-45.21%)

Mutual labels: approximate-nearest-neighbor-search

NearestNeighborDescent.jl

Efficient approximate k-nearest neighbors graph construction and search in Julia

Stars: ✭ 34 (-53.42%)

Mutual labels: approximate-nearest-neighbor-search

drowsiness-detection

Identifies driver drowsiness from real-time camera images using image processing techniques. Drowsy driving detection system. OpenCV

Stars: ✭ 31 (-57.53%)

Mutual labels: knn

gongt

NGT Go client library

Stars: ✭ 29 (-60.27%)

Mutual labels: approximate-nearest-neighbor-search

Trajectory-Analysis-and-Classification-in-Python-Pandas-and-Scikit-Learn

Formed trajectories of sets of points. Experimented on finding similarities between trajectories based on DTW (Dynamic Time Warping) and LCSS (Longest Common SubSequence) algorithms. Modeled trajectories as strings based on a Grid representation. Benchmarked KNN, Random Forest, Logistic Regression classification algorithms to classify efficiently t…

Stars: ✭ 41 (-43.84%)

Mutual labels: knn

Machine Learning

⚡ Machine Learning in Action (Python 3): kNN, decision trees, Bayes, logistic regression, SVM, linear regression, tree regression

Stars: ✭ 5,601 (+7572.6%)

Mutual labels: knn

Milvus

An open-source vector database for embedding similarity search and AI applications.

Stars: ✭ 9,015 (+12249.32%)

Mutual labels: approximate-nearest-neighbor-search

instant-distance

Fast approximate nearest neighbor searching in Rust, based on HNSW index

Stars: ✭ 140 (+91.78%)

Mutual labels: approximate-nearest-neighbor-search

Recommender-Systems

Implementing Content based and Collaborative filtering (with KNN, Matrix Factorization and Neural Networks) in Python

Stars: ✭ 46 (-36.99%)

Mutual labels: knn

pqlite

⚡ A fast embedded library for approximate nearest neighbor search

Stars: ✭ 141 (+93.15%)

Mutual labels: approximate-nearest-neighbor-search


spark-annoy (WIP)

Build an Annoy index on Apache Spark, then query neighbors using Annoy.

Note

I built an index of 117M 64-dimensional vectors on 100 nodes in 5 minutes. The settings were:

// version: 0.1.4
// spark.executor.instances = 100
// spark.executor.memory = 8g
// spark.driver.memory = 8g
val fraction = 0.00086 // for about 100k samples
val numTrees = 2
val numPartitions = 100
val annoyModel = new Annoy().setFraction(fraction).setNumTrees(numTrees).fit(dataset)
annoyModel.saveAsAnnoyBinary("/hdfs/path/to/index", numPartitions)

The size of the index is about 33G.
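
The figures in this note can be cross-checked with a little arithmetic. The sketch below assumes each vector component is stored as a 4-byte float and counts only the raw vector payload, so it is a lower bound and not a model of the Annoy file layout; the object and value names are illustrative. It gives about 100,620 expected samples for the fraction above and about 29.95 GB of raw vectors, which leaves roughly 3 GB of the reported 33G for everything else in the index.

```scala
object IndexSizing {
  // Figures taken from the note above.
  val numVectors: Long = 117000000L
  val dim: Int = 64
  val fraction: Double = 0.00086

  // Expected number of rows drawn when sampling with this fraction.
  val expectedSamples: Double = numVectors * fraction

  // Raw payload only: one 4-byte float per dimension (assumption),
  // ignoring tree nodes and any per-item overhead.
  val rawVectorBytes: Long = numVectors * dim * 4L
  val rawVectorGB: Double = rawVectorBytes / 1e9

  def main(args: Array[String]): Unit = {
    println(f"expected samples: $expectedSamples%.0f")
    println(f"raw vector payload: $rawVectorGB%.2f GB")
  }
}
```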

Distributed Builds

import spark.implicits._

val data = spark.read.textFile("data/annoy/sample-glove-25-angular.txt")
  .map { str =>
    val Array(id, features) = str.split("\t")
    (id.toInt, features.split(",").map(_.toFloat))
  }
  .toDF("id", "features")

val ann = new Annoy()
  .setNumTrees(2)

val annModel = ann.fit(data)

annModel.saveAsAnnoyBinary("/path/to/dump/annoy-binary")

Dependency

Since version 0.1.2, releases are published to Maven.

libraryDependencies += "com.github.mskimm" %% "ann4s" % "0.1.5"

Version 0.1.5 is built with Apache Spark 2.3.0.

How does it work?

Build a parent tree from sampled data on the Spark master.
Group all data by the leaf node of the parent tree on the Spark nodes.
Build a subtree from the grouped data on each Spark node.
Aggregate all subtree nodes into the parent tree on the Spark master.
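
The four steps above can be sketched in plain Scala without Spark. This is a minimal single-process illustration, not the ann4s implementation: all names (`TwoLevelTreeSketch`, `buildTwoLevel`, `leafPath`, `graft`) are hypothetical, and the split rule (a hyperplane halfway between two points from the node) is a simplification of what Annoy does. On a cluster, the grouping step would be a shuffle keyed by parent leaf, and each subtree would be built on an executor.

```scala
import scala.util.Random

object TwoLevelTreeSketch {
  type Vec = Array[Float]

  sealed trait Node
  final case class Leaf(ids: Vector[Int]) extends Node
  final case class Split(normal: Vec, offset: Float, left: Node, right: Node) extends Node

  private def dot(a: Vec, b: Vec): Float = {
    var s = 0f; var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** Recursively splits items with a hyperplane halfway between two of their points. */
  def build(items: Vector[(Int, Vec)], leafSize: Int, rng: Random): Node =
    if (items.size <= leafSize) Leaf(items.map(_._1))
    else {
      val p = items(rng.nextInt(items.size))._2
      items.find { case (_, v) => !java.util.Arrays.equals(v, p) } match {
        case None => Leaf(items.map(_._1)) // all vectors identical, cannot split
        case Some((_, q)) =>
          val normal = p.zip(q).map { case (a, b) => a - b }
          val offset = dot(normal, p.zip(q).map { case (a, b) => (a + b) / 2 })
          val (l, r) = items.partition { case (_, v) => dot(normal, v) < offset }
          if (l.isEmpty || r.isEmpty) Leaf(items.map(_._1)) // guard against rounding
          else Split(normal, offset, build(l, leafSize, rng), build(r, leafSize, rng))
      }
    }

  /** Root-to-leaf path (true = left) that a vector follows through a tree. */
  def leafPath(node: Node, v: Vec, acc: List[Boolean] = Nil): List[Boolean] = node match {
    case Leaf(_) => acc.reverse
    case Split(n, o, l, r) =>
      val goLeft = dot(n, v) < o
      leafPath(if (goLeft) l else r, v, goLeft :: acc)
  }

  /** Replaces every leaf of the parent tree with the subtree built for that leaf. */
  def graft(parent: Node, subtrees: Map[List[Boolean], Node], path: List[Boolean] = Nil): Node =
    parent match {
      case Leaf(_) => subtrees.getOrElse(path.reverse, Leaf(Vector.empty))
      case s: Split =>
        s.copy(left = graft(s.left, subtrees, true :: path),
               right = graft(s.right, subtrees, false :: path))
    }

  def buildTwoLevel(data: Vector[(Int, Vec)], fraction: Double, leafSize: Int, rng: Random): Node = {
    val sample = data.filter(_ => rng.nextDouble() < fraction)      // step 1: sample
    val parent = build(sample, leafSize, rng)                       //         and build parent
    val groups = data.groupBy { case (_, v) => leafPath(parent, v) } // step 2: group by leaf
    val subtrees = groups.map { case (path, items) => path -> build(items, leafSize, rng) } // step 3
    graft(parent, subtrees)                                         // step 4: aggregate
  }

  /** Collects all item ids stored in the leaves of a tree. */
  def leafIds(node: Node): Vector[Int] = node match {
    case Leaf(ids) => ids
    case Split(_, _, l, r) => leafIds(l) ++ leafIds(r)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val data = Vector.tabulate(2000)(i => (i, Array.fill(8)(rng.nextGaussian().toFloat)))
    val tree = buildTwoLevel(data, fraction = 0.05, leafSize = 10, rng = rng)
    println(s"items indexed: ${leafIds(tree).size}")
  }
}
```

Because the parent tree is built only from the sample, it stays small enough to hold on one machine, while the expensive part (building a subtree per leaf group) is independent per group and can run in parallel.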

Use Case

Index ALS User/Item Factors

src/test/scala/ann4s/spark/example/ALSBasedUserItemIndexing.scala

...
val training: DataFrame = _
val als = new ALS()
  .setMaxIter(5)
  .setRegParam(0.01)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")

val model = als.fit(training)

val ann = new Annoy()
  .setNumTrees(2)
  .setFraction(0.1)
  .setIdCol("id")
  .setFeaturesCol("features")

val userAnnModel = ann.fit(model.userFactors)
userAnnModel.writeAnnoyBinary("exp/als/user_factors.ann")

val itemAnnModel = ann.fit(model.itemFactors)
itemAnnModel.writeAnnoyBinary("exp/als/item_factors.ann")
...
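
One way to sanity-check an index built from factors like these is to compare a few queries against exact neighbors computed by brute force on a small sample. The sketch below is plain Scala and does not use ann4s; `ExactAngularNeighbors`, `angular`, and `topK` are hypothetical names. It assumes Annoy's angular metric, sqrt(2 * (1 - cos(u, v))), and treats a zero vector as orthogonal to everything.

```scala
object ExactAngularNeighbors {
  private def dot(a: Array[Float], b: Array[Float]): Double = {
    var s = 0.0; var i = 0
    while (i < a.length) { s += a(i).toDouble * b(i); i += 1 }
    s
  }

  /** Angular distance: sqrt(2 * (1 - cos(u, v))). */
  def angular(u: Array[Float], v: Array[Float]): Double = {
    val norms = math.sqrt(dot(u, u) * dot(v, v))
    if (norms == 0.0) math.sqrt(2.0) // assumption: zero vectors count as orthogonal
    else math.sqrt(math.max(0.0, 2.0 * (1.0 - dot(u, v) / norms)))
  }

  /** Exact k nearest items to the query, closest first, by scanning every item. */
  def topK(query: Array[Float], items: Seq[(Int, Array[Float])], k: Int): Seq[(Int, Double)] =
    items.map { case (id, v) => (id, angular(query, v)) }.sortBy(_._2).take(k)

  def main(args: Array[String]): Unit = {
    val items = Seq(1 -> Array(1f, 0f), 2 -> Array(0f, 1f), 3 -> Array(1f, 1f))
    topK(Array(1f, 0.1f), items, k = 2).foreach { case (id, d) => println(f"$id%d $d%.4f") }
  }
}
```

The scan is linear in the number of items per query, so it is only practical on a sample, but it gives a ground truth against which the approximate results can be compared.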

Comment

I started this project to learn Scala. Along the way I found that Annoy is a fairly good library for nearest neighbor search and that a distributed version can be implemented with Apache Spark. Recently, various bindings and implementations have been actively developed. In particular, the purpose and usability of this project overlap with projects such as annoy4s and annoy-java, since all of them run on the JVM.

To keep contributing something distinct, this project now focuses on building the index on Apache Spark for distributed builds. This will support building indexes of 1 billion or more items and writing Annoy-compatible binaries.

References

https://github.com/spotify/annoy : native implementation with several bindings, such as Python
https://github.com/pishen/annoy4s : Scala wrapper using JNA
https://github.com/spotify/annoy-java : Java implementation


mskimm / spark-annoy
