
dwave-examples / clustering

License: Apache-2.0
Using a quantum computer to cluster data points

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to clustering

nurse-scheduling
A demo of a nurse scheduling model
Stars: ✭ 24 (+20%)
Mutual labels:  constraint-satisfaction-problem, intermediate, bqm
satellite-placement
Group satellites into constellations such that their average observation coverage is maximized
Stars: ✭ 20 (+0%)
Mutual labels:  constraint-satisfaction-problem, intermediate, bqm
circuit-fault-diagnosis
Find possible failing components on a circuit.
Stars: ✭ 18 (-10%)
Mutual labels:  constraint-satisfaction-problem, bqm
sudoku
Solve a Sudoku puzzle with a quantum computer
Stars: ✭ 20 (+0%)
Mutual labels:  constraint-satisfaction-problem, intermediate
job-shop-scheduling
Determine a schedule for running a set of jobs.
Stars: ✭ 34 (+70%)
Mutual labels:  constraint-satisfaction-problem, bqm
code-review
A project where you can submit your source code for other people to help you improve it
Stars: ✭ 84 (+320%)
Mutual labels:  intermediate
Decider
An Open Source .Net Constraint Programming Solver
Stars: ✭ 112 (+460%)
Mutual labels:  constraint-satisfaction-problem
structural-imbalance
Demo for analyzing the structural imbalance on a signed social network.
Stars: ✭ 22 (+10%)
Mutual labels:  intermediate
erlmud
Evolutionary demonstration of Erlang/OTP development.
Stars: ✭ 33 (+65%)
Mutual labels:  intermediate
ordered
Entropy-controlled contexts in Python
Stars: ✭ 36 (+80%)
Mutual labels:  constraint-satisfaction-problem
Optaplanner
AI constraint solver in Java to optimize the vehicle routing problem, employee rostering, task assignment, maintenance scheduling, conference scheduling and other planning problems.
Stars: ✭ 2,454 (+12170%)
Mutual labels:  constraint-satisfaction-problem
GHOST
General meta-Heuristic Optimization Solving Toolkit
Stars: ✭ 28 (+40%)
Mutual labels:  constraint-satisfaction-problem
Learn Vim
Learning Vim and Vimscript doesn't have to be hard. This is the guide that you're looking for.
Stars: ✭ 7,221 (+36005%)
Mutual labels:  intermediate
knapsack
Implementation of knapsack problem, set up for scaling to large problem size
Stars: ✭ 17 (-15%)
Mutual labels:  constraint-satisfaction-problem
named-entity-recognition
Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities
Stars: ✭ 18 (-10%)
Mutual labels:  intermediate
pycsp3
A Python Library for modeling combinatorial constrained problems
Stars: ✭ 39 (+95%)
Mutual labels:  constraint-satisfaction-problem
generator-bunny
🐰 Jumpstart node module, like a bunny!
Stars: ✭ 13 (-35%)
Mutual labels:  intermediate
Stanford Cs 221 Artificial Intelligence
VIP cheatsheets for Stanford's CS 221 Artificial Intelligence
Stars: ✭ 1,923 (+9515%)
Mutual labels:  constraint-satisfaction-problem
facile
Python constraint programming library
Stars: ✭ 21 (+5%)
Mutual labels:  constraint-satisfaction-problem
factoring
Factor numbers using a quantum computer
Stars: ✭ 30 (+50%)
Mutual labels:  constraint-satisfaction-problem


Clustering

A demo on identifying clusters in a data set using the D-Wave quantum computer.

When dealing with large data sets, we do not always have neatly labeled data (i.e., most, if not all, of the data may be unlabeled). However, all is not lost: we can still extract insights from such data through clustering.

For example, when dealing with housing data, namely square footage and price, we can identify clusters that may be indicative of different neighborhoods. Another example could be a Boolean vector of the TV shows each consumer watches; clusters in this data could help identify a particular consumer demographic.

Additionally, if we have a few labeled data points in our data set, we could potentially label entire clusters based on them.

Figure: A data set of nine points divided into three clusters (shown in different colors).

Usage
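
If the demo's dependencies are not yet installed, they can typically be installed from the repo's requirements file (this assumes the standard layout of dwave-examples repos):

    pip install -r requirements.txt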

To run the demo with a simple, hardcoded data set:

python clustering.py

To run the same demo with a slightly more sophisticated data set:

python example_clusters.py

This provides a visualization of the problem in the D-Wave Problem Inspector, as well as plots of the original data set and the solution.

Code Overview

The D-Wave quantum computer accepts problems formulated as a binary quadratic model (BQM). The goal here is to build a BQM that represents our clustering problem: a low-energy solution found by the quantum computer should correspond to a good clustering of our data.
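
As a minimal, hypothetical sketch (not code from this repo), here is how a small BQM can be assembled and solved exactly with Ocean's dimod library; the variable names p0.r and p0.g are placeholders:

    import dimod

    # A BQM is defined by linear biases on single variables and
    # quadratic biases (couplings) on pairs of variables.
    bqm = dimod.BinaryQuadraticModel('BINARY')
    bqm.add_variable('p0.r', -1.0)             # linear bias: reward choosing red
    bqm.add_interaction('p0.r', 'p0.g', 2.0)   # coupling: penalize red AND green

    # Low-energy samples correspond to good solutions; ExactSolver
    # enumerates all states (fine for tiny toy problems).
    sampleset = dimod.ExactSolver().sample(bqm)
    print(sampleset.first)   # lowest-energy sample: p0.r=1, p0.g=0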

Key properties of the clustering problem that we need to capture in our BQM:

  • Each data point can only be a part of one cluster
  • Data points that are close together should be a part of the same cluster
  • Data points that are far apart should be in different clusters

Code Specifics

Let's go through how we implement each of the key properties of our clustering problem.

Each data point can only join one cluster

  • The code considers only three cluster labels: red, green, and blue.

  • Since a qubit can only end up in one of two states (i.e., it can only answer yes-no questions), each data point has three nodes associated with it: <coordinate>.r, <coordinate>.g, and <coordinate>.b. That way, we can answer a yes-no question for whether a specific coordinate belongs to a particular color cluster.

  • The rule that a data point may only join one cluster is represented by the variable choose_one_group (shown below). Each three-item tuple can be interpreted as (<join-red>, <join-green>, <join-blue>), where 1 and 0 indicate true and false, respectively. Hence, choose_one_group is the set of all valid states; for example, (1, 1, 0) is invalid because a data point cannot be in both the red and green clusters at once. (A sketch of how such a constraint can be turned into a BQM follows this list.)

    choose_one_group = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
  • You can easily see this "choose one group" constraint in the D-Wave Problem Inspector. The image below shows four data points (as in clustering.py). Each data point has three nodes associated with it (<coordinate>.r, <coordinate>.g, and <coordinate>.b), hence the twelve nodes below.

  • Each data point's set of three nodes can be identified in the graph below: within each yellow triangle, one vertex is selected (yellow) and the other two are not (white).

Figure: Logical graph of the problem, as shown in the D-Wave Problem Inspector.

  • In the D-Wave Problem Inspector itself, you can hover over these nodes to see which data point and color each represents, along with the BQM weights placed on them.
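
Here is a hedged sketch of how such a constraint can be expressed with Ocean's dwavebinarycsp library and converted to a BQM; the variable names are hypothetical stand-ins for the demo's actual coordinates:

    import dwavebinarycsp

    # Valid states: each data point joins exactly one of the three clusters.
    choose_one_group = {(0, 0, 1), (0, 1, 0), (1, 0, 0)}

    csp = dwavebinarycsp.ConstraintSatisfactionProblem(dwavebinarycsp.BINARY)

    # One constraint per data point, over its red/green/blue indicator nodes.
    csp.add_constraint(choose_one_group, ['p0.r', 'p0.g', 'p0.b'])

    # Convert the CSP to a BQM whose low-energy states satisfy the constraint.
    bqm = dwavebinarycsp.stitch(csp)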

Close data points should be in the same cluster

  • If we set the BQM weights so that solutions which cluster close data points have lower energy, then when the quantum computer minimizes the BQM it finds good solutions to the clustering problem.

  • These weights depend on distance. To keep the weights within a reasonable range, all distances are rescaled by max_distance, the largest distance between any two points in the data set.

  • Below is the code used to compute the weight that encourages close-together points to join the same cluster:

    d = get_distance(coord0, coord1) / max_distance  # rescale distance
    weight = -math.cos(d*math.pi)
  • Many different functions could generate this weight; in this case, we chose cosine. The key idea is that short distances (nearby points) should produce a strong negative value, which encourages those points to cluster together, while large distances should produce a positive value, which discourages it. (A self-contained sketch of this weight function follows this list.)
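
As a self-contained sketch of this weight function (the Euclidean get_distance helper here is an assumption, standing in for the demo's own distance code):

    import math

    def get_distance(coord0, coord1):
        # Euclidean distance between two (x, y) points.
        return math.hypot(coord0[0] - coord1[0], coord0[1] - coord1[1])

    def same_cluster_weight(coord0, coord1, max_distance):
        # Negative for nearby points (rewards sharing a cluster);
        # positive for distant points (penalizes sharing one).
        d = get_distance(coord0, coord1) / max_distance  # rescale to [0, 1]
        return -math.cos(d * math.pi)

    # Example: two nearby points get a strong negative coupling.
    print(same_cluster_weight((0, 0), (1, 1), max_distance=10))  # ~ -0.90

For this to encourage nearby points to share a color, the weight is placed on couplings between the two points' same-color nodes (e.g., p0.r with p1.r), so that assigning both points the same color lowers the energy.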

Far-apart data points should be in different clusters

  • Here, we want to encourage far-apart data points to be in different clusters. Again, since the D-Wave quantum computer seeks low-energy configurations, we need assignments that place far-apart points in different clusters to have lower energy.

  • We do this by choosing a negative weight whose magnitude grows with distance, hence the choice of the tanh function.

    d = math.sqrt(get_distance(coord0, coord1) / max_distance)
    weight = -math.tanh(d) * 0.1
  • Note that a scaling factor of 0.1 was applied to prevent this weight from overwhelming the other weights in the BQM. The value 0.1 is arbitrary; it was found empirically by experimenting with the code. (The sketch below compares the two weight functions across the distance range.)
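
For intuition, here is a small standalone sketch, using only the two formulas above, that prints both couplings across the rescaled distance range:

    import math

    # Compare both weight functions over the rescaled distance d in [0, 1].
    for d in (0.0, 0.25, 0.5, 0.75, 1.0):
        same = -math.cos(d * math.pi)           # same-cluster coupling
        diff = -math.tanh(math.sqrt(d)) * 0.1   # different-cluster coupling
        print(f"d={d:.2f}  same-cluster={same:+.3f}  different-cluster={diff:+.3f}")

Without the 0.1 factor, the different-cluster coupling would reach about -0.76 at d = 1, comparable in magnitude to the same-cluster couplings, and could dominate the trade-off.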

License

Released under the Apache License 2.0. See LICENSE file.
