Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

All Projects → dblalock → Bolt

dblalock / Bolt

Licence: mpl-2.0

Fast approximate vector operations

Labels

machine-learning database compression data-mining

Projects that are alternatives of or similar to Bolt

Pgm Index

🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Stars: ✭ 499 (+612.86%)

Mutual labels: database, compression

Game Datasets

🎮 A curated list of awesome game datasets, and tools to artificial intelligence in games

Stars: ✭ 261 (+272.86%)

Mutual labels: data-mining, database

Pygm

🐍 Python library implementing sorted containers with state-of-the-art query performance and compressed memory usage

Stars: ✭ 156 (+122.86%)

Mutual labels: database, compression

Linkedingiveaway

👨🏽‍🏫You can learn about anything over here. What Giveaways I do and why it's important in today's modern world. Are you interested in Giveaway's?🔋

Stars: ✭ 67 (-4.29%)

Mutual labels: data-mining, database

Zabbixdba

Zabbix Database Monitoring Service (Oracle, Pg, MySQL, MS SQL, DB2, etc.)

Stars: ✭ 68 (-2.86%)

Mutual labels: database

Tlddatabase

[DEPRECATED] Abstraction layer for Public Suffix List in PHP

Stars: ✭ 67 (-4.29%)

Mutual labels: database

Gorse

An open source recommender system service written in Go

Stars: ✭ 1,148 (+1540%)

Mutual labels: data-mining

Oblecto

Oblecto is a media server, which streams media you already own, and is designed to be at the heart of your entertainment experience. It runs on your home server to index and analyze your media such as Movies and TV Shows and presents them in an interface tailored for your media consupmtion needs.

Stars: ✭ 67 (-4.29%)

Mutual labels: database

Keydb

high performance key value database written in Go

Stars: ✭ 70 (+0%)

Mutual labels: database

Ffbe

Datamining for FFBE GL

Stars: ✭ 69 (-1.43%)

Mutual labels: data-mining

Directus Docker

Directus 6 Docker — Legacy Container [EOL]

Stars: ✭ 68 (-2.86%)

Mutual labels: database

Dbmigrations

A library for the creation, management, and installation of schema updates for relational databases.

Stars: ✭ 67 (-4.29%)

Mutual labels: database

Awesome Business Intelligence

Actively curated list of awesome BI tools. PRs welcome!

Stars: ✭ 1,157 (+1552.86%)

Mutual labels: database

Books

Awesome Books

Stars: ✭ 66 (-5.71%)

Mutual labels: database

Node Blog

🔥✨ A react blog project base on nodejs, nestjs, mongoose, typescript, react, ant-design,nextjs

Stars: ✭ 69 (-1.43%)

Mutual labels: database

Covenantsql

A decentralized, trusted, high performance, SQL database with blockchain features

Stars: ✭ 1,148 (+1540%)

Mutual labels: database

Dbx

A neat codegen-based database wrapper written in Go

Stars: ✭ 68 (-2.86%)

Mutual labels: database

Scan

Scan database/sql rows directly to structs, slices, and primitive types

Stars: ✭ 69 (-1.43%)

Mutual labels: database

Etl with python

ETL with Python - Taught at DWH course 2017 (TAU)

Stars: ✭ 68 (-2.86%)

Mutual labels: database

Numcompress

Python package to compress numerical series & numpy arrays into strings

Stars: ✭ 68 (-2.86%)

Mutual labels: compression

View All Similar Projects ➔

Bolt is an algorithm for compressing vectors of real-valued data and running mathematical operations directly on the compressed representations.

If you have a large collection of mostly-dense vectors and can tolerate lossy compression, Bolt can probably save you 10-200x space and compute time.

Bolt also has theoretical guarantees bounding the errors in its approximations.

Installing

Python

  $ brew install swig  # for wrapping C++; use apt-get, yum, etc, if not OS X
  $ pip install numpy  # bolt installation needs numpy already present
  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt && python setup.py install
  $ pytest tests/  # optionally, run the tests

If you run into any problems, please don't hesitate to mention it in the Python build problems issue.

C++

Install Bazel, Google's open-source build system. Then

  $ git clone https://github.com/dblalock/bolt.git
  $ cd bolt/cpp && bazel run :main

The bazel run command will build the project and run the tests and benchmarks.

If you want to integrate Bolt with another C++ project, include cpp/src/include/public.hpp and add the remaining files under cpp/src to your builds. You should let me know if you're interested in doing such an integration because I'm hoping to see Bolt become part of many libraries and thus would be happy to help you.

Notes

Bolt currently only supports machines with AVX2 instructions, which basically means x86 machines from fall 2013 or later. Contributions for ARM support are welcome. Also note that the Bolt Python wrapper is currently configured to require Clang, since GCC apparently runs into issues.

How does it work?

Bolt is based on vector quantization. For details, see the Bolt paper or slides.

Benchmarks

Bolt includes a thorough set of speed and accuracy benchmarks. See the experiments/ directory. This is also what you want if you want to reproduce the results in the paper.

Note that all of the timing results use the raw C++ implementation. At present, the Python wrapper is slightly slower due to Python overhead. If you're interested in having a full-speed wrapper, let me know and I'll allocate time to making this happen.

Basic usage

X, queries = some N x D array, some iterable of length D arrays

# these are approximately equal (though the latter are shifted and scaled)
enc = bolt.Encoder(reduction='dot').fit(X)
[np.dot(X, q) for q in queries]
[enc.transform(q) for q in queries]

# same for these
enc = bolt.Encoder(reduction='l2').fit(X)
[np.sum((X - q) * (X - q), axis=1) for q in queries]
[enc.transform(q) for q in queries]

# but enc.transform() is 10x faster or more

Example: Matrix-vector multiplies

import bolt
import numpy as np
from scipy.stats import pearsonr as corr
from sklearn.datasets import load_digits
import timeit

# for simplicity, use the sklearn digits dataset; we'll split
# it into a matrix X and a set of queries Q
X, _ = load_digits(return_X_y=True)
nqueries = 20
X, Q = X[:-nqueries], X[-nqueries:]

enc = bolt.Encoder(reduction='dot', accuracy='lowest') # can tweak acc vs speed
enc.fit(X)

dot_corrs = np.empty(nqueries)
for i, q in enumerate(Q):
    dots_true = np.dot(X, q)
    dots_bolt = enc.transform(q)
    dot_corrs[i] = corr(dots_true, dots_bolt)[0]

# dot products closely preserved despite compression
print "dot product correlation: {} +/- {}".format(
    np.mean(dot_corrs), np.std(dot_corrs))  # > .97

# massive space savings
print(X.nbytes)  # 1777 rows * 64 cols * 8B = 909KB
print(enc.nbytes)  # 1777 * 2B = 3.55KB

# massive time savings (~10x here, but often >100x on larger
# datasets with less Python overhead; see the paper)
t_np = timeit.Timer(
    lambda: [np.dot(X, q) for q in Q]).timeit(5)        # ~9ms
t_bolt = timeit.Timer(
    lambda: [enc.transform(q) for q in Q]).timeit(5)    # ~800us
print "Numpy / BLAS time, Bolt time: {:.3f}ms, {:.3f}ms".format(
    t_np * 1000, t_bolt * 1000)

# can get output without offset/scaling if needed
dots_bolt = [enc.transform(q, unquantize=True) for q in Q]

Example: K-Nearest Neighbor / Maximum Inner Product Search

# search using squared Euclidean distances
# (still using the Digits dataset from above)
enc = bolt.Encoder('l2', accuracy='high').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

# search using dot product (maximum inner product search)
enc = bolt.Encoder('dot', accuracy='medium').fit(X)
bolt_knn = [enc.knn(q, k_bolt) for q in Q]  # knn for each query

Miscellaneous

Bolt stands for "Based On Lookup Tables". Feel free to use this exciting fact at parties.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 70

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (6) 🔗