All Projects → gvinciguerra → Pgm Index

gvinciguerra / Pgm Index

Licence: apache-2.0
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Programming Languages

cpp
1120 projects

Projects that are alternatives of or similar to Pgm Index

Pygm
🐍 Python library implementing sorted containers with state-of-the-art query performance and compressed memory usage
Stars: ✭ 156 (-68.74%)
Mutual labels:  data-structures, database, research, compression
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+818.04%)
Mutual labels:  big-data, database
Succinct
Enabling queries on compressed data.
Stars: ✭ 257 (-48.5%)
Mutual labels:  big-data, compression
Interview
📚 C/C++ 技术面试基础知识总结,包括语言、程序库、数据结构、算法、系统、网络、链接装载库等知识及面试经验、招聘、内推等信息。This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, including language, program library, data structure, algorithm, system, network, link loading library, interview experience, recruitment, recommendatio…
Stars: ✭ 21,608 (+4230.26%)
Mutual labels:  data-structures, database
Technicalnote
Repository to store what we have studied. 📖 We want everyone to get a job through TechnicalNote.
Stars: ✭ 206 (-58.72%)
Mutual labels:  data-structures, database
prunnable-layers-pytorch
Prunable nn layers for pytorch.
Stars: ✭ 47 (-90.58%)
Mutual labels:  compression, research
Couchdb Fauxton
Apache CouchDB
Stars: ✭ 295 (-40.88%)
Mutual labels:  big-data, database
Sdb
Simple and fast string based key-value database with support for arrays and json
Stars: ✭ 159 (-68.14%)
Mutual labels:  data-structures, database
Hive
Apache Hive
Stars: ✭ 4,031 (+707.82%)
Mutual labels:  big-data, database
Ignite
Apache Ignite
Stars: ✭ 4,027 (+707.01%)
Mutual labels:  big-data, database
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-22.04%)
Mutual labels:  big-data, database
Keyvi
Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-65.73%)
Mutual labels:  data-structures, big-data
Cs offer
计算机学科基础知识和主流编程语言相关内容的总结
Stars: ✭ 2,016 (+304.01%)
Mutual labels:  data-structures, database
bftkv
A distributed key-value storage that's tolerant to Byzantine fault.
Stars: ✭ 27 (-94.59%)
Mutual labels:  research, big-data
Vue Materialize Datatable
A fancy Materialize CSS datatable VueJS component.
Stars: ✭ 162 (-67.54%)
Mutual labels:  data-structures, database
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+552.1%)
Mutual labels:  big-data, database
Recipe
RECIPE : high-performance, concurrent indexes for persistent memory (SOSP 2019)
Stars: ✭ 145 (-70.94%)
Mutual labels:  data-structures, indexing
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (-9.02%)
Mutual labels:  data-structures, big-data
Js Worker Search
JavaScript client-side search API with web-worker support
Stars: ✭ 345 (-30.86%)
Mutual labels:  indexing, database
Listenbrainz Server
Server for the ListenBrainz project
Stars: ✭ 420 (-15.83%)
Mutual labels:  big-data, database

The PGM-index

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.

Website | Documentation | Paper | Slides | Python wrapper | A³ Lab

GitHub Workflow Status Language grade: C/C++ License GitHub stars GitHub forks Run on Repl.it

Quickstart

This is a header-only library. It does not need to be installed. Just clone the repo with

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index

and copy the include/pgm directory to your system's or project's include path.

The examples/simple.cpp file shows how to index and query a vector of random integers with the PGM-index:

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm/pgm_index.hpp"

int main() {
    // Generate some random data
    std::vector<int> data(1000000);
    std::generate(data.begin(), data.end(), std::rand);
    data.push_back(42);
    std::sort(data.begin(), data.end());

    // Construct the PGM-index
    const int epsilon = 128; // space-time trade-off parameter
    pgm::PGMIndex<int, epsilon> index(data);

    // Query the PGM-index
    auto q = 42;
    auto range = index.search(q);
    auto lo = data.begin() + range.lo;
    auto hi = data.begin() + range.hi;
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}

Run and edit this and other examples on Repl.it. Or run it locally via:

g++ examples/simple.cpp -std=c++17 -I./include -o simple
./simple

Classes overview

Other than the pgm::PGMIndex class in the example above, this library provides the following classes:

  • pgm::DynamicPGMIndex supports insertions and deletions.
  • pgm::MultidimensionalPGMIndex stores points in k dimensions and supports orthogonal range queries.
  • pgm::MappedPGMIndex stores data on disk and uses a PGMIndex for fast search operations.
  • pgm::CompressedPGMIndex compresses the segments to reduce the space usage of the index.
  • pgm::OneLevelPGMIndex uses a binary search on the segments rather than a recursive structure.
  • pgm::BucketingPGMIndex uses a top-level lookup table to speed up the search on the segments.
  • pgm::EliasFanoPGMIndex uses a top-level succinct structure to speed up the search on the segments.

The full documentation is available here.

Compile the tests and the tuner

After cloning the repository, build the project with

cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

The test runner will be placed in test/. The tuner executable will be placed in tuner/.

License

This project is licensed under the terms of the Apache License 2.0.

If you use the library please put a link to the website and cite the following paper:

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

@article{Ferragina:2020pgm,
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
  Year = {2020},
  Volume = {13},
  Number = {8},
  Pages = {1162--1175},
  Doi = {10.14778/3389133.3389135},
  Url = {https://pgm.di.unipi.it},
  Issn = {2150-8097},
  Journal = {{PVLDB}}}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].