Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

Created with love in Canada, visit hostnodejs.com today

Feel like to post an Ad? Learn Details

gvinciguerra / Pgm Index

Licence: apache-2.0

🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Programming Languages

cpp

1120 projects

Labels

machine-learning database data-structures big-data compression research indexing

Projects that are alternatives of or similar to Pgm Index

Pygm

🐍 Python library implementing sorted containers with state-of-the-art query performance and compressed memory usage

Stars: ✭ 156 (-68.74%)

Mutual labels: data-structures, database, research, compression

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+818.04%)

Mutual labels: big-data, database

Succinct

Enabling queries on compressed data.

Stars: ✭ 257 (-48.5%)

Mutual labels: big-data, compression

Interview

📚 C/C++ 技术面试基础知识总结，包括语言、程序库、数据结构、算法、系统、网络、链接装载库等知识及面试经验、招聘、内推等信息。This repository is a summary of the basic knowledge of recruiting job seekers and beginners in the direction of C/C++ technology, including language, program library, data structure, algorithm, system, network, link loading library, interview experience, recruitment, recommendatio…

Stars: ✭ 21,608 (+4230.26%)

Mutual labels: data-structures, database

Technicalnote

Repository to store what we have studied. 📖 We want everyone to get a job through TechnicalNote.

Stars: ✭ 206 (-58.72%)

Mutual labels: data-structures, database

prunnable-layers-pytorch

Prunable nn layers for pytorch.

Stars: ✭ 47 (-90.58%)

Mutual labels: compression, research

Couchdb Fauxton

Apache CouchDB

Stars: ✭ 295 (-40.88%)

Mutual labels: big-data, database

Sdb

Simple and fast string based key-value database with support for arrays and json

Stars: ✭ 159 (-68.14%)

Mutual labels: data-structures, database

Hive

Apache Hive

Stars: ✭ 4,031 (+707.82%)

Mutual labels: big-data, database

Ignite

Apache Ignite

Stars: ✭ 4,027 (+707.01%)

Mutual labels: big-data, database

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (-22.04%)

Mutual labels: big-data, database

Keyvi

Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.

Stars: ✭ 171 (-65.73%)

Mutual labels: data-structures, big-data

Cs offer

计算机学科基础知识和主流编程语言相关内容的总结

Stars: ✭ 2,016 (+304.01%)

Mutual labels: data-structures, database

bftkv

A distributed key-value storage that's tolerant to Byzantine fault.

Stars: ✭ 27 (-94.59%)

Mutual labels: research, big-data

Vue Materialize Datatable

A fancy Materialize CSS datatable VueJS component.

Stars: ✭ 162 (-67.54%)

Mutual labels: data-structures, database

Crate

CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.

Stars: ✭ 3,254 (+552.1%)

Mutual labels: big-data, database

Recipe

RECIPE : high-performance, concurrent indexes for persistent memory (SOSP 2019)

Stars: ✭ 145 (-70.94%)

Mutual labels: data-structures, indexing

Courses

Quiz & Assignment of Coursera

Stars: ✭ 454 (-9.02%)

Mutual labels: data-structures, big-data

Js Worker Search

JavaScript client-side search API with web-worker support

Stars: ✭ 345 (-30.86%)

Mutual labels: indexing, database

Listenbrainz Server

Server for the ListenBrainz project

Stars: ✭ 420 (-15.83%)

Mutual labels: big-data, database

View All Similar Projects ➔

The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.

Quickstart

This is a header-only library. It does not need to be installed. Just clone the repo with

git clone https://github.com/gvinciguerra/PGM-index.git
cd PGM-index

and copy the include/pgm directory to your system's or project's include path.

The examples/simple.cpp file shows how to index and query a vector of random integers with the PGM-index:

#include <vector>
#include <cstdlib>
#include <iostream>
#include <algorithm>
#include "pgm/pgm_index.hpp"

int main() {
    // Generate some random data
    std::vector<int> data(1000000);
    std::generate(data.begin(), data.end(), std::rand);
    data.push_back(42);
    std::sort(data.begin(), data.end());

    // Construct the PGM-index
    const int epsilon = 128; // space-time trade-off parameter
    pgm::PGMIndex<int, epsilon> index(data);

    // Query the PGM-index
    auto q = 42;
    auto range = index.search(q);
    auto lo = data.begin() + range.lo;
    auto hi = data.begin() + range.hi;
    std::cout << *std::lower_bound(lo, hi, q);

    return 0;
}

Run and edit this and other examples on Repl.it. Or run it locally via:

g++ examples/simple.cpp -std=c++17 -I./include -o simple
./simple

Classes overview

Other than the pgm::PGMIndex class in the example above, this library provides the following classes:

pgm::DynamicPGMIndex supports insertions and deletions.
pgm::MultidimensionalPGMIndex stores points in k dimensions and supports orthogonal range queries.
pgm::MappedPGMIndex stores data on disk and uses a PGMIndex for fast search operations.
pgm::CompressedPGMIndex compresses the segments to reduce the space usage of the index.
pgm::OneLevelPGMIndex uses a binary search on the segments rather than a recursive structure.
pgm::BucketingPGMIndex uses a top-level lookup table to speed up the search on the segments.
pgm::EliasFanoPGMIndex uses a top-level succinct structure to speed up the search on the segments.

The full documentation is available here.

Compile the tests and the tuner

After cloning the repository, build the project with

cmake . -DCMAKE_BUILD_TYPE=Release
make -j8

The test runner will be placed in test/. The tuner executable will be placed in tuner/.

License

This project is licensed under the terms of the Apache License 2.0.

If you use the library please put a link to the website and cite the following paper:

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

@article{Ferragina:2020pgm,
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
  Year = {2020},
  Volume = {13},
  Number = {8},
  Pages = {1162--1175},
  Doi = {10.14778/3389133.3389135},
  Url = {https://pgm.di.unipi.it},
  Issn = {2150-8097},
  Journal = {{PVLDB}}}

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 499

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (4) 🔗