All Projects β†’ gvinciguerra β†’ Pygm

gvinciguerra / Pygm

Licence: apache-2.0
🐍 Python library implementing sorted containers with state-of-the-art query performance and compressed memory usage

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Pygm

Pgm Index
πŸ…State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (+219.87%)
Mutual labels:  data-structures, database, research, compression
Interviews
A list of fancy questions I've been asked during the interviews I had. Some of them I ask when interviewing people.
Stars: ✭ 140 (-10.26%)
Mutual labels:  algorithms, data-structures, database
Jupyter
Stars: ✭ 145 (-7.05%)
Mutual labels:  algorithms, data-structures, data-science
19 udacity dsa
Data Structures & Algorithms Nanodegree Program from Udacity
Stars: ✭ 140 (-10.26%)
Mutual labels:  algorithms, data-structures
You Dont Know X
πŸ™ˆ curated list of inspiring resources which show you don't know that much about something you thought you knew.
Stars: ✭ 139 (-10.9%)
Mutual labels:  algorithms, data-structures
Data Structures
Common data structures and algorithms implemented in JavaScript
Stars: ✭ 139 (-10.9%)
Mutual labels:  algorithms, data-structures
Algorithms
A collection of common algorithms and data structures implemented in java, c++, and python.
Stars: ✭ 142 (-8.97%)
Mutual labels:  algorithms, data-structures
Coding Problems
Solutions for various coding/algorithmic problems and many useful resources for learning algorithms and data structures
Stars: ✭ 2,221 (+1323.72%)
Mutual labels:  algorithms, data-structures
Ultimate Java Resources
Java programming. All in one Java Resource for learning. Updated every day and up to date. All Algorithms and DS along with Development in Java. Beginner to Advanced. Join the Discord link.
Stars: ✭ 143 (-8.33%)
Mutual labels:  algorithms, data-structures
Data Structures And Algorithms
Data Structures and Algorithms implemented In Python, C, C++, Java or any other languages. Aimed to help strengthen the concepts of DS&A. Give a Star 🌟 if it helps you.
Stars: ✭ 146 (-6.41%)
Mutual labels:  algorithms, data-structures
Hackerrank
A collection of algorithms and solutions to problems in various languages from the site Hacker Rank.
Stars: ✭ 145 (-7.05%)
Mutual labels:  algorithms, data-structures
Labs
Labs for the Foundations of Applied Mathematics curriculum.
Stars: ✭ 150 (-3.85%)
Mutual labels:  algorithms, data-science
Scilab
Free and Open Source software for numerical computation providing a powerful computing environment for engineering and scientific applications.
Stars: ✭ 138 (-11.54%)
Mutual labels:  data-structures, data-science
Leetcode Patterns
A curated list of leetcode questions grouped by their common patterns
Stars: ✭ 3,750 (+2303.85%)
Mutual labels:  algorithms, data-structures
Matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Stars: ✭ 141 (-9.62%)
Mutual labels:  algorithms, data-science
Placement Preparation
Hello everyone, I have created this repository specifically for competitive questions and for placements preparation.
Stars: ✭ 137 (-12.18%)
Mutual labels:  algorithms, data-structures
Dsa Geeksclasses
DSA-Self Paced With Doubt Assistance Course Solutions in Python (Python 3)
Stars: ✭ 137 (-12.18%)
Mutual labels:  algorithms, data-structures
Datastructures Algorithms
The best library for implementation of all Data Structures and Algorithms - Trees + Graph Algorithms too!
Stars: ✭ 2,105 (+1249.36%)
Mutual labels:  algorithms, data-structures
Important Java Concepts
πŸš€ Complete Java - A to Z β•‘ πŸ“š Notes and Programs of all Important Concepts of Java - OOPS, Data Structures, Algorithms, Design Patterns & Development + Kotlin + Android πŸ”₯
Stars: ✭ 135 (-13.46%)
Mutual labels:  algorithms, data-structures
The Python Workshop
A New, Interactive Approach to Learning Python
Stars: ✭ 150 (-3.85%)
Mutual labels:  algorithms, data-science

pygm

PyGM is a Python library that enables fast query operations on sorted lists of numbers (like integers and floats) with a tiny memory overhead. Internally, it features the PGM-index, a state-of-the-art learned data structure that robustly scales to billions of elements in just a few tens of megabytes.

Build status Code coverage PyPI License GitHub stars GitHub forks

Quick start

Install with pip:

pip install pygm

PyGM supports both standard and other useful list and set operations:

>>> from pygm import SortedList, SortedSet
>>> sl = SortedList([0, 1, 34, 144, 1, 55, 233, 2, 3, 21, 89, 5, 8, 13])
>>> sl
SortedList([0, 1, 1, ..., 144, 233])
>>> sl.find_gt(9)                                   # smallest element > 9
13
>>> sl.count(1)                                     # frequency of 1
2
>>> 42 in sl                                        # membership test
False
>>> list(sl.range(0, 21, inclusive=(False, True)))  # elements 0 < x <= 21
[1, 1, 2, 3, 5, 8, 13, 21]
>>> sl[2:10:3]                                      # slicing syntax support
SortedList([1, 5, 21])
>>> (sl + [-3, -2, -1]).rank(0)                     # number of elements <= 0
4
>>> ss = SortedSet([1, 2, 3, 4]) ^ {3, 4, 5}        # set symmetric difference
>>> ss.find_lt(5)
2

The full documentation is available online and in the Python interpreter via the help() built-in function.

Installation

PyGM is compatible with Python 3.3+. The easiest way to install it is through PyPI:

pip install pygm

Otherwise, you can clone the repo, build it from source and install it as follows:

if [[ "$(uname)" == "Darwin" ]]; then brew install libomp; fi
git clone https://github.com/gvinciguerra/PyGM.git
cd PyGM
git submodule update --init --recursive
pip install .

Remember to leave the source directory PyGM/ and its parent before running Python.

Performance

Here are some plots that compare the performance of PyGM with two popular libraries, sortedcontainers and blist, on synthetic data.

Query performance of Python packages implementing sorted lists

All the graphs are log-log and show, for increasing data sizes, the average time it takes to perform each operation (lower is better). In particular, the __init__ plot shows the construction time, __contains__ measures membership queries, and __getitem__ shows the cost of accessing an element given its position.

The interesting operations on sorted lists are: (i) index, which returns the position of the first occurrence of a given element; (ii) count, which returns the number of occurrences of a given element; (iii) bisect_left, which returns the insertion point for a given value in the list to maintain the sorted order (and is used to implement find_[ge|gt|le|lt]).

You can run and plot the experiments on your computer and your data with the notebook in tests/benchmark.ipynb.

License

This project is licensed under the terms of the Apache License 2.0.

If you use the library in an academic setting, please cite the following paper:

Paolo Ferragina and Giorgio Vinciguerra. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB, 13(8): 1162-1175, 2020.

@article{Ferragina:2020pgm,
  Author = {Paolo Ferragina and Giorgio Vinciguerra},
  Title = {The {PGM-index}: a fully-dynamic compressed learned index with provable worst-case bounds},
  Year = {2020},
  Volume = {13},
  Number = {8},
  Pages = {1162--1175},
  Doi = {10.14778/3389133.3389135},
  Url = {https://pgm.di.unipi.it},
  Issn = {2150-8097},
  Journal = {{PVLDB}}}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].