All Projects → decile-team → submodlib

decile-team / submodlib

Licence: other
Summarize Massive Datasets using Submodular Optimization

Programming Languages

Jupyter Notebook
11667 projects
python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to submodlib

Data-Structures-and-Algorithms
Implementation of various Data Structures and algorithms - Linked List, Stacks, Queues, Binary Search Tree, AVL tree,Red Black Trees, Trie, Graph Algorithms, Sorting Algorithms, Greedy Algorithms, Dynamic Programming, Segment Trees etc.
Stars: ✭ 144 (+300%)
Mutual labels:  greedy-algorithms
axon
Nx-powered Neural Networks
Stars: ✭ 1,170 (+3150%)
Mutual labels:  optimizers
sinkhorn-policy-gradient.pytorch
Code accompanying the paper "Learning Permutations with Sinkhorn Policy Gradient"
Stars: ✭ 36 (+0%)
Mutual labels:  combinatorial-optimization
Algodeck
An Open-Source Collection of 200+ Algorithmic Flash Cards to Help you Preparing your Algorithm & Data Structure Interview 💯
Stars: ✭ 4,441 (+12236.11%)
Mutual labels:  greedy-algorithms
conjure
Conjure: The Automated Constraint Modelling Tool
Stars: ✭ 84 (+133.33%)
Mutual labels:  combinatorial-optimization
Hypergradient variants
Improved Hypergradient optimizers, providing better generalization and faster convergence.
Stars: ✭ 15 (-58.33%)
Mutual labels:  optimizers
The-Complete-Rust-Programming-Reference-Guide
Design, develop, and deploy effective software systems using the advanced constructs of Rust
Stars: ✭ 53 (+47.22%)
Mutual labels:  greedy-algorithms
Learning-Made-Easy
This project can help you understand the Data Structure and Algorithms in a more efficient manner. It aims at scheduling the studies for maximizing marks during exams. Most students face this problem during exams that what to study to get the best out of their limited time.
Stars: ✭ 133 (+269.44%)
Mutual labels:  greedy-algorithms
LeetCode
Solution to LeetCode Problems in Python and Golang 🎯
Stars: ✭ 12 (-66.67%)
Mutual labels:  greedy-algorithms
pmartR
The pmartR R package provides functionality for quality control, normalization, exploratory data analysis, and statistical analysis of mass spectrometry (MS) omics data, in particular proteomic (either at the peptide or the protein level), lipidomic, and metabolomic data. This includes data transformation, specification of groups that are to be …
Stars: ✭ 19 (-47.22%)
Mutual labels:  data-summarization
algoexpert
AlgoExpert is an online platform that helps software engineers to prepare for coding and technical interviews.
Stars: ✭ 8 (-77.78%)
Mutual labels:  greedy-algorithms
DSA--GeeksForGeeks
DSA course solutions in C++ Jump to below directly for more problems
Stars: ✭ 47 (+30.56%)
Mutual labels:  greedy-algorithms
ecole
Extensible Combinatorial Optimization Learning Environments
Stars: ✭ 249 (+591.67%)
Mutual labels:  combinatorial-optimization
geeks-for-geeks-solutions
Solutions of questions on Geeks-for-Geeks.Solution Available in C++.
Stars: ✭ 28 (-22.22%)
Mutual labels:  greedy-algorithms
ThinkMatch
Code & pretrained models of novel deep graph matching methods.
Stars: ✭ 639 (+1675%)
Mutual labels:  combinatorial-optimization
3013-Algorithms
Algorithms Course Repo
Stars: ✭ 15 (-58.33%)
Mutual labels:  greedy-algorithms
mxfactorial
a payment application intended for deployment by the united states treasury
Stars: ✭ 36 (+0%)
Mutual labels:  combinatorial-optimization
GHOST
General meta-Heuristic Optimization Solving Toolkit
Stars: ✭ 28 (-22.22%)
Mutual labels:  combinatorial-optimization
Competitive Programming
Contains solutions and codes to various online competitive programming challenges and some good problems. The links to the problem sets are specified at the beginning of each code.
Stars: ✭ 65 (+80.56%)
Mutual labels:  greedy-algorithms
psolving-paradigms
Common problems of dynamic programming methods and techniques, including prerequisites, for competitive programmers.
Stars: ✭ 34 (-5.56%)
Mutual labels:  greedy-algorithms


            

About SubModLib

SubModLib is an easy-to-use, efficient and scalable Python library for submodular optimization with a C++ optimization engine. Submodlib finds its application in summarization, data subset selection, hyper parameter tuning, efficient training etc. Through a rich API, it offers a great deal of flexibility in the way it can be used.

Please check out our latest arxiv preprint: https://arxiv.org/abs/2202.10680

Salient Features

  • Rich suite of functions for a wide variety of subset selection tasks:
    • regular set (submodular) functions
    • submodular mutual information functions
    • conditional gain functions
    • conditional mutual information functions
  • Supports different types of optimizers
    • naive greedy
    • lazy (accelerated) greedy
    • stochastic (random) greedy
    • lazier than lazy greedy
  • Combines the best of Python's ease of use and C++'s efficiency
  • Rich API which gives a variety of options to the user. See this notebook for an example of different usage patterns
  • De-coupled function and optimizer paradigm makes it suitable for a wide-variety of tasks
  • Comprehensive documentation (available here)

Google Colab Notebooks Demonstrating the power of SubModLib and sample usage

Setup

Alternative 1

  • $ pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ submodlib

Alternative 2 (if local docs need to be built and test cases need to be run)

  • $ git clone https://github.com/decile-team/submodlib.git
  • $ cd submodlib
  • $ pip install .
  • Latest documentation is available at readthedocs. However, if local documentation is required to be built, follow these steps::
    • $ pip install -U sphinx
    • $ pip install sphinxcontrib-bibtex
    • $ pip install sphinx-rtd-theme
    • $ cd docs
    • $ make clean html
  • To run the tests, follow these steps:
    • $ pip install pytest
    • $ pytest # this runs ALL tests
    • $ pytest -m <marker> --verbose --disable-warnings -rA # this runs test specified by the . Possible markers are mentioned in pyproject.toml file.

Usage

It is very easy to get started with submodlib. Using a submodular function in submodlib essentially boils down to just two steps:

  1. instantiate the corresponding function object
  2. invoke the desired method on the created object

The most frequently used methods are:

  1. f.evaluate() - takes a subset and returns the score of the subset as computed by the function f
  2. f.marginalGain() - takes a subset and an element and returns the marginal gain of adding the element to the subset, as computed by f
  3. f.maximize() - takes a budget and an optimizer to return an optimal set as a result of maximizing f

For example,

from submodlib import FacilityLocationFunction
objFL = FacilityLocationFunction(n=43, data=groundData, mode="dense", metric="euclidean")
greedyList = objFL.maximize(budget=10,optimizer='NaiveGreedy')

For a more detailed discussion on all possible usage patterns, please see Different Options of Usage

Functions

Modelling Capabilities of Different Functions

We demonstrate the representational power and modeling capabilities of different functions qualitatively in the following Google Colab notebooks:

This notebook contains a quantitative analysis of performance of different functions and role of the parameterization in aspects like query-coverage, query-relevance, privacy-irrelevance and diversity for different SMI, CG and CMI functions as observed on synthetically generated dataset. This notebook contains similar analysis on ImageNette dataset.

Optimizers

Sample Application (Image collection summarization)

  • This notebook contains demonstration of using submodlib for an image collection summarization application.

Timing Analysis

To gauge the performance of submodlib, selection by Facility Location was performed on a randomly generated dataset of 1024-dimensional points. Specifically the following code was run for the number of data points ranging from 50 to 10000.

K_dense = helper.create_kernel(dataArray, mode="dense", metric='euclidean', method="other")
obj = FacilityLocationFunction(n=num_samples, mode="dense", sijs=K_dense, separate_rep=False,pybind_mode="array")
obj.maximize(budget=budget,optimizer=optimizer, stopIfZeroGain=False, stopIfNegativeGain=False, verbose=False, show_progress=False)

The above code was timed using Python's timeit module averaged across three executions each. We report the following numbers:

Number of data points Time taken (in seconds)
50 0.00043
100 0.001074
200 0.003024
500 0.016555
1000 0.081773
5000 2.469303
6000 3.563144
7000 4.667065
8000 6.174047
9000 8.010674
10000 9.417298

Contributors

  • Vishal Kaushal, Ganesh Ramakrishnan and Rishabh Iyer. Currently maintained by CARAML Lab

Contact

Should you face any issues or have any feedback or suggestions, please feel free to contact vishal[dot]kaushal[at]gmail.com

Acknowledgements

This work is supported by the Ekal Fellowship (www.ekal.org). This work is also supported by the National Science Foundation(NSF) under Grant Number 2106937, a startup grant from UT Dallas, as well as Google and Adobe awards.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].