BenJoyenConseil / rmi

Licence: Apache-2.0 License
A learned index structure

Programming Languages

Jupyter Notebook
Go
Makefile

Projects that are alternatives of or similar to rmi

cobra
A Python package to build predictive linear and logistic regression models focused on performance and interpretation
Stars: ✭ 23 (-54.9%)
Mutual labels:  linear-regression
tile38
Real-time Geospatial and Geofencing
Stars: ✭ 8,117 (+15815.69%)
Mutual labels:  index
bkdtree
Persistent Block KD Tree In Golang for Search Filtering
Stars: ✭ 32 (-37.25%)
Mutual labels:  index
markdown-index
Generate a global index for multiple markdown files recursively
Stars: ✭ 15 (-70.59%)
Mutual labels:  index
flow-indexer
Flow-Indexer indexes flows found in chunked log files from bro, nfdump, syslog, or pcap files
Stars: ✭ 43 (-15.69%)
Mutual labels:  index
jquery-alphaindex
jQuery plugin to create alphabetical indexes for your lists
Stars: ✭ 12 (-76.47%)
Mutual labels:  index
machine learning in python
Demo of basic machine learning models in Python with Jupyter Notebook
Stars: ✭ 16 (-68.63%)
Mutual labels:  linear-regression
cobs
COBS - Compact Bit-Sliced Signature Index (for Genomic k-Mer Data or q-Grams)
Stars: ✭ 64 (+25.49%)
Mutual labels:  index
stats
📈 Useful notes and personal collections on statistics.
Stars: ✭ 16 (-68.63%)
Mutual labels:  linear-regression
Market-Mix-Modeling
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
Stars: ✭ 31 (-39.22%)
Mutual labels:  linear-regression
abess
Fast Best-Subset Selection Library
Stars: ✭ 266 (+421.57%)
Mutual labels:  linear-regression
models-by-example
By-hand code for models and algorithms. An update to the 'Miscellaneous-R-Code' repo.
Stars: ✭ 43 (-15.69%)
Mutual labels:  linear-regression
feels
🌀 Calculate apparent temperature using heat index, approximate wet-bulb globe temperature, humidex, australian apparent temperature and wind chill.
Stars: ✭ 25 (-50.98%)
Mutual labels:  index
TotalLeastSquares.jl
Solve many kinds of least-squares and matrix-recovery problems
Stars: ✭ 23 (-54.9%)
Mutual labels:  linear-regression
libDrive
libDrive is a Google Drive media library manager and indexer, similar to Plex, that organizes Google Drive media to offer an intuitive and user-friendly experience.
Stars: ✭ 14 (-72.55%)
Mutual labels:  index
index-autoload
Adds an index to the autoload in wp_options table and verifies it exists on a daily basis (using WP Cron), resulting in a more efficient database.
Stars: ✭ 18 (-64.71%)
Mutual labels:  index
MachineLearning
Code accompanying the introductory Python Machine Learning material [2020]
Stars: ✭ 28 (-45.1%)
Mutual labels:  linear-regression
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-66.67%)
Mutual labels:  linear-regression
VBLinLogit
Variational Bayes linear and logistic regression
Stars: ✭ 25 (-50.98%)
Mutual labels:  linear-regression
Machine-Learning-Andrew-Ng
Machine Learning (Coursera, Andrew Ng): Python + Matlab code implementation
Stars: ✭ 127 (+149.02%)
Mutual labels:  linear-regression

RMI


A Go implementation of an RMI (Recursive Model Index), a learned index structure based on the research work by Kraska et al.

Fig 1 from the Case for Learned Index Structures

usage

Create an index and make lookups

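// load the age column and parse its values into float64
ageColumn := extractAgeColumn("data/people.csv")

// create a learned index over the age column
// (named idx so the variable does not shadow the index package)
idx := index.New(ageColumn)

// parse the age to search for from the first command-line argument
search, err := strconv.ParseFloat(os.Args[1], 64)
if err != nil {
	log.Fatalf("invalid age %q: %v", os.Args[1], err)
}

// look up the age and get back its line positions inside people.csv
lines, _ := idx.Lookup(search)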

The main.go file contains an example of a learned index over the age column of data/people.csv.

It outputs:

$ cat data/people.csv
name,age,sex
jeanne,90,F
jean,23,M
Carlos,3,M
Carlotta,45,F
Miguel,1,M
Martine,1.5,F
Georgette,23,F

$ go run main.go 23
2020/11/15 20:29:56 People who are 23 years old are located at [8 3] inside data/people.csv 

The plot below shows the approximation (the linear regression), the cumulative distribution function for each value, and the age values themselves (the keys of the index):

Fig 2 the LearnedIndex over people.csv
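
To make the cumulative distribution function in the plot concrete, here is a small sketch that computes an empirical CDF for the ages in people.csv. It assumes the CDF of a key is the fraction of keys less than or equal to it; the repository may define or scale it differently, and empiricalCDF is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// empiricalCDF returns the keys in sorted order and, for each of them,
// the fraction of keys that are less than or equal to it.
// Hypothetical helper for illustration only.
func empiricalCDF(keys []float64) (sorted []float64, cdf []float64) {
	sorted = append([]float64(nil), keys...)
	sort.Float64s(sorted)
	n := float64(len(sorted))
	cdf = make([]float64, len(sorted))
	for i, k := range sorted {
		// the index of the first element greater than k equals
		// the number of keys <= k
		count := sort.Search(len(sorted), func(j int) bool { return sorted[j] > k })
		cdf[i] = float64(count) / n
	}
	return sorted, cdf
}

func main() {
	// the age column of data/people.csv
	ages := []float64{90, 23, 3, 45, 1, 1.5, 23}
	sorted, cdf := empiricalCDF(ages)
	for i := range sorted {
		fmt.Printf("age=%v cdf=%.3f\n", sorted[i], cdf[i])
	}
}
```

Multiplying a CDF value by the number of keys gives an approximate position in the sorted keys, which is the quantity the linear regression learns to predict.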

features

  • A simple linear regression model learning the CDF of a float64 array
  • A learned index structure fitted on the keys of a collection
  • Find row IDs in a CSV file
  • Return a list of lines matching the key
  • Use the max and min error bounds to narrow the search quickly
  • Benchmark the in-memory LearnedIndex against in-memory BinarySearch
  • Store line offsets and a primary key index
  • Store the sortedTable
  • CLI to create indexes over CSV
  • Benchmark LearnedIndex against BinarySearchTree
  • A two-layer recursive index
  • Learn on integer keys
  • Index is persistent and durable (on hard drive)
  • A sort algorithm using the learned structure
  • Learning on string type?
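
The first features (a linear regression over the keys, plus min and max error bounds for the search) can be sketched as follows. This is a minimal single-model illustration under stated assumptions, not the repository's implementation: the type learnedIndex and its methods are hypothetical, and it returns positions in the sorted keys rather than line numbers in the CSV file.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// learnedIndex is a hypothetical single-model index (not the repository's type).
type learnedIndex struct {
	keys             []float64 // keys in sorted order
	slope, intercept float64   // position ≈ slope*key + intercept
	minErr, maxErr   int       // extreme values of (actual - predicted) over all keys
}

// predict returns the model's guess for the position of key.
func (li *learnedIndex) predict(key float64) int {
	return int(math.Round(li.slope*key + li.intercept))
}

// newLearnedIndex sorts the keys, fits position = slope*key + intercept by
// ordinary least squares, and records the worst under- and over-estimates.
func newLearnedIndex(keys []float64) *learnedIndex {
	li := &learnedIndex{keys: append([]float64(nil), keys...)}
	sort.Float64s(li.keys)
	if len(li.keys) == 0 {
		return li
	}
	n := float64(len(li.keys))
	var sumX, sumY, sumXY, sumXX float64
	for i, k := range li.keys {
		sumX += k
		sumY += float64(i)
		sumXY += k * float64(i)
		sumXX += k * k
	}
	if denom := n*sumXX - sumX*sumX; denom != 0 {
		li.slope = (n*sumXY - sumX*sumY) / denom
	}
	li.intercept = (sumY - li.slope*sumX) / n
	for i, k := range li.keys {
		e := i - li.predict(k)
		if e < li.minErr {
			li.minErr = e
		}
		if e > li.maxErr {
			li.maxErr = e
		}
	}
	return li
}

// clamp restricts v to the range [lo, hi].
func clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// lookup returns every position in the sorted keys that holds key.
// It only binary-searches the window [prediction+minErr, prediction+maxErr].
func (li *learnedIndex) lookup(key float64) []int {
	if len(li.keys) == 0 {
		return nil
	}
	p := li.predict(key)
	lo := clamp(p+li.minErr, 0, len(li.keys))
	hi := clamp(p+li.maxErr+1, 0, len(li.keys))
	first := lo + sort.SearchFloat64s(li.keys[lo:hi], key)
	var hits []int
	for i := first; i < hi && li.keys[i] == key; i++ {
		hits = append(hits, i)
	}
	return hits
}

func main() {
	// the age column of data/people.csv
	li := newLearnedIndex([]float64{90, 23, 3, 45, 1, 1.5, 23})
	fmt.Println(li.lookup(23)) // [3 4]: positions of 23 in the sorted keys
	fmt.Println(li.lookup(2))  // []: key absent
}
```

Because the error bounds are recorded over every indexed key, any key that is present is guaranteed to fall inside the search window. To return file line numbers as in the usage example, one would additionally keep a slice mapping each sorted position to its original line.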

related works
