Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

📕 Various example reports I use for SQL Server Reporting Services (SSRS) as well as documents for unit testing, requirements and a style guide template.

Stars: ✭ 63 (-1.56%)

Mutual labels: database

Hermitdb

A private decentralized database replicated over Git (or any other distributed log)

Stars: ✭ 61 (-4.69%)

Mutual labels: database

Data Science Best Resources

Carefully curated resource links for data science in one place

Stars: ✭ 1,104 (+1625%)

Mutual labels: database

Blazor.indexeddb.framework

A framework for blazor which acts as an interface to IndexedDB

Stars: ✭ 62 (-3.12%)

Mutual labels: database

Dolt

Dolt – It's Git for Data

Stars: ✭ 9,880 (+15337.5%)

Mutual labels: database

Semantic forms

Form generators leveraging semantic web standards (RDF(S), OWL, SPARQL , ...

Stars: ✭ 63 (-1.56%)

Mutual labels: database

Mssql Docker

Official Microsoft repository for SQL Server in Docker resources

Stars: ✭ 1,111 (+1635.94%)

Mutual labels: database

Eventql

Distributed "massively parallel" SQL query engine

Stars: ✭ 1,121 (+1651.56%)

Mutual labels: database

Skunk

A data access library for Scala + Postgres.

Stars: ✭ 1,107 (+1629.69%)

Mutual labels: database

Pyhawk

Searches the directory of choice for interesting files. Such as database files and files with passwords stored on them

Stars: ✭ 60 (-6.25%)

Mutual labels: database

Laravel Cadillac

🍺 A database tool for laravel.

Stars: ✭ 63 (-1.56%)

Mutual labels: database

Restfeel

RESTFeel: 一个企业级的API管理&测试平台。RESTFeel帮助你设计、开发、测试您的API。

Stars: ✭ 59 (-7.81%)

Mutual labels: database

Aliyun Tablestore Go Sdk

TableStore SDK for Golang

Stars: ✭ 63 (-1.56%)

Mutual labels: database

Sql.js

A javascript library to run SQLite on the web.

Stars: ✭ 9,594 (+14890.63%)

Mutual labels: database

Layr

Dramatically simplify full‑stack development

Stars: ✭ 1,111 (+1635.94%)

Mutual labels: database

Awesome Cn

awesome项目中文翻译，提升查阅效率

Stars: ✭ 62 (-3.12%)

Mutual labels: index

Littletable

An in-memory database of Python objects, searchable using quasi-SQL API

Stars: ✭ 64 (+0%)

Mutual labels: database

View All Similar Projects ➔

Learned Indexes

Implementation of BTree part for paper 'The Case for Learned Index Structures'.

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. https://arxiv.org/abs/1712.01208, 2017

Language: Python
Support content: Integer values, Random and Exponential distribution

Files Structures

data/: test data
model/: learned NN model
perfromance/：NN and BTree performance
Learned_BTree.py: main file
Trained_NN.py: NN structures

HOW TO RUN

First, you need to install python2.7.x and package tensorflow, pandas, numpy, enum.
Second, use command to run the Learned_BTree.py fule, that is,
python Learned_BTree.py -t <Type> -d <Distribution> [-p|-n] [Percent]|[Number] [-c] [New data] [-h].

Parameters:
'type': 'Type: sample, full',
'distribution': 'Distribution: random, exponential',
'percent': 'Percent: 0.1-1.0, default value = 0.5; sample train data size = 300,000',
'number': 'Number: 10,000-10,000,000, default value = 300,000',
'new data' 'New Data: INTEGER, 0 for no creating new data file, others for creating'

Example:
python Learned_BTree.py -t full -d random -n 100000 -c 1

中文

本项目代码是对《The Case For Learned Index Structures》一文的简单实现，实现了文章中BTree的部分，目前支持整数测试集，可以选用随机分布或者指数分布进行测试。

T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. https://arxiv.org/abs/1712.01208, 2017

此外，项目还对有新数据插入的场景进行了探索。

数据索引

主要步骤

依据论文中思想，搭建混合多级神经网络架构

 Input: int threshold, int stages[]
 Data: record data[]
 Result: trained index
1 M = stages.size;
2 tmp_records[][];
3 tmp_records[1][1] = all_data;
4 for i ← 1 to M do
5   for j ← 1 to stages[i] do
6     index[i][j] = new NN trained on tmp.records[i][j];
7     if i < M then
8       for r ∈ tmp.records[i][j] do
9         𝑞 = f(r.key) / stages[i + 1];
10        tmp.records[i + 1][ 𝑞].add(r);
11 for j ← 1 to index[M].size do
12   index[M][j].calc_err(tmp.records[M][j]);
13   if index[M][j].max_abs_err > threshold then
14     index[M][j] = new B-Tree trained on tmp_records[M][j];
15 return index;

在以上程序中，从整个数据集开始（第3行）开始，首先训练第1级模型。基于第1级模型的预测，从下一级挑选模型，并添加相应的关键字到该模型训练集（第9行和第10行），然后训练这个模型。最后，检查最后一级的模型，如果平均误差高于预定义的阈值，则用B树替换神经网络模型（第11-14行）。 所使用的模型均为全连接网络，随机分布用的是没有隐藏的全连接网络；指数分布用的是有1个有8个核的隐藏层的全连接网络

使用数据测试神经网络索引和B树索引，对比两者性能

抽样学习

神经网络模型的训练需要较长的时间，通过抽取一部分数据训练的方式，加快训练的速度。

存储优化

基于后续插入数据的分布与现有分布相近的观点。

主要步骤

根据建立的数据索引估计数据分布，并移动数据的位置，预留出空间。比如原先0-100的数据占据100个BLOCK，预计最终存储数据是现在的2倍，则预留100个BLOCK。
插入数据，与不进行优化比较。

优势

新插入数据冲突少，加快插入速度。
无需重新调整索引，降低索引维护代价，支持了新数据插入场景。

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].

Stars: ✭ 64

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (0) 🔗

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

yangjufo / Learned Indexes

Programming Languages

Labels

Projects that are alternatives of or similar to Learned Indexes

Learned Indexes

Files Structures

HOW TO RUN

Other Content

Sample Training

Storage Optimization

中文

数据索引

主要步骤

抽样学习

存储优化

主要步骤

优势