Dyalog / vecdb

Licence: MIT license
A simple "columnar database" based on memory-mapped files, written in APL

README

vecdb Current version: 0.2.3

What is this repository for?

vecdb is a simple "columnar database": each column in the database is stored in a single memory-mapped file. It is written in and for Dyalog APL as a tool on which to base new applications that need to generate and query very large amounts of data and perform a large number of high-performance reads, but do not need a full set of RDBMS features. In particular, there is no "transactional" storage mechanism and no built-in ability to join tables.
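To make the one-file-per-column idea concrete, here is a minimal Python sketch of a memory-mapped column. It is not vecdb's implementation or file format: the file name `price.col`, the helper names and the raw little-endian 4-byte layout are all assumptions chosen for illustration.

```python
import mmap
import os
import struct
import tempfile


def write_column(path, values):
    """Write values as raw little-endian 4-byte integers (assumed layout)."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}i", *values))


def read_column(path):
    """Memory-map the column file and decode every 4-byte integer in it."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        count = len(m) // 4
        return list(struct.unpack(f"<{count}i", m[:count * 4]))


# One file holds exactly one column; the folder stands in for a database shard.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "price.col")  # hypothetical file name
    write_column(path, [10, 20, 30])
    prices = read_column(path)
```

Because a column is a single contiguous file, reading one column never touches the bytes of any other column, which is what makes column stores attractive for read-heavy workloads.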

Features

Supported data types:

  • 1, 2 and 4 byte integers
  • 8-byte IEEE double-precision floats
  • Boolean
  • Char (via a "symbol table" of up to 32,767 unique strings indexed by 2-byte integers)
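The symbol-table approach for Char columns can be sketched as follows. This is an illustration only: the class name `SymbolTable`, its methods and the choice of 0-based indices are assumptions, not vecdb's actual representation. Only the 32,767 limit and the 2-byte index width come from the list above.

```python
from array import array


class SymbolTable:
    """Maps each unique string to a small integer that fits in 2 bytes."""

    MAX_SYMBOLS = 32767  # limit stated in the feature list

    def __init__(self):
        self.strings = []   # index -> string
        self.index_of = {}  # string -> index

    def encode(self, text):
        """Return the index for text, adding it to the table if it is new."""
        if text not in self.index_of:
            if len(self.strings) >= self.MAX_SYMBOLS:
                raise OverflowError("symbol table is full")
            self.index_of[text] = len(self.strings)  # 0-based: an assumption
            self.strings.append(text)
        return self.index_of[text]

    def decode(self, index):
        """Return the string stored under index."""
        return self.strings[index]


table = SymbolTable()
# array("h") stores signed 2-byte integers, so the column itself stays compact.
column = array("h", (table.encode(s) for s in ["IBM", "AAPL", "IBM", "MSFT"]))
decoded = [table.decode(i) for i in column]
```

Repeated strings cost two bytes each in the column, and comparisons against the column can be done on integers rather than on text.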

Sharding

vecdb databases can be sharded, or horizontally partitioned. Each shard is a separate folder, named when the database is created (by default, there is a single shard). Each folder contains one file per database column, which is memory-mapped to an APL vector when the database is opened. A list of sharding columns is defined when the database is created; the values of these columns are passed as the argument to a user-defined sharding function, which must return an origin-1 index into the list of shards for each record.
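As an illustration of what a sharding function does, the Python sketch below maps each record's key to an origin-1 shard index. The function name, the alphabetical rule and the sample data are hypothetical; in vecdb the sharding function is a user-defined APL function.

```python
def shard_by_first_letter(names):
    """Hypothetical sharding rule: names A-M go to shard 1, N-Z to shard 2.

    Returns one origin-1 shard index per record, as the text above requires.
    """
    return [1 if name[0].upper() <= "M" else 2 for name in names]


names = ["IBM", "ORCL", "AAPL", "TSLA"]
shard_index = shard_by_first_letter(names)

# Group record positions by the shard they were assigned to.
records_in_shard = {1: [], 2: []}
for position, shard in enumerate(shard_index):
    records_in_shard[shard].append(position)
```

The only contract is that the function returns a valid shard number for every record; how the key values are mapped is entirely up to the application.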

Supported Operations

Query: At the moment, the Query function takes a constraint in the form of a list of (column_name values) pairs. Each pair represents the relation which can be expressed in APL as (column_data∊values). If more than one constraint is provided, they are AND-ed together. Query also takes a list of column names to be retrieved for records which match the constraint.

Query results are returned as a vector with one element per database column, each item containing a vector of values for that column.
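The query semantics described above (one membership test per constraint, AND-ed together, then one result vector per requested column) can be sketched in Python. The function `query` and the sample columns are hypothetical and only model the behaviour; they are not vecdb's API.

```python
def query(columns, constraints, wanted):
    """Return one list per wanted column for records matching all constraints.

    columns:     dict of column name -> list of values (equal lengths)
    constraints: list of (column_name, allowed_values) pairs, AND-ed together
    wanted:      list of column names to retrieve
    """
    record_count = len(next(iter(columns.values())))
    mask = [True] * record_count
    for name, values in constraints:
        allowed = set(values)
        # Equivalent in spirit to the APL expression (column_data∊values).
        mask = [keep and (v in allowed) for keep, v in zip(mask, columns[name])]
    return [[v for v, keep in zip(columns[name], mask) if keep] for name in wanted]


columns = {
    "Name": ["IBM", "AAPL", "MSFT", "IBM"],
    "Year": [2014, 2014, 2015, 2015],
    "Price": [160, 112, 47, 161],
}
result = query(columns, [("Name", ["IBM", "MSFT"]), ("Year", [2015])], ["Name", "Price"])
```

Here only the last two records satisfy both constraints, so `result` holds the names and prices of those two records.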

Search: If the Query function is called with an empty list of columns, record identifiers are returned as a 2-column matrix of (shard) (record index) pairs.

Read: The Read function accepts a matrix in the format returned by a search query and a list of column names, and returns a vector per column.
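A search followed by a read can be modelled as below. The names `search` and `read` and the data are hypothetical. Shard numbers are origin-1 as described under Sharding; treating record indices as origin-1 too is an assumption made here for consistency.

```python
def search(shards, constraints):
    """Return (shard, record) pairs for records that satisfy every constraint."""
    hits = []
    for shard_no, columns in enumerate(shards, start=1):
        record_count = len(next(iter(columns.values())))
        for rec in range(record_count):
            if all(columns[name][rec] in values for name, values in constraints):
                hits.append((shard_no, rec + 1))  # origin-1 record index (assumed)
    return hits


def read(shards, hits, wanted):
    """Return one list per wanted column, following the order of the hits."""
    return [[shards[s - 1][name][r - 1] for s, r in hits] for name in wanted]


shards = [
    {"Name": ["IBM", "AAPL"], "Price": [160, 112]},
    {"Name": ["ORCL", "TSLA", "ORCL"], "Price": [40, 250, 41]},
]
hits = search(shards, [("Name", ["ORCL", "IBM"])])
prices = read(shards, hits, ["Price"])
```

Separating the two steps lets an application search once and then read, or later update, only the columns it needs for the matching records.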

Update: The Update function also takes as input a search query result, a list of columns, and a vector of vectors containing new data values.
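The shape of an update call can be illustrated with a small Python model. The function `update`, the origin-1 record indices and the data are assumptions for illustration, not vecdb's API: each new data vector supplies one value per (shard, record) pair, in the same order as the search result.

```python
def update(shards, hits, names, new_values):
    """Overwrite the named columns at each (shard, record) position.

    hits:       list of origin-1 (shard, record) pairs from a search
    names:      list of column names to change
    new_values: one list per named column, aligned with hits
    """
    for name, values in zip(names, new_values):
        for (shard_no, rec), value in zip(hits, values):
            shards[shard_no - 1][name][rec - 1] = value


shards = [
    {"Name": ["IBM", "AAPL"], "Price": [160, 112]},
    {"Name": ["ORCL", "TSLA"], "Price": [40, 250]},
]
# Change the price of the first record in each shard.
update(shards, [(1, 1), (2, 1)], ["Price"], [[165, 42]])
```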

Append: Takes a list of column names and a vector of data vectors, one per named column. The columns involved in shard selection must always be included.
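The reason the sharding columns must be supplied becomes clear in a sketch: without them, the sharding function cannot decide where each new record belongs. The Python below is a hypothetical model (`append`, `by_year` and the data are invented for illustration), not vecdb's API.

```python
def append(shards, shard_columns, shard_fn, names, data):
    """Route each new record to the shard chosen by shard_fn and append it."""
    missing = [c for c in shard_columns if c not in names]
    if missing:
        raise ValueError(f"sharding columns missing: {missing}")
    keys = [data[names.index(c)] for c in shard_columns]
    targets = shard_fn(keys)  # one origin-1 shard index per record
    for rec, shard_no in enumerate(targets):
        for name, values in zip(names, data):
            shards[shard_no - 1][name].append(values[rec])


def by_year(keys):
    """Hypothetical rule: years before 2015 go to shard 1, the rest to shard 2."""
    return [1 if year < 2015 else 2 for year in keys[0]]


shards = [{"Year": [], "Price": []}, {"Year": [], "Price": []}]
append(shards, ["Year"], by_year, ["Year", "Price"], [[2014, 2015, 2014], [10, 20, 30]])
```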

Delete: Deletion is not currently supported.

Short-Term Goals

  1. Enhance the Query function to accept richer queries consisting of column names, comparison functions and values, and to support AND/OR. If possible, optimise queries to be sensitive to sharding.
  2. Parallel database queries: for a sharded database, spin up a number of isolate processes and distribute the shards between them, so that each shard is handled by a single process. Enhance the database API functions to use these processes to perform searches, reads and writes in parallel.
  3. Add a front-end server with a RESTful database API. As it stands, vecdb is effectively an embedded database engine which does not support data sharing between processes on the same or on separate machines.
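The parallel-query goal can be pictured with a short Python sketch in which each shard is searched by its own worker and the partial results are concatenated. Threads stand in for the isolate processes mentioned above, and all names and data are hypothetical; this is not a description of how vecdb does or will do it.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(task):
    """Search a single shard; returns origin-1 (shard, record) pairs."""
    shard_no, columns, constraints = task
    record_count = len(next(iter(columns.values())))
    return [
        (shard_no, rec + 1)
        for rec in range(record_count)
        if all(columns[name][rec] in values for name, values in constraints)
    ]


def parallel_search(shards, constraints, workers=2):
    """Hand each shard to a worker, then merge the results in shard order."""
    tasks = [(n, cols, constraints) for n, cols in enumerate(shards, start=1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_shard = list(pool.map(search_shard, tasks))  # map preserves task order
    return [hit for hits in per_shard for hit in hits]


shards = [
    {"Name": ["IBM", "AAPL"]},
    {"Name": ["ORCL", "IBM", "TSLA"]},
]
hits = parallel_search(shards, [("Name", ["IBM"])])
```

Because no record lives in more than one shard, the workers never need to coordinate while searching; only the final merge is sequential.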

Longer Term (Dreams)

There are ideas to add support for timeseries and versioning. This would include:

  1. Support for deleting records
  2. Performing all updates without overwriting data, and tagging old data with the timestamps defining its lifetime, allowing efficient queries on the database as it appeared at any given time in the past.
  3. Built-in support for the computation of aggregate values as part of the parallel query mechanism, based on timeseries or other key values.
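The non-overwriting update idea in item 2 can be sketched as a list of versions, each tagged with the time interval during which it was current. The class `VersionedColumn`, its methods and the integer timestamps are assumptions for illustration; nothing like this exists in vecdb today.

```python
import math


class VersionedColumn:
    """Keeps every past value together with the interval in which it was valid."""

    def __init__(self):
        # Each entry is [record_id, value, valid_from, valid_to].
        self.versions = []

    def set(self, record_id, value, now):
        """Close the current version (if any) and add a new one; never overwrite."""
        for version in self.versions:
            if version[0] == record_id and version[3] == math.inf:
                version[3] = now
        self.versions.append([record_id, value, now, math.inf])

    def as_of(self, record_id, when):
        """Return the value that was current at time 'when', or None."""
        for rid, value, start, end in self.versions:
            if rid == record_id and start <= when < end:
                return value
        return None


price = VersionedColumn()
price.set(1, 100, now=10)
price.set(1, 120, now=20)
```

With this layout, `price.as_of(1, 15)` sees the old value and `price.as_of(1, 25)` sees the new one, which is the "database as it appeared at any given time" behaviour described above.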

How do I get set up?

Clone or fork the repo, then load the main source file from a Dyalog APL session:

    ]load vecdb.dyalog

Tests

The full system test creates a database containing all supported data types, inserts and updates records, performs queries, and finally deletes the database.

    ]load TestVecdb.dyalog
    #.TestVecdb.RunAll

See doc\Usage.md for more information on usage.

Contribution guidelines

At this early stage, until the project acquires a bit more direction, we ask you to contact one of the key collaborators to discuss your ideas.

Please read doc\Implementation.md before continuing.

Key Collaborators
