Quantco / tabmat

License: BSD-3-Clause
Efficient matrix representations for working with tabular data

Programming Languages

python
139335 projects - #7 most used programming language
cython
566 projects
C++
36643 projects - #6 most used programming language
shell
77523 projects

Projects that are alternatives of or similar to tabmat

Librec
LibRec: A Leading Java Library for Recommender Systems, see
Stars: ✭ 3,045 (+4250%)
Mutual labels:  matrix, sparse
zalgebra
Linear algebra library for games and real-time graphics.
Stars: ✭ 129 (+84.29%)
Mutual labels:  matrix
Candb
Generate CAN dbc file with OEM defined CAN matrix (*.xls).
Stars: ✭ 36 (-48.57%)
Mutual labels:  matrix
Watch-The-Matrix
A watchOS client for Matrix chat
Stars: ✭ 27 (-61.43%)
Mutual labels:  matrix
N-Matrix-Programmer
A software to write an optimized code that calculates inverse and determinant of N by N matrix.
Stars: ✭ 35 (-50%)
Mutual labels:  matrix
pipeline
Spline is a tool that is capable of running locally as well as part of well known pipelines like Jenkins (Jenkinsfile), Travis CI (.travis.yml) or similar ones.
Stars: ✭ 29 (-58.57%)
Mutual labels:  matrix
LinAlg
Implements a linear algebra library as an extension for Python. Reading notes on the book "Mathematics for Programmers 3: Linear Algebra".
Stars: ✭ 17 (-75.71%)
Mutual labels:  matrix
NumPyANN
Implementation of Artificial Neural Networks using NumPy
Stars: ✭ 85 (+21.43%)
Mutual labels:  dense
MatrixChecks
The optimized checks for Matrix Anticheat, a powerful anticheat for Minecraft.
Stars: ✭ 70 (+0%)
Mutual labels:  matrix
Tensor
A library and extension that provides objects for scientific computing in PHP.
Stars: ✭ 146 (+108.57%)
Mutual labels:  matrix
mir-glas
[Experimental] LLVM-accelerated Generic Linear Algebra Subprograms
Stars: ✭ 99 (+41.43%)
Mutual labels:  matrix
sygnal
Sygnal: reference Push Gateway for Matrix
Stars: ✭ 114 (+62.86%)
Mutual labels:  matrix
twitter
A Matrix-Twitter DM puppeting bridge
Stars: ✭ 48 (-31.43%)
Mutual labels:  matrix
Mathematics for Machine Learning
Learn mathematics behind machine learning and explore different mathematics in machine learning.
Stars: ✭ 28 (-60%)
Mutual labels:  matrix
matrix-puppet-slack
puppet style slack bridge for matrix
Stars: ✭ 46 (-34.29%)
Mutual labels:  matrix
GenericTensor
The only library allowing to create Tensors (matrices extension) with custom types
Stars: ✭ 42 (-40%)
Mutual labels:  matrix
ping
A cross-platform and blazingly fast Matrix client focused on group and gaming chat.
Stars: ✭ 55 (-21.43%)
Mutual labels:  matrix
nuke-colortools
A collection of tools for Nuke related to color science and the Academy Color Encoding System (ACES).
Stars: ✭ 66 (-5.71%)
Mutual labels:  matrix
MatrixLib
Lightweight header-only matrix library (C++) for numerical optimization and machine learning. Contact me if there is an exciting opportunity.
Stars: ✭ 35 (-50%)
Mutual labels:  matrix
fml
Fused Matrix Library
Stars: ✭ 24 (-65.71%)
Mutual labels:  matrix

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data frequently shares further properties:

  • It is often very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
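
To make the first two operations concrete, here is a minimal sketch in plain NumPy. It does not use tabmat's own API; the function names `sandwich` and `matvec_on_columns` and the sample data are assumptions chosen for illustration. It shows that the sandwich product can be computed by scaling rows of X rather than building the n-by-n matrix diag(d), and that an active-set product only needs the selected columns.

```python
import numpy as np


def sandwich(X, d):
    """Compute X.T @ diag(d) @ X without materializing diag(d)."""
    # Scaling row i of X by d[i] is equivalent to diag(d) @ X.
    return X.T @ (d[:, None] * X)


def matvec_on_columns(X, v, cols):
    """Compute X[:, cols] @ v[cols], a product limited to an active set."""
    return X[:, cols] @ v[cols]


# Hypothetical example data: 6 observations, 4 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
d = rng.uniform(0.5, 2.0, size=6)   # per-row weights
v = rng.normal(size=4)
active = np.array([0, 2])           # indices of the "active" columns

S = sandwich(X, d)                  # shape (4, 4), symmetric
partial = matvec_on_columns(X, v, active)
```

The row-scaling form needs O(n k) extra memory instead of O(n^2) for an explicit diagonal matrix, which is one reason a dedicated sandwich routine is worth having.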

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it to be useful in a variety of econometric and statistical use cases. It was born out of our need for speed, and its design reflects our desire to work with a single, unified matrix API inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
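
The ideas behind CategoricalMatrix and StandardizedMatrix can be sketched in plain NumPy. This is an illustration under stated assumptions, not tabmat's implementation: the function names, the integer `codes` representation, and the `means`/`stds` parametrization are all hypothetical. A one-hot matrix is stored as one integer code per row, so products reduce to lookups and grouped sums, and a standardized matrix-vector product is computed from the unstandardized matrix plus a scalar correction.

```python
import numpy as np


def categorical_matvec(codes, v):
    """One-hot matrix times vector: row i has a single 1 in column codes[i]."""
    return v[codes]


def categorical_sandwich(codes, d, n_categories):
    """X.T @ diag(d) @ X for one-hot X is diagonal: sums of d per category."""
    return np.diag(np.bincount(codes, weights=d, minlength=n_categories))


def standardized_matvec(X, means, stds, v):
    """Compute ((X - means) / stds) @ v without forming the dense result."""
    scaled = v / stds
    # X @ scaled can exploit sparsity in X; the shift reduces to one scalar.
    return X @ scaled - means @ scaled


# Hypothetical categorical column with 3 levels, stored as integer codes.
codes = np.array([2, 0, 1, 2])
v = np.array([10.0, 20.0, 30.0])
d = np.array([1.0, 2.0, 3.0, 4.0])
cat_product = categorical_matvec(codes, v)
cat_sandwich = categorical_sandwich(codes, d, 3)

# Hypothetical mostly-zero matrix standing in for sparse data.
X = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 2.0], [3.0, 0.0]])
means, stds = X.mean(axis=0), X.std(axis=0)
w = np.array([0.5, -1.0])
std_product = standardized_matvec(X, means, stds, w)
```

In both cases the dense equivalent (an explicit one-hot array, or the centered and scaled copy of X) is never built, which is where the memory and speed savings come from.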

[Figure: Wide data set]

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.