
vicolab / Ml Pyxis

Licence: mit
Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Ml Pyxis

Data Science Resources
👨🏽‍🏫 You can learn about what data science is and why it's important in today's modern world. Are you interested in data science? 🔋
Stars: ✭ 171 (+83.87%)
Mutual labels:  data-science, dataset, data
Dbg Pds
Deutsche Boerse's Financial Trading Public Data Set
Stars: ✭ 124 (+33.33%)
Mutual labels:  data-science, dataset, data
Datascience course
Data Science course in Portuguese
Stars: ✭ 294 (+216.13%)
Mutual labels:  data-science, dataset, data
Coffee Quality Database
Building the Coffee Quality Institute Database
Stars: ✭ 141 (+51.61%)
Mutual labels:  data-science, dataset, data
Retriever
Quickly download, clean up, and install public datasets into a database management system
Stars: ✭ 241 (+159.14%)
Mutual labels:  data-science, dataset, data
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (+193.55%)
Mutual labels:  data-science, dataset, data
Awesome Twitter Data
A list of Twitter datasets and related resources.
Stars: ✭ 533 (+473.12%)
Mutual labels:  data-science, dataset, data
Data Polygamy
Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
Stars: ✭ 39 (-58.06%)
Mutual labels:  data-science, data
Qri
you're invited to a data party!
Stars: ✭ 1,003 (+978.49%)
Mutual labels:  data-science, dataset
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+1056.99%)
Mutual labels:  data-science, data
Legislator
Interface to the Comparative Legislators Database
Stars: ✭ 62 (-33.33%)
Mutual labels:  dataset, data
Dataconfs
A list of conferences connected with data worldwide.
Stars: ✭ 36 (-61.29%)
Mutual labels:  data-science, dataset
Football Data
football (soccer) datasets
Stars: ✭ 18 (-80.65%)
Mutual labels:  data-science, dataset
Php Ml
PHP-ML - Machine Learning library for PHP
Stars: ✭ 7,900 (+8394.62%)
Mutual labels:  data-science, dataset
Skdata
Python tools for data analysis
Stars: ✭ 16 (-82.8%)
Mutual labels:  data-science, data
Datacomparer
dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-37.63%)
Mutual labels:  data-science, data
Datastream.io
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana
Stars: ✭ 814 (+775.27%)
Mutual labels:  data-science, dataset
Openrefine
OpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+9073.12%)
Mutual labels:  data-science, data
Colour
Colour Science for Python
Stars: ✭ 1,131 (+1116.13%)
Mutual labels:  dataset, data
Magicbox
A platform that uses real-time data to inform life-saving humanitarian responses to emergency situations
Stars: ✭ 73 (-21.51%)
Mutual labels:  data-science, data

.. image:: https://img.shields.io/badge/license-MIT-blue.svg
   :target: https://github.com/vicolab/ml-pyxis/blob/master/LICENSE

========
ml-pyxis
========

Tool for reading and writing datasets of tensors (numpy.ndarray) with MessagePack and Lightning Memory-Mapped Database (LMDB).

Example
-------

.. code-block:: python

    import numpy as np
    import pyxis as px

    # Create data
    nb_samples = 10
    X = np.ones((nb_samples, 2, 2), dtype=np.float32)
    y = np.arange(nb_samples, dtype=np.uint8)

    # Write
    db = px.Writer(dirpath='data', map_size_limit=1)
    db.put_samples('input', X, 'target', y)
    db.close()

    # Read
    db = px.Reader(dirpath='data')
    sample = db[0]
    db.close()

    print(sample)

.. code-block:: python

    {'input': array([[ 1., 1.], [ 1., 1.]], dtype=float32), 'target': array(0, dtype=uint8)}
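
The ``map_size_limit`` argument passed to the writer above presumably caps how large the underlying LMDB file may grow. As a rough sketch, assuming the limit is expressed in megabytes (worth confirming against the ``Writer`` docstring), the raw size of the arrays gives a lower bound on the value to choose. ``overhead_factor`` is a hypothetical safety margin for serialisation and database bookkeeping, not a value taken from ml-pyxis:

.. code-block:: python

    import math

    import numpy as np

    nb_samples = 10
    X = np.ones((nb_samples, 2, 2), dtype=np.float32)
    y = np.arange(nb_samples, dtype=np.uint8)

    # Raw payload in bytes; serialisation and LMDB bookkeeping add to this.
    raw_bytes = X.nbytes + y.nbytes

    # Hypothetical safety margin (an assumption, not an ml-pyxis constant).
    overhead_factor = 2.0

    # Assumes map_size_limit is given in megabytes; never go below 1.
    map_size_limit = max(1, math.ceil(raw_bytes * overhead_factor / 1024 ** 2))

    print(raw_bytes, map_size_limit)

For the tiny dataset in the example this stays at the minimum of 1, which matches the value used there.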

More examples can be found in the examples/ directory.
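
Since a sample read back from the database is a dictionary of arrays, mini-batches can be assembled by stacking several samples key by key. The sketch below is illustrative only: ``iter_batches`` is a hypothetical helper, not part of ml-pyxis, and it assumes the dataset object supports ``len()`` and integer indexing. A plain list of dictionaries stands in for the reader so that the code runs without a database on disk:

.. code-block:: python

    import numpy as np


    def iter_batches(dataset, batch_size):
        """Yield dicts of stacked arrays, `batch_size` samples at a time.

        `dataset` must support len() and integer indexing, and return a
        dict of NumPy arrays for each sample.
        """
        for start in range(0, len(dataset), batch_size):
            stop = min(start + batch_size, len(dataset))
            samples = [dataset[i] for i in range(start, stop)]
            # Stack each key separately so every value gains a batch axis.
            yield {key: np.stack([s[key] for s in samples]) for key in samples[0]}


    # Stand-in for a reader: per-sample dicts shaped like the output above.
    dataset = [
        {'input': np.ones((2, 2), dtype=np.float32),
         'target': np.array(i, dtype=np.uint8)}
        for i in range(10)
    ]

    batches = list(iter_batches(dataset, batch_size=4))
    print([b['input'].shape for b in batches])

The last batch is smaller than the others when the number of samples is not a multiple of ``batch_size``.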

Installation
------------

The installation instructions are generic and should work on most operating systems that support the prerequisites.

ml-pyxis requires Python version 2.7, 3.4, 3.5, or 3.6. We recommend installing ml-pyxis, as well as all prerequisites, in a virtual environment via virtualenv_.


Prerequisites
~~~~~~~~~~~~~

The following Python packages are required to use ml-pyxis:

* lmdb_ - Universal Python binding for the `LMDB 'Lightning' Database`_
* msgpack_ - MessagePack_ implementation for Python (binary serialisation)
* NumPy_ - N-dimensional array object and tools for operating on them
* six_ - A Python 2 and 3 compatibility library

Please refer to the individual packages for more information about additional dependencies and how to install them for your operating system.


Bleeding-edge installation

To install the latest version of ml-pyxis, use the following command:

.. code-block:: bash

    pip install --upgrade https://github.com/vicolab/ml-pyxis/archive/master.zip

Add the ``--user`` flag if you want to install the package in your home directory.

Notice

The previous LMDB-only API has been deprecated in favour of a combination of LMDB and msgpack. The old version can be installed by passing the following commit hash to pip:

.. code-block:: bash

    pip install --upgrade git+git://github.com/vicolab/[email protected]


Development installation
~~~~~~~~~~~~~~~~~~~~~~~~

ml-pyxis can be installed from source in such a way that any changes to your local copy will take effect without having to reinstall the package. Start by making a copy of the repository:

.. code-block:: bash

    git clone https://github.com/vicolab/ml-pyxis.git

Next, enter the directory and install ml-pyxis in development mode by issuing the following command:

.. code-block:: bash

    cd ml-pyxis
    python setup.py develop

.. Links

.. _virtualenv: https://virtualenv.pypa.io/en/stable/
.. _lmdb: http://lmdb.readthedocs.io/en/release/
.. _LMDB 'Lightning' Database: https://symas.com/products/lightning-memory-mapped-database/
.. _msgpack: https://github.com/msgpack/msgpack-python
.. _MessagePack: http://msgpack.org/
.. _NumPy: http://www.numpy.org/
.. _six: https://github.com/benjaminp/six
