
ranaroussi / Pystore

Licence: apache-2.0
Fast data store for Pandas time-series data

Programming Languages

python

Projects that are alternatives of or similar to Pystore

Arctic
High performance datastore for time series and tick data
Stars: ✭ 2,525 (+676.92%)
Mutual labels:  pandas, database, timeseries
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+836.62%)
Mutual labels:  dataframe, pandas
Pandasvault
Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (-2.77%)
Mutual labels:  dataframe, pandas
pyjanitor
Clean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (+198.46%)
Mutual labels:  pandas, dataframe
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (+610.15%)
Mutual labels:  dataframe, pandas
Pandasgui
PandasGUI is a GUI for viewing, plotting and analyzing Pandas DataFrames.
Stars: ✭ 2,495 (+667.69%)
Mutual labels:  dataframe, pandas
tempo
API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
Stars: ✭ 212 (-34.77%)
Mutual labels:  timeseries, pandas
Pandahouse
Pandas interface for Clickhouse database
Stars: ✭ 126 (-61.23%)
Mutual labels:  dataframe, pandas
saddle
SADDLE: Scala Data Library
Stars: ✭ 23 (-92.92%)
Mutual labels:  pandas, dataframe
cognipy
In-memory Graph Database and Knowledge Graph with Natural Language Interface, compatible with Pandas
Stars: ✭ 31 (-90.46%)
Mutual labels:  pandas, dataframe
tableau-scraping
Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (-72%)
Mutual labels:  pandas, dataframe
Ditching Excel For Python
Functionalities in Excel translated to Python
Stars: ✭ 172 (-47.08%)
Mutual labels:  dataframe, pandas
Panthera
Data-frames & arrays on Clojure
Stars: ✭ 168 (-48.31%)
Mutual labels:  dataframe, pandas
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-27.69%)
Mutual labels:  dataframe, pandas
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-53.85%)
Mutual labels:  dataframe, database
Styleframe
A library that wraps pandas and openpyxl and allows easy styling of dataframes in excel
Stars: ✭ 252 (-22.46%)
Mutual labels:  dataframe, pandas
Dominando-Pandas
This repository is intended for learning the Pandas library.
Stars: ✭ 22 (-93.23%)
Mutual labels:  pandas, dataframe
Jardin
A pandas.DataFrame-based ORM.
Stars: ✭ 81 (-75.08%)
Mutual labels:  dataframe, pandas
Danfojs
danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (+301.23%)
Mutual labels:  dataframe, pandas
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+88.31%)
Mutual labels:  pandas, dataframe

PyStore - Fast data store for Pandas timeseries data

.. image:: https://img.shields.io/badge/python-2.7,%203.5+-blue.svg?style=flat
    :target: https://pypi.python.org/pypi/pystore
    :alt: Python version

.. image:: https://img.shields.io/pypi/v/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi version

.. image:: https://img.shields.io/pypi/status/pystore.svg?maxAge=60
    :target: https://pypi.python.org/pypi/pystore
    :alt: PyPi status

.. image:: https://img.shields.io/travis/ranaroussi/pystore/master.svg?maxAge=1
    :target: https://travis-ci.com/ranaroussi/pystore
    :alt: Travis-CI build status

.. image:: https://www.codefactor.io/repository/github/ranaroussi/pystore/badge
    :target: https://www.codefactor.io/repository/github/ranaroussi/pystore
    :alt: CodeFactor

.. image:: https://img.shields.io/github/stars/ranaroussi/pystore.svg?style=social&label=Star&maxAge=60
    :target: https://github.com/ranaroussi/pystore
    :alt: Star this repo

.. image:: https://img.shields.io/twitter/follow/aroussi.svg?style=social&label=Follow&maxAge=60
    :target: https://twitter.com/aroussi
    :alt: Follow me on twitter

\

`PyStore <https://github.com/ranaroussi/pystore>`_ is a simple (yet powerful) datastore for Pandas dataframes, and while it can store any Pandas object, it was designed with storing timeseries data in mind.

It's built on top of `Pandas <http://pandas.pydata.org>`_, `Numpy <http://numpy.pydata.org>`_, `Dask <http://dask.pydata.org>`_, and `Parquet <http://parquet.apache.org>`_ (via `Fastparquet <https://github.com/dask/fastparquet>`_) to provide an easy-to-use datastore for Python developers that can query millions of rows per second per client.

==> Check out `this blog post <https://medium.com/@aroussi/fast-data-store-for-pandas-time-series-data-using-pystore-89d9caeef4e2>`_ for the reasoning and philosophy behind PyStore, as well as a detailed tutorial with code examples.

==> Follow `this PyStore tutorial <https://github.com/ranaroussi/pystore/blob/master/examples/pystore-tutorial.ipynb>`_ in Jupyter notebook format.

Quickstart

Install PyStore

Install using pip:

.. code:: bash

    $ pip install pystore --upgrade --no-cache-dir

Install using conda:

.. code:: bash

    $ conda install -c ranaroussi pystore

INSTALLATION NOTE: If you don't have Snappy (a compression/decompression library) installed, you'll need to `install it first <https://github.com/ranaroussi/pystore#dependencies>`_.
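
To check whether the Snappy Python bindings are already importable before installing PyStore, something like the sketch below can help. It assumes the bindings come from the python-snappy package, which installs a module named ``snappy``; the helper name ``snappy_available`` is hypothetical and not part of PyStore. It only checks that the module can be found, not that the underlying C library links correctly.

```python
import importlib.util


def snappy_available() -> bool:
    """Return True if a top-level module named ``snappy`` can be found.

    Hypothetical helper: assumes python-snappy provides ``import snappy``.
    find_spec() locates the module without importing it, so a broken
    C-library link would not be detected here.
    """
    return importlib.util.find_spec("snappy") is not None


if __name__ == "__main__":
    if snappy_available():
        print("python-snappy found")
    else:
        print("python-snappy not found; install Snappy first")
```

If the check fails, follow the platform-specific steps under Dependencies.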

Using PyStore

.. code:: python

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import pystore
    import quandl

    # Set storage path (optional)
    # Defaults to `~/pystore` or `PYSTORE_PATH` environment variable (if set)
    pystore.set_path("~/pystore")

    # List stores
    pystore.list_stores()

    # Connect to datastore (create it if it doesn't exist)
    store = pystore.store('mydatastore')

    # List existing collections
    store.list_collections()

    # Access a collection (create it if it doesn't exist)
    collection = store.collection('NASDAQ')

    # List items in collection
    collection.list_items()

    # Load some data from Quandl
    aapl = quandl.get("WIKI/AAPL", authtoken="your token here")

    # Store the first 100 rows of the data in the collection under "AAPL"
    collection.write('AAPL', aapl[:100], metadata={'source': 'Quandl'})

    # Reading the item's data
    item = collection.item('AAPL')
    data = item.data  # <-- Dask dataframe (see dask.pydata.org)
    metadata = item.metadata
    df = item.to_pandas()

    # Append the rest of the rows to the "AAPL" item
    collection.append('AAPL', aapl[100:])

    # Reading the item's data again (now including the appended rows)
    item = collection.item('AAPL')
    data = item.data
    metadata = item.metadata
    df = item.to_pandas()


    # --- Query functionality ---

    # Query available symbols based on metadata
    collection.list_items(some_key='some_value', other_key='other_value')


    # --- Snapshot functionality ---

    # Snapshot a collection
    # (Point-in-time named reference for all current symbols in a collection)
    collection.create_snapshot('snapshot_name')

    # List available snapshots
    collection.list_snapshots()

    # Get a version of a symbol given a snapshot name
    collection.item('AAPL', snapshot='snapshot_name')

    # Delete a collection snapshot
    collection.delete_snapshot('snapshot_name')


    # ...


    # Delete the item from the current version
    collection.delete_item('AAPL')

    # Delete the collection
    store.delete_collection('NASDAQ')
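
The ``list_items(some_key='some_value', ...)`` query in the example above filters items by their metadata. One plausible reading of that behavior, assumed here rather than taken from PyStore's source, is that an item matches when every given key is present in its metadata with an equal value. The sketch below models that rule with plain dictionaries; ``filter_items`` and the sample metadata are hypothetical.

```python
from typing import Any, Dict, Mapping, Set


def filter_items(items: Mapping[str, Mapping[str, Any]], **criteria: Any) -> Set[str]:
    """Return the names of items whose metadata matches every criterion.

    Hypothetical model of metadata-based querying: an item matches when
    each key in ``criteria`` exists in its metadata with an equal value.
    With no criteria, every item matches.
    """
    return {
        name
        for name, metadata in items.items()
        if all(key in metadata and metadata[key] == value
               for key, value in criteria.items())
    }


# Hypothetical metadata, shaped like the dict passed to collection.write(..., metadata=...)
items: Dict[str, Dict[str, Any]] = {
    "AAPL": {"source": "Quandl", "exchange": "NASDAQ"},
    "MSFT": {"source": "Quandl", "exchange": "NASDAQ"},
    "SPY": {"source": "Other", "exchange": "NYSE"},
}

print(sorted(filter_items(items, source="Quandl")))                   # ['AAPL', 'MSFT']
print(sorted(filter_items(items, source="Quandl", exchange="NYSE")))  # []
```

Under this assumption, adding keyword arguments can only narrow the result, since every criterion must hold.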

Using Dask schedulers

PyStore 0.1.18+ supports using Dask distributed.

To use a local Dask scheduler, add this to your code:

.. code:: python

    import pystore
    from dask.distributed import LocalCluster

    pystore.set_client(LocalCluster())

To use a distributed Dask scheduler, add this to your code:

.. code:: python

    import pystore

    pystore.set_client("tcp://xxx.xxx.xxx.xxx:xxxx")
    pystore.set_path("/path/to/shared/volume/all/workers/can/access")

Concepts

PyStore provides namespaced collections of data. These collections allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.). Each collection (or namespace) maps to a directory containing partitioned parquet files for each item (e.g. symbol).

A good practice is to create collections that may look something like this:

* collection.EOD
* collection.ONEMINUTE
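
Because each collection maps to a directory holding the partitioned parquet files of its items, an item's location can be pictured as a nested path. The sketch below assumes a ``<root>/<store>/<collection>/<item>`` layout and resolves the root the way the Quickstart comments describe (an explicit path, else the ``PYSTORE_PATH`` environment variable, else ``~/pystore``). Both the layout and the helper names ``resolve_root`` and ``item_dir`` are hypothetical illustrations, not PyStore's API; inspect an actual store on disk before relying on them.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_root(path: Optional[str] = None) -> Path:
    """Pick the store root: explicit path, then PYSTORE_PATH, then ~/pystore."""
    chosen = path or os.environ.get("PYSTORE_PATH") or "~/pystore"
    return Path(chosen).expanduser()


def item_dir(store: str, collection: str, item: str,
             root: Optional[str] = None) -> Path:
    """Hypothetical directory holding one item's partitioned parquet files."""
    return resolve_root(root) / store / collection / item


if __name__ == "__main__":
    # Collections named by data frequency, following the practice above
    for collection in ("EOD", "ONEMINUTE"):
        print(item_dir("mydatastore", collection, "AAPL", root="/data/pystore"))
```

Naming collections by frequency keeps, for example, end-of-day and one-minute AAPL bars in separate directories under the same store.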

Requirements

* Python 2.7 or Python 3.5+
* Pandas
* Numpy
* Dask
* Fastparquet
* `Snappy <http://google.github.io/snappy/>`_ (Google's compression/decompression library)
* multitasking

PyStore has been tested on \*nix-like systems, including macOS.

Dependencies:

PyStore uses `Snappy <http://google.github.io/snappy/>`_, a fast and efficient compression/decompression library from Google. You'll need to install Snappy on your system before installing PyStore.

* See the `python-snappy GitHub repo <https://github.com/andrix/python-snappy#dependencies>`_ for more information.

\*nix Systems:

* APT: ``sudo apt-get install libsnappy-dev``
* RPM: ``sudo yum install snappy-devel``

macOS:

First, install Snappy's C library using `Homebrew <https://brew.sh>`_:

.. code:: bash

    $ brew install snappy

Then, install Python's snappy using conda:

.. code:: bash

    $ conda install python-snappy -c conda-forge

...or, using pip:

.. code:: bash

    $ CPPFLAGS="-I/usr/local/include -L/usr/local/lib" pip install python-snappy

Windows:

Windows users should check out `Snappy for Windows <https://snappy.machinezoo.com>`_ and `this Stack Overflow post <https://stackoverflow.com/a/43756412/1783569>`_ for help installing Snappy and python-snappy.

Roadmap

PyStore currently supports the local filesystem (including attached network drives). I plan on adding support for Amazon S3 (via `s3fs <http://s3fs.readthedocs.io/>`_), Google Cloud Storage (via `gcsfs <https://github.com/dask/gcsfs/>`_) and Hadoop Distributed File System (via `hdfs3 <http://hdfs3.readthedocs.io/>`_) in the future.

Acknowledgements

PyStore is hugely inspired by `Man AHL <http://www.ahl.com/>`_'s `Arctic <https://github.com/manahl/arctic>`_, which uses MongoDB for storage and allows for versioning and other features. I highly recommend you check it out.

License

PyStore is licensed under the Apache License, Version 2.0. A copy of the license is included in LICENSE.txt.


I'm very interested in your experience with PyStore. Please drop me a note with any feedback you have.

Contributions welcome!

- Ran Aroussi
