rtosholdings / Riptable

Licence: other
64bit multithreaded python data analytics tools for numpy arrays and datasets

Projects that are alternatives of or similar to Riptable

Stats Maths With Python
General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (+34.63%)
Mutual labels:  analytics, numpy
gaia
Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.
Stars: ✭ 29 (-89.75%)
Mutual labels:  analytics, numpy
Covid-19-analysis
Analysis with Covid-19 data
Stars: ✭ 49 (-82.69%)
Mutual labels:  analytics, numpy
AIPortfolio
Use AI to generate an optimized stock portfolio
Stars: ✭ 28 (-90.11%)
Mutual labels:  numpy
clevertap-react-native
CleverTap React Native SDK
Stars: ✭ 40 (-85.87%)
Mutual labels:  analytics
Divolte Collector
Divolte Collector
Stars: ✭ 264 (-6.71%)
Mutual labels:  analytics
Pybind11 examples
Examples for the usage of "pybind11"
Stars: ✭ 280 (-1.06%)
Mutual labels:  numpy
Visitor-Parser-JS
Visitor Parser JS
Stars: ✭ 20 (-92.93%)
Mutual labels:  analytics
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1518.73%)
Mutual labels:  analytics
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (-10.6%)
Mutual labels:  analytics
Pirsch
Pirsch is a drop-in, server-side, no-cookie, and privacy-focused analytics solution for Go.
Stars: ✭ 257 (-9.19%)
Mutual labels:  analytics
Sorting-Visualizer
A python based sorting visualizer built with the help of Matplotlib animations
Stars: ✭ 15 (-94.7%)
Mutual labels:  numpy
Laravel Gamp
📊 Laravel Google Analytics Measurement Protocol Package
Stars: ✭ 271 (-4.24%)
Mutual labels:  analytics
eigenpy
Bindings between Numpy and Eigen using Boost.Python
Stars: ✭ 88 (-68.9%)
Mutual labels:  numpy
Pysynth
Several simple music synthesizers in Python 3. Input from ABC or MIDI files is also supported.
Stars: ✭ 279 (-1.41%)
Mutual labels:  numpy
bitmovin-go
Golang-Client which enables you to seamlessly integrate the new Bitmovin API into your existing projects
Stars: ✭ 49 (-82.69%)
Mutual labels:  analytics
Introduction Datascience Python Book
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications
Stars: ✭ 275 (-2.83%)
Mutual labels:  analytics
Awesome Ecommerce Stack
💰 Popular marketing tools and add-ons used by 10,000+ of the top e-commerce stores.
Stars: ✭ 255 (-9.89%)
Mutual labels:  analytics
Flex4apps
Flex4Apps main project repository
Stars: ✭ 255 (-9.89%)
Mutual labels:  analytics
Incubator Age
Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.
Stars: ✭ 244 (-13.78%)
Mutual labels:  analytics

RipTable

An all-in-one, high-performance, 64-bit python analytics engine for numpy arrays with multithreaded support.

Supports Python 3.6, 3.7, and 3.8 on 64-bit Linux, Windows, and macOS.

RipTable enhances or replaces numpy and pandas, and includes SDS, a high-speed, cross-platform file format. RipTable can often crunch numbers at 1.5x to 10x the speed of numpy or pandas.

Maximum speed is achieved through vector intrinsics (hand-rolled loops using AVX-256, with AVX-512 support coming); parallel computing (for large arrays, multiple threads are deployed); recycling (built-in array garbage collection); and hashing and parallel sorts for core algorithms.
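The parallel-computing idea above can be illustrated with a minimal sketch in plain Python. This is not RipTable's implementation; the function name chunked_sum, the default thread count, and the min_chunk cutoff are assumptions chosen only for illustration. It shows the general pattern of splitting a large array into chunks, reducing each chunk on a worker thread, and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_sum(values, num_threads=4, min_chunk=100_000):
    """Sum `values`, splitting large inputs into chunks summed on worker threads.

    Hypothetical helper for illustration; the thresholds are assumptions.
    """
    n = len(values)
    # Small inputs are not worth the cost of dispatching to threads.
    if n < min_chunk or num_threads <= 1:
        return sum(values)

    # Ceiling division so every element lands in exactly one chunk.
    chunk = -(-n // num_threads)
    ranges = [(start, min(start + chunk, n)) for start in range(0, n, chunk)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Each worker reduces one chunk to a partial sum.
        partials = pool.map(lambda r: sum(values[r[0]:r[1]]), ranges)
        # Combine the partial sums into the final result.
        return sum(partials)


total = chunked_sum(list(range(1_000_000)))
```

In CPython the global interpreter lock keeps pure-Python threads from running this loop in parallel, so the sketch shows only the structure of the technique, not its speedup. Native code can release the lock and run the chunks concurrently.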

Install

pip install riptable

Documentation: readthedocs

Basic Concepts and Classes

FastArray: a subclass of the numpy array with built-in multithreaded number crunching. All scikit routines that expect a numpy array will also accept a FastArray, since it is a subclass. isinstance(fastarray, np.ndarray) will return True.

Dataset: replaces the pandas DataFrame class and holds equal row length numpy arrays (including > 1 dimension).

Struct: replaces the pandas Series class. A Struct is a grab bag collection class that Dataset subclasses from.

Categorical: replaces both pandas groupby and Categorical class. RipTable Categoricals are multikey, filterable, stackable, archivable, and can chain computations such as apply_reduce loops. They can do everything groupby can plus more.

Date/Time Classes: DateTimeNano, Date, TimeSpan, and DateSpan are designed more like Java, C++, or C# classes. Replaces most numpy and pandas date time classes.

Accum2/AccumTable: For cross tabulation.

SDS: a new file format which can stack multiple datasets in multiple files with zstd compression, threads, and no extra memory copies. SDS also supports loading and writing datasets to shared memory.
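To make the grouping idea behind a Categorical concrete, here is a conceptual sketch in plain Python. It does not use or reproduce the RipTable API; the helper names factorize and grouped_sum and the sample data are hypothetical. The assumption is only that a categorical stores each distinct key once and represents every row by an integer code, which makes a grouped reduction a single pass over the codes.

```python
def factorize(keys):
    """Return (uniques, codes), where codes[i] is the index of keys[i] in uniques.

    Uniques are kept in order of first appearance. A dict provides the
    hash lookup from key to code.
    """
    index_of = {}
    uniques = []
    codes = []
    for key in keys:
        code = index_of.get(key)
        if code is None:
            code = len(uniques)
            index_of[key] = code
            uniques.append(key)
        codes.append(code)
    return uniques, codes


def grouped_sum(codes, values, num_groups):
    """Sum `values` per group, using the integer codes as bin indices."""
    totals = [0] * num_groups
    for code, value in zip(codes, values):
        totals[code] += value
    return totals


# Hypothetical sample data: one key column and one value column.
symbols = ["AAPL", "MSFT", "AAPL", "GOOG", "MSFT"]
sizes = [10, 20, 30, 40, 50]

uniques, codes = factorize(symbols)
totals = grouped_sum(codes, sizes, len(uniques))
result = dict(zip(uniques, totals))
```

Because the keys only need to be hashable, passing tuples such as ("AAPL", "NYSE") to factorize gives a simple form of multikey grouping.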

Getting Started

import riptable as rt
ds = rt.Dataset({'intarray': rt.arange(1_000_000), 'floatarray': rt.arange(1_000_000.0)})
ds
ds.intarray.sum()

Numpy Users

A FastArray is a numpy array, and the two can be flipped back and forth with no array copies taking place (only the view changes).

import riptable as rt
import numpy as np
a = rt.arange(100)
numpyarray = a._np
fastarray = rt.FA(numpyarray)

Or convert directly by changing the view. Note that a FastArray is a numpy array:

numpyarray.view(rt.FastArray)
fastarray.view(np.ndarray)
isinstance(fastarray, np.ndarray)

Pandas Users

Pass a pandas DataFrame to the riptable Dataset constructor and it will be converted automatically.

import riptable as rt
import numpy as np
import pandas as pd
df = pd.DataFrame({'intarray': np.arange(1_000_000), 'floatarray': np.arange(1_000_000.0)})
ds = rt.Dataset(df)

How can I contribute?

RipTable has been open sourced publicly because it needs more users and contributions to take it to the next level. The RipTable team is confident the engine is the next-generation building block for python data analytics computing. We need help with bug reports, documentation, improved functionality, and new functionality. Please consider opening a GitHub pull request or emailing the team.

See the contributing guide for more information.

How can I trust RipTable calculations?

RipTable has been in development for 3 years and has been tested by dozens of quants at a large financial firm. It has a full test suite. However, just like any project, we still discover bugs and improvements. Please report them using GitHub issues.

How can RipTable perform the same calculations faster?

RipTable was written from day one to handle large data and multithreading, using the riptide_cpp layer for basic arithmetic functions and algorithms. Many core algorithms have been painstakingly rewritten for multithreading.

Why doesn't numpy or pandas just pick up the same code?

numpy does not have a multithreaded layer (we are in discussions with the numpy team to add such a layer), nor is it designed to use C++ templates or hashing algorithms. pandas does not have a C++ layer (it uses cython instead) and is a victim of its own success, which makes early design mistakes difficult to change (such as the block manager and the lack of powerful Categoricals).

Small, Medium, and Large array performance

RipTable is designed for all sizes of arrays. For small arrays (fewer than 100 elements), low processing overhead is important. RipTable's FastArray is written in hand-coded C and processes simple arithmetic functions faster than numpy arrays. For medium arrays (fewer than 100,000 elements), RipTable has vector intrinsic loops. For large arrays (100,000 elements or more), RipTable dynamically scales out threading, waking up threads efficiently using a futex.
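The size tiers described above amount to a dispatch on array length. The following is a minimal sketch of that decision in plain Python, using the two thresholds stated in the text. The function name choose_strategy and the strategy labels are hypothetical, and the real engine's dispatch logic is not documented here, so treat this only as an illustration of the idea.

```python
SMALL_LIMIT = 100        # small arrays: fewer than 100 elements
LARGE_LIMIT = 100_000    # large arrays: 100,000 elements or more


def choose_strategy(length):
    """Pick a processing strategy label from the array length.

    The labels are illustrative names, not RipTable identifiers.
    """
    if length < SMALL_LIMIT:
        # Per-call overhead dominates, so a plain low-overhead loop is used.
        return "scalar"
    if length < LARGE_LIMIT:
        # Large enough to benefit from vector intrinsic loops on one thread.
        return "vectorized"
    # Large enough to justify waking worker threads.
    return "threaded"


strategies = [choose_strategy(n) for n in (10, 5_000, 2_000_000)]
```

The point of the tiers is that the cost of setting up a faster path, such as waking threads, is only paid when the array is long enough to repay it.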
