rtosholdings / Riptable

Licence: other

64bit multithreaded python data analytics tools for numpy arrays and datasets

Programming Languages

python

139335 projects - #7 most used programming language

Labels

analytics numpy

Projects that are alternatives of or similar to Riptable

Stats Maths With Python

General statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python

Stars: ✭ 381 (+34.63%)

Mutual labels: analytics, numpy

gaia

Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico.

Stars: ✭ 29 (-89.75%)

Mutual labels: analytics, numpy

Covid-19-analysis

Analysis with Covid-19 data

Stars: ✭ 49 (-82.69%)

Mutual labels: analytics, numpy

AIPortfolio

Use AI to generate an optimized stock portfolio

Stars: ✭ 28 (-90.11%)

Mutual labels: numpy

clevertap-react-native

CleverTap React Native SDK

Stars: ✭ 40 (-85.87%)

Mutual labels: analytics

Divolte Collector

Stars: ✭ 264 (-6.71%)

Mutual labels: analytics

Pybind11 examples

Examples for the usage of "pybind11"

Stars: ✭ 280 (-1.06%)

Mutual labels: numpy

Visitor-Parser-JS

Visitor Parser JS

Stars: ✭ 20 (-92.93%)

Mutual labels: analytics

Trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Stars: ✭ 4,581 (+1518.73%)

Mutual labels: analytics

Roapi

Create full-fledged APIs for static datasets without writing a single line of code.

Stars: ✭ 253 (-10.6%)

Mutual labels: analytics

Pirsch

Pirsch is a drop-in, server-side, no-cookie, and privacy-focused analytics solution for Go.

Stars: ✭ 257 (-9.19%)

Mutual labels: analytics

Sorting-Visualizer

A python based sorting visualizer built with the help of Matplotlib animations

Stars: ✭ 15 (-94.7%)

Mutual labels: numpy

Laravel Gamp

📊 Laravel Google Analytics Measurement Protocol Package

Stars: ✭ 271 (-4.24%)

Mutual labels: analytics

eigenpy

Bindings between Numpy and Eigen using Boost.Python

Stars: ✭ 88 (-68.9%)

Mutual labels: numpy

Pysynth

Several simple music synthesizers in Python 3. Input from ABC or MIDI files is also supported.

Stars: ✭ 279 (-1.41%)

Mutual labels: numpy

bitmovin-go

Golang-Client which enables you to seamlessly integrate the new Bitmovin API into your existing projects

Stars: ✭ 49 (-82.69%)

Mutual labels: analytics

Introduction Datascience Python Book

Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications

Stars: ✭ 275 (-2.83%)

Mutual labels: analytics

Awesome Ecommerce Stack

💰 Popular marketing tools and add-ons used by 10,000+ of the top e-commerce stores.

Stars: ✭ 255 (-9.89%)

Mutual labels: analytics

Flex4apps

Flex4Apps main project repository

Stars: ✭ 255 (-9.89%)

Mutual labels: analytics

Incubator Age

Graph database optimized for fast analysis and real-time data processing. It is provided as an extension to PostgreSQL.

Stars: ✭ 244 (-13.78%)

Mutual labels: analytics

RipTable

All in one, high performance 64 bit python analytics engine for numpy arrays with multithreaded support.

Support for Python 3.6, 3.7, 3.8 on 64 bit Linux, Windows, and Mac OS.

Enhances or replaces numpy and pandas, and includes the high-speed, cross-platform SDS file format. RipTable can often crunch numbers at 1.5x to 10x the speed of numpy or pandas.

Maximum speed is achieved through the use of vector intrinsics (hand-rolled loops using AVX-256, with AVX-512 support coming); parallel computing (for large arrays, multiple threads are deployed); recycling (built-in array garbage collection); and hashing and parallel sorts for core algorithms.
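Speedups like the range quoted above depend on hardware, array size, and operation, so it can help to measure them locally. The following is a minimal timing sketch using only the standard library and numpy; the helper name `best_time` is hypothetical and not part of RipTable.

```python
import timeit

import numpy as np


def best_time(fn, repeat=5, number=10):
    """Return the best per-call wall time in seconds over several repeats."""
    times = timeit.repeat(fn, repeat=repeat, number=number)
    return min(times) / number


data = np.arange(1_000_000, dtype=np.float64)
numpy_seconds = best_time(lambda: data.sum())
print(f"numpy sum: {numpy_seconds * 1e3:.3f} ms per call")

# With riptable installed, the same harness can time the FastArray version:
#   import riptable as rt
#   fast = rt.FA(data)
#   print(best_time(lambda: fast.sum()))
```

Taking the minimum over repeats reduces noise from other processes; comparing several array sizes gives a fuller picture than a single measurement.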

Install

pip install riptable

Documentation: readthedocs

Basic Concepts and Classes

FastArray: subclasses from a numpy array with builtin multithreaded number crunching. All scikit routines that expect a numpy array will also accept a FastArray since it is subclassed. isinstance(fastarray, np.ndarray) will return True.

Dataset: replaces the pandas DataFrame class and holds equal row length numpy arrays (including > 1 dimension).

Struct: replaces the pandas Series class. A Struct is a grab bag collection class that Dataset subclasses from.

Categorical: replaces both pandas groupby and Categorical class. RipTable Categoricals are multikey, filterable, stackable, archivable, and can chain computations such as apply_reduce loops. They can do everything groupby can plus more.

Date/Time Classes: DateTimeNano, Date, TimeSpan, and DateSpan are designed more like Java, C++, or C# classes. Replaces most numpy and pandas date time classes.

Accum2/AccumTable: For cross tabulation.

SDS: a new file format which can stack multiple datasets in multiple files with zstd compression, threads, and no extra memory copies. SDS also supports loading and writing datasets to shared memory.
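To make the Categorical idea above more concrete, here is a rough numpy-only sketch of grouping through integer codes: each key is mapped to a small integer, and a reduction accumulates values per code. This is an illustration of the general technique, not RipTable's implementation or API; the variable names are made up for the example.

```python
import numpy as np

keys = np.array(["b", "a", "b", "c", "a", "b"])
values = np.array([10.0, 1.0, 20.0, 5.0, 2.0, 30.0])

# Factorize: sorted unique labels plus one integer code per row.
labels, codes = np.unique(keys, return_inverse=True)

# Grouped sum: accumulate each value into the bin named by its code.
group_sums = np.bincount(codes, weights=values, minlength=len(labels))

result = dict(zip(labels.tolist(), group_sums.tolist()))
print(result)  # {'a': 3.0, 'b': 60.0, 'c': 5.0}
```

Once the codes exist they can be reused for any number of reductions, which is one reason a categorical representation can be cheaper than regrouping for every computation.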

Getting Started

import riptable as rt
ds = rt.Dataset({'intarray': rt.arange(1_000_000), 'floatarray': rt.arange(1_000_000.0)})
ds
ds.intarray.sum()

Numpy Users

A FastArray is a numpy array, and the two can be flipped back and forth with no array copies taking place (only the view changes).

import riptable as rt
import numpy as np
a = rt.arange(100)
numpyarray = a._np
fastarray = rt.FA(numpyarray)

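Or convert directly by changing the view. Note that a FastArray is a numpy array:

numpyarray.view(rt.FastArray)      # numpy array viewed as a FastArray
fastarray.view(np.ndarray)         # FastArray viewed as a plain numpy array
isinstance(fastarray, np.ndarray)  # True, since FastArray subclasses ndarray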
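The zero-copy behaviour comes from numpy's own view mechanism, which works for any ndarray subclass. The sketch below demonstrates it with numpy alone, using a hypothetical `TaggedArray` class as a stand-in for FastArray so that it runs without riptable installed.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical stand-in for an ndarray subclass such as FastArray."""


base = np.arange(5)
tagged = base.view(TaggedArray)   # reinterpret the same buffer as the subclass
plain = tagged.view(np.ndarray)   # and back to a plain ndarray

tagged[0] = 99                    # write through the subclass view
print(base[0])                    # 99: both names refer to the same memory
print(isinstance(tagged, np.ndarray))   # True
print(np.shares_memory(base, tagged))   # True
```

Because a view shares its buffer with the original, in-place changes made through one are visible through the other; take an explicit `.copy()` when independent data is needed.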

Pandas Users

Simply pass a pandas DataFrame to the riptable Dataset constructor and it will be converted automatically.

import riptable as rt
import numpy as np
import pandas as pd
df = pd.DataFrame({'intarray': np.arange(1_000_000), 'floatarray': np.arange(1_000_000.0)})
ds = rt.Dataset(df)

How can I contribute?

RipTable has been publicly open sourced because it needs more users and contributions to take it to the next level. The RipTable team is confident the engine is the next-generation building block for python data analytics computing. We need help with bug reports, docs, improved functionality, and new functionality. Please consider a github pull request or email the team.

See the contributing guide for more information.

How can I trust RipTable calculations?

RipTable has been in development for 3 years and tested by dozens of quants at a large financial firm. It has a full suite of tests. However, just like any project, we still discover bugs and improvements. Please report them using github issues.

How can RipTable perform the same calculations faster?

RipTable was written from day one to handle large data and multithreading, using the riptide_cpp layer for basic arithmetic functions and algorithms. Many core algorithms have been painstakingly rewritten for multithreading.

Why doesn't numpy or pandas just pick up the same code?

numpy does not have a multithreaded layer (we are in discussions with the numpy team to add such a layer), nor is it designed to use C++ templates or hashing algorithms. pandas does not have a C++ layer (it uses cython instead) and is a victim of its own success, which makes early design mistakes (such as the block manager and the lack of powerful Categoricals) difficult to change.

Small, Medium, and Large array performance

RipTable is designed for all sizes of arrays. For small arrays (< 100 length), low processing overhead is important. RipTable's FastArray is written in hand-coded C and processes simple arithmetic functions faster than numpy arrays. For medium arrays (< 100,000 length), RipTable has vector intrinsic loops. For large arrays (>= 100,000 length), RipTable knows how to dynamically scale out threading, waking up threads efficiently using a futex.
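The size-based strategy can be pictured with a small Python sketch: stay single-threaded below a cutoff and split the work across threads above it. This is only an illustration under stated assumptions. It uses a standard-library thread pool rather than RipTable's futex-based wake-up, the function name `scaled_sum` and the worker cap are hypothetical, and only the 100,000 cutoff comes from the text.

```python
import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np

LARGE_THRESHOLD = 100_000  # cutoff for "large" arrays, taken from the text


def scaled_sum(arr, max_workers=None):
    """Sum a 1-D array, using threads only when the array is large."""
    if arr.size < LARGE_THRESHOLD:
        return arr.sum()  # small/medium: thread overhead would dominate
    workers = max_workers or min(8, os.cpu_count() or 1)
    chunks = np.array_split(arr, workers)  # views, no data copied
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(np.sum, chunks))
    return sum(partials)


small = np.arange(10, dtype=np.int64)
large = np.arange(1_000_000, dtype=np.int64)
print(scaled_sum(small), scaled_sum(large))
```

Whether the threaded path actually wins depends on the operation and the machine, which is why a tuned cutoff matters; a dedicated native thread pool avoids much of the per-call overhead that this sketch pays.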

Stars: ✭ 283