data-apis / array-api-comparison

Licence: MIT License

Array API Comparison

Data and tooling to compare the API surfaces of various array libraries.

Overview

The goal of this repository is to compare the public API surfaces of various PyData array libraries in order to better understand existing practice. In analyzing both the commonalities and differences across array libraries, we can derive a common API subset which can be standardized and used to ensure consistency (naming and otherwise) across array libraries. This API subset should include attribute names, method names, and positional and keyword arguments.

By deriving a common API subset, we can reduce friction among library consumers by reducing the cognitive overhead of learning array dialects. This is exemplified by the following user story:

As an array library author, I know that, regardless of the input array (whether NumPy, Dask, PyTorch, etc.), the array has a method to compute the transpose which is guaranteed to have options x, y, and z.

Currently, the needs of the library author in the above user story are not met, as libraries vary in their naming conventions and the optional arguments they support.
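To make the naming differences concrete, the following minimal sketch compares the parameter names of three transpose functions. The names are entered by hand from each library's public documentation and are not taken from this repository's datasets; the helper `shared_parameters` is hypothetical.

```python
# Parameter names of each library's transpose function, entered by hand
# from the libraries' public documentation (illustrative, not repository data).
TRANSPOSE_PARAMS = {
    "numpy.transpose": ["a", "axes"],
    "torch.transpose": ["input", "dim0", "dim1"],
    "tensorflow.transpose": ["a", "perm", "conjugate", "name"],
}


def shared_parameters(signatures):
    """Return the parameter names that appear in every signature."""
    param_sets = [set(params) for params in signatures.values()]
    return set.intersection(*param_sets)


# No single parameter name is shared by all three functions.
print(shared_parameters(TRANSPOSE_PARAMS))  # prints: set()
```

An empty result is the situation the user story describes: code written against one library's transpose cannot rely on the same argument names in another.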

Through specification and array library compliance, we facilitate array interoperability for both users and library developers.


Array Libraries

Currently, the following array libraries are evaluated:

  • NumPy: serves as the reference API against which all other array libraries are compared.
  • CuPy
  • Dask.array
  • JAX
  • MXNet
  • PyTorch
  • rnumpy: an opinionated curation of NumPy APIs, serving as an exercise in evaluating what is most "essential" (i.e., the smallest set of building block functionality on which most array functionality can be built).
  • PyData/Sparse
  • TensorFlow

Installation

Navigate to the directory into which you want to clone this repository:

$ cd ./repository/destination/directory

Next, clone the repository:

$ git clone https://github.com/data-apis/array-api-comparison.git

Once cloned, navigate to the repository directory:

$ cd ./array-api-comparison

Create an Anaconda environment:

$ conda create -n array-api-comparison -c conda-forge python=3.8 nodejs jupyterlab

Activate the environment:

$ conda activate array-api-comparison

Run the installation sequence:

$ make

Usage

Usage: make <cmd>

  make help                              Print this message.
  
  make view-docs                         View all array API tables.

  make view-join                         View cross-library array API data.

  make view-intersection                 View the intersection of array library 
                                         APIs.

  make view-intersection-ranks           View a table ranking the intersection
                                         of array library APIs.

  make view-common-apis                  View relatively common array library
                                         APIs.

  make view-common-apis-ranks            View a table ranking relatively common
                                         array library APIs.

  make view-complement                   View array library APIs which are not
                                         in the intersection.

  make view-common-complement            View array library APIs which are not
                                         among the list of relatively common
                                         APIs.

  make view-lib-top-k-common             View a table displaying the top `K`
                                         (relatively) common array library APIs
                                         across various libraries.

  make view-lib-top-k-complement         View a table displaying the top `K`
                                         array library APIs in the complement
                                         across various libraries.
                                         
  make view-lib-top-k-common-complement  View a table displaying the top `K`
                                         array library APIs in the complement of
                                         the list of (relatively) common APIs
                                         across various libraries.

To run the Jupyter notebooks, start JupyterLab:

$ jupyter lab

Organization

This repository contains the following directories:

  • data: array API data (e.g., array library APIs and their NumPy equivalents).
  • docs: browser-based documentation for viewing array API data.
  • etc: configuration files.
  • notebooks: Jupyter notebooks for analysis.
  • scripts: scripts for data manipulation and documentation generation.
  • tools: project tooling.

The data directory contains the following subdirectories:

  • raw: raw array library API data.
  • joins: array library APIs matched to their NumPy equivalents.
  • vendor: datasets acquired from third-party sources, such as those found in the Python API Record repository.

The raw data directory contains the following datasets:

  • XXXXX.(csv|json): raw array library API data.

The joins data directory contains the following datasets:

  • XXXXX_numpy.(csv|json): array library APIs and their NumPy equivalents.
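As a minimal sketch of how the joins datasets could be read, the following loads every XXXXX_numpy.json file in a directory into a per-library lookup table. It assumes each file holds a JSON array of objects with name and numpy fields, as described under Contributing; the function `load_joins` and the temporary sample file are hypothetical and not part of the repository's tooling.

```python
import json
import tempfile
from pathlib import Path

SUFFIX = "_numpy.json"


def load_joins(joins_dir):
    """Map library name -> {library API name: NumPy equivalent}."""
    joins = {}
    for path in sorted(Path(joins_dir).glob("*" + SUFFIX)):
        # "cupy_numpy.json" -> "cupy"
        library = path.name[: -len(SUFFIX)]
        with path.open(encoding="utf-8") as f:
            records = json.load(f)
        joins[library] = {r["name"]: r["numpy"] for r in records}
    return joins


# Demonstration against a throwaway directory holding one sample file.
with tempfile.TemporaryDirectory() as tmp:
    sample = [{"name": "all", "numpy": "numpy.all"}]
    (Path(tmp) / "cupy_numpy.json").write_text(json.dumps(sample), encoding="utf-8")
    joins = load_joins(tmp)

print(joins)  # prints: {'cupy': {'all': 'numpy.all'}}
```

To try it on a clone of the repository, pass the path of the joins data directory instead of the temporary directory.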

Lastly, the root data directory contains the following additional datasets:

  • join.(csv|json): array library API data combined in a single file.
  • intersection.(csv|json): array library API intersection.
  • common_apis.(csv|json): array library APIs which are (relatively) common across downstream libraries (>67%).
  • complement.(csv|json): array library APIs which are not in the intersection.
  • intersection_ranks.(csv|json): array library APIs which are in the intersection ranked according to relative usage in downstream libraries.
  • common_apis_ranks.(csv|json): array library APIs which are in the list of (relatively) common APIs ranked according to relative usage in downstream libraries.
  • lib_top_k_common.(csv|json): the top K array library API names in the list of relatively common APIs per downstream library according to relative usage.
  • lib_top_k_common_complement.(csv|json): the top K array library API names not in the list of relatively common APIs per downstream library according to relative usage.
  • lib_top_k_complement.(csv|json): the top K array library API names not in the API intersection per downstream library according to relative usage.
  • lib_top_100_category_stats.(csv|json): categorization statistics for the top 100 NumPy APIs which are consumed for each downstream library.

Note: the datasets in the root data directory are generated.
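The relationship between the intersection, common, and complement datasets can be sketched with plain set operations. This is an illustration under stated assumptions, not the repository's generation scripts: it treats "common" as present in more than 67% of the libraries considered, uses the union of all listed APIs as the universe for the complements, and runs on made-up library names and coverage.

```python
def summarize(apis_by_library, threshold=0.67):
    """Derive intersection/common/complement sets.

    apis_by_library maps a library name to the set of NumPy-equivalent
    API names that the library provides.
    """
    n_libraries = len(apis_by_library)

    # Count how many libraries provide each API.
    counts = {}
    for apis in apis_by_library.values():
        for api in apis:
            counts[api] = counts.get(api, 0) + 1

    universe = set(counts)
    intersection = {api for api, c in counts.items() if c == n_libraries}
    common = {api for api, c in counts.items() if c / n_libraries > threshold}
    return {
        "intersection": intersection,
        "common_apis": common,
        "complement": universe - intersection,
        "common_complement": universe - common,
    }


# Hypothetical coverage data, for illustration only.
example = {
    "lib_a": {"numpy.all", "numpy.add", "numpy.einsum"},
    "lib_b": {"numpy.all", "numpy.add", "numpy.einsum"},
    "lib_c": {"numpy.all", "numpy.add"},
    "lib_d": {"numpy.all"},
}
result = summarize(example)
print(sorted(result["intersection"]))  # prints: ['numpy.all']
print(sorted(result["common_apis"]))   # prints: ['numpy.add', 'numpy.all']
```

In this example numpy.add is provided by 3 of 4 libraries (75%), so it counts as common without being in the intersection, while numpy.einsum (50%) falls into both complements.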

When editing data files, consider the JSON data to be the source of truth. CSV files are generated from the JSON data.
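For readers who want to see what a JSON-to-CSV step might look like, here is a minimal sketch using only the Python standard library. It is not the repository's own tooling; the function `json_records_to_csv` is hypothetical, and the column order is an assumption.

```python
import csv
import io
import json


def json_records_to_csv(json_text, fieldnames):
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Raises ValueError if a record has a key not listed in fieldnames.
    writer.writerows(records)
    return buffer.getvalue()


sample = '[{"name": "all", "numpy": "numpy.all"}]'
csv_text = json_records_to_csv(sample, ["name", "numpy"])
print(csv_text)
```

Because the CSV is derived output, edits made directly to a CSV file would be overwritten the next time it is regenerated from the JSON.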


Contributing

To contribute array API data to this repository, add a data/joins/XXXXX_numpy.json file, where XXXXX is the lowercase name of the relevant array library (e.g., cupy). The JSON file should include a JSON array, where each array element has the following fields:

  • name: array library API name.
  • numpy: NumPy API equivalent.

For example,

[
    {
        "name": "all",
        "numpy": "numpy.all"
    },
    {
        "name": "allclose",
        "numpy": "numpy.allclose"
    },
    ...
]

Once added, the CSV variant can be generated using internal tooling.
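A new file can be sanity-checked against the format described above before it is submitted. The following is a minimal sketch of such a check; the function `validate_join_records` is hypothetical, and it verifies only that the two documented fields are present.

```python
import json

REQUIRED_FIELDS = ("name", "numpy")


def validate_join_records(json_text):
    """Return a list of problems found; an empty list means the data passed."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        return ["top-level value must be a JSON array"]

    problems = []
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            problems.append(f"element {i}: expected an object")
            continue
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append(f"element {i}: missing field '{field}'")
    return problems


good = '[{"name": "all", "numpy": "numpy.all"}]'
bad = '[{"name": "allclose"}]'
print(validate_join_records(good))  # prints: []
print(validate_join_records(bad))   # prints: ["element 0: missing field 'numpy'"]
```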
