TileDB-Inc / Tiledb

License: MIT
The Universal Storage Engine

Projects that are alternatives to or similar to Tiledb

Collapse
Advanced and Fast Data Transformation in R
Stars: ✭ 184 (-82.84%)
Mutual labels:  data-science, data-analysis, scientific-computing
Awesome Scientific Python
A curated list of awesome scientific Python resources
Stars: ✭ 127 (-88.15%)
Mutual labels:  data-science, data-analysis, scientific-computing
Matplotplusplus
Matplot++: A C++ Graphics Library for Data Visualization 📊🗾
Stars: ✭ 2,433 (+126.96%)
Mutual labels:  data-science, data-analysis, scientific-computing
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-94.59%)
Mutual labels:  s3, data-science, hdfs
Seaweedfs
SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Stars: ✭ 13,380 (+1148.13%)
Mutual labels:  s3, hdfs, s3-storage
Kneed
Knee point detection in Python 📈
Stars: ✭ 328 (-69.4%)
Mutual labels:  data-science, data-analysis, scientific-computing
datajoint-python
Relational data pipelines for the science lab
Stars: ✭ 140 (-86.94%)
Mutual labels:  s3, scientific-computing, data-analysis
Gop
GoPlus - The Go+ language for engineering, STEM education, and data science
Stars: ✭ 7,829 (+630.32%)
Mutual labels:  data-science, data-analysis, scientific-computing
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (-97.95%)
Mutual labels:  data-science, data-analysis
Cluster Pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-97.85%)
Mutual labels:  s3, hdfs
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-97.57%)
Mutual labels:  data-science, data-analysis
Spring2017 proffosterprovost
Introduction to Data Science
Stars: ✭ 18 (-98.32%)
Mutual labels:  data-science, data-analysis
Football Data
football (soccer) datasets
Stars: ✭ 18 (-98.32%)
Mutual labels:  data-science, data-analysis
Resources
PyMC3 educational resources
Stars: ✭ 930 (-13.25%)
Mutual labels:  data-science, data-analysis
Skdata
Python tools for data analysis
Stars: ✭ 16 (-98.51%)
Mutual labels:  data-science, data-analysis
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (-19.4%)
Mutual labels:  data-science, data-analysis
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+676.96%)
Mutual labels:  data-science, data-analysis
Dataframe
C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (-22.76%)
Mutual labels:  data-science, data-analysis
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-20.34%)
Mutual labels:  data-science, data-analysis
Art Data Science
The Art of Data Science
Stars: ✭ 32 (-97.01%)
Mutual labels:  data-science, data-analysis

The Universal Storage Engine

TileDB is a powerful engine for storing and accessing dense and sparse multi-dimensional arrays, which can help you model any complex data efficiently. It is an embeddable C++ library that works on Linux, macOS, and Windows. It is open-sourced under the permissive MIT License and is developed and maintained by TileDB, Inc. To distinguish this project from other TileDB offerings, we often refer to it as TileDB Embedded.

TileDB includes the following features:

  • Support for both dense and sparse arrays
  • Support for dataframes and key-value stores (via sparse arrays)
  • Cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage)
  • Chunked (tiled) arrays
  • Multiple compression, encryption and checksum filters
  • Fully multi-threaded implementation
  • Parallel IO
  • Data versioning (rapid updates, time traveling)
  • Array metadata
  • Array groups
  • Numerous APIs on top of the C++ library
  • Numerous integrations (Spark, Dask, MariaDB, GDAL, etc.)
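To make the "chunked (tiled) arrays" feature above more concrete, here is a minimal Python sketch of the general idea behind tiling: a cell coordinate maps to a tile and an offset inside that tile, so a slice only needs the tiles it overlaps. It is an illustration only and does not use the TileDB API. The function names (`tile_of`, `tiles_for_slice`), the 4x4 tile extents, and the assumption that each dimension's domain starts at 0 are hypothetical choices made for this example.

```python
from itertools import product


def tile_of(coords, tile_extents):
    """Return (tile index, offset within tile) for a cell coordinate.

    Assumes every dimension's domain starts at 0 (an assumption of this sketch).
    """
    tile = tuple(c // e for c, e in zip(coords, tile_extents))
    offset = tuple(c % e for c, e in zip(coords, tile_extents))
    return tile, offset


def tiles_for_slice(ranges, tile_extents):
    """List the tile indices overlapped by an inclusive slice [(lo, hi), ...]."""
    per_dim = [
        range(lo // e, hi // e + 1)
        for (lo, hi), e in zip(ranges, tile_extents)
    ]
    return list(product(*per_dim))


if __name__ == "__main__":
    # Hypothetical 2D array split into 4x4 tiles.
    print(tile_of((5, 2), (4, 4)))
    # Rows 2..5 and columns 0..3 touch two tiles along the first dimension.
    print(tiles_for_slice([(2, 5), (0, 3)], (4, 4)))
```

In this sketch a read of rows 2 to 5 and columns 0 to 3 touches only two tiles, which is why chunking pairs well with compression filters and parallel IO: each tile can be processed independently.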

You can use TileDB to store data in a variety of applications, such as genomics, geospatial, finance and more. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array, which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract away the pains of data storage and management, while efficiently accessing the data with your favorite data science tool.
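The distinction between dense and sparse arrays can be sketched in a few lines of plain Python. This is a conceptual illustration under stated assumptions, not TileDB code: the helper names (`slice_dense`, `slice_sparse`), the sample image, and the `(timestamp, instrument id) -> price` layout are all hypothetical. A dense array stores a value for every cell in its domain, while a sparse array stores only the non-empty cells together with their coordinates, which is how table-like or key-value data can be modeled.

```python
def slice_dense(array, row_range, col_range):
    """Read an inclusive rectangular slice from a dense 2D array (list of rows)."""
    (r0, r1), (c0, c1) = row_range, col_range
    return [row[c0:c1 + 1] for row in array[r0:r1 + 1]]


def slice_sparse(cells, row_range, col_range):
    """Read the non-empty cells of a sparse 2D array that fall inside a slice.

    `cells` maps (row, col) coordinates to values; empty cells are simply absent.
    """
    (r0, r1), (c0, c1) = row_range, col_range
    return {
        coord: value
        for coord, value in sorted(cells.items())
        if r0 <= coord[0] <= r1 and c0 <= coord[1] <= c1
    }


if __name__ == "__main__":
    # Dense example: a small image where every pixel has a value.
    image = [
        [0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
    ]
    print(slice_dense(image, (1, 2), (1, 2)))

    # Sparse example (hypothetical): (timestamp, instrument id) -> price.
    # The domain is huge, but only three cells are populated.
    trades = {(3, 1001): 10.5, (7, 1002): 11.0, (900000, 1001): 10.7}
    print(slice_sparse(trades, (0, 10), (1000, 1002)))
```

The sparse form costs space only for populated cells, so it suits data whose coordinates are scattered over a very large domain; the dense form avoids storing coordinates at all when every cell is filled.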

Quickstart

You can install the TileDB library as follows:

# Homebrew (macOS):
$ brew update
$ brew install tiledb-inc/stable/tiledb

# Or Conda (macOS, Linux, Windows):
$ conda install -c conda-forge tiledb

Alternatively, you can use the Docker image we provide:

$ docker pull tiledb/tiledb
$ docker run -it tiledb/tiledb

We include several examples to help you get started.

Documentation

You can find the detailed TileDB documentation at https://docs.tiledb.com.

Building from source

Please see building from source in the documentation.

Format Specification

The TileDB data format is open-source and can be found here.

APIs

The TileDB team maintains a variety of APIs built on top of the C++ library.

Integrations

TileDB is also integrated with several popular databases and data science tools, including Spark, Dask, MariaDB, and GDAL.

Get involved

TileDB Embedded is an open-source project and welcomes all forms of contributions. Contributors to the project should read over the contribution docs for more information.

We'd love to hear from you. Drop us a line at [email protected], visit our forum or contact form, or follow us on Twitter to stay informed of updates and news.