greglook / merkle-db

License: Unlicense
High-scalability analytics database built on immutable merkle-trees

MerkleDB

MerkleDB is a Clojure library for storing and accessing large data sets in a hybrid column-oriented tree of content-addressable data blocks.

This project is usable, but should be considered alpha quality. For more details, see the design doc, proposed client interface, and sample usage patterns.

Installation

Library releases are published on Clojars. To use the latest version with Leiningen, add the following dependency to your project definition:

Clojars Project

This will pull in the omnibus package, which in turn depends on each subproject of the same version. You may instead depend on the subprojects directly if you wish to omit some functionality, such as Spark integration.

Concepts

The high-level semantics of this library are similar to a traditional key-value data store:

  • A database is a collection of tables, along with some user metadata.
  • Tables are collections of records, which are identified uniquely within the table by an id key.
  • Each record is an associative collection of fields, mapping field names to values.
  • Values may have any type that the underlying serialization format supports. There is no guarantee that all the values for a given field have the same type.
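
As a rough illustration of these semantics, the sketch below models a database as nested Python dictionaries. It is not MerkleDB's API: the `database` layout and the `get_record` and `field_types` helpers are hypothetical names chosen for this example only.

```python
# Hypothetical in-memory model of the concepts above; not MerkleDB's API.
database = {
    "metadata": {"owner": "analytics-team"},  # user metadata
    "tables": {
        "users": {
            # Each record is identified by a unique id key within its table.
            "u001": {"name": "Ada", "age": 36},
            # Records need not share fields, and a field's type may vary.
            "u002": {"name": "Grace", "age": "unknown", "tags": ["navy"]},
        },
    },
}


def get_record(db, table, record_id):
    """Look up one record by table name and id key; None if absent."""
    return db["tables"].get(table, {}).get(record_id)


def field_types(db, table, field):
    """Collect the names of the value types seen for one field."""
    return {
        type(record[field]).__name__
        for record in db["tables"][table].values()
        if field in record
    }
```

Here `field_types(database, "users", "age")` reports both an integer and a string, which is the kind of per-field type variation the last point allows.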

Goals

The primary design goals of MerkleDB are:

  • Flexible schema-free key-value storage.
  • Highly parallel reads and writes, optimized for bulk processing in which a job computes over most or all of the records in a table but may need only a subset of the fields in each record.
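
To show why a column-oriented layout helps jobs that need only some fields, here is a minimal sketch in Python. It assumes fields are assigned to named groups that are stored separately; the group names, the `"base"` default group, and the functions `partition_by_group` and `scan` are all hypothetical and do not reflect how MerkleDB lays out data.

```python
def build_field_index(groups):
    """Map each field name to the group that stores it."""
    return {field: name for name, fields in groups.items() for field in fields}


def partition_by_group(records, groups):
    """Split records into one store per field group.

    `records` maps id -> {field: value}; `groups` maps group name -> set of
    field names. Fields not listed in any group land in the "base" group.
    """
    index = build_field_index(groups)
    stores = {}
    for record_id, record in records.items():
        for field, value in record.items():
            group = index.get(field, "base")
            stores.setdefault(group, {}).setdefault(record_id, {})[field] = value
    return stores


def scan(stores, groups, wanted_fields):
    """Read only the groups holding wanted fields; return (records, groups_read)."""
    index = build_field_index(groups)
    needed = {index.get(field, "base") for field in wanted_fields}
    result = {}
    for group in needed:
        for record_id, part in stores.get(group, {}).items():
            for field, value in part.items():
                if field in wanted_fields:
                    result.setdefault(record_id, {})[field] = value
    return result, needed


records = {
    "a": {"name": "Ada", "age": 36, "bio": "long text"},
    "b": {"name": "Grace", "bio": "more long text"},
}
groups = {"profile": {"bio"}}
stores = partition_by_group(records, groups)
names, groups_read = scan(stores, groups, {"name"})
```

In this example a scan for `name` touches only the `"base"` store and never reads the bulkier `bio` values held in `"profile"`.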

Secondary goals include:

  • Efficient storage utilization via deduplication and structural sharing.
  • Light-weight versioning and copy-on-write to support immutable reads.
  • Building on storage and synchronization abstractions to support hosted service backends.
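
Deduplication, structural sharing, and copy-on-write versioning all follow naturally from content addressing. The Python sketch below is a toy model under stated assumptions: blocks are JSON-encoded and identified by their SHA-256 digest, and a table is a root block linking to partition blocks. The `BlockStore` class and the `write_table` and `update_partition` functions are hypothetical and are not MerkleDB's storage format or API.

```python
import hashlib
import json


class BlockStore:
    """Toy content-addressed store: a block's id is the SHA-256 of its bytes."""

    def __init__(self):
        self.blocks = {}

    def put(self, value):
        data = json.dumps(value, sort_keys=True).encode("utf-8")
        block_id = hashlib.sha256(data).hexdigest()
        self.blocks.setdefault(block_id, data)  # identical content is stored once
        return block_id

    def get(self, block_id):
        return json.loads(self.blocks[block_id])


def write_table(store, partitions):
    """Store each partition as a block, then a root block linking to them."""
    links = [store.put(partition) for partition in partitions]
    return store.put({"partitions": links})


def update_partition(store, root_id, index, new_partition):
    """Copy-on-write: build a new root; untouched partitions are shared."""
    links = list(store.get(root_id)["partitions"])
    links[index] = store.put(new_partition)
    return store.put({"partitions": links})


store = BlockStore()
v1 = write_table(store, [{"a": 1}, {"b": 2}, {"c": 3}])
v2 = update_partition(store, v1, 1, {"b": 20})
```

After the update, `v1` still resolves to the original data, and the two versions share the unchanged partition blocks. Only one new partition block and one new root block were written.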

Non-goals:

  • High-frequency, highly concurrent writes. Initial versions will have simple database-wide locking for updates.
  • Access control. In this library, all authentication and authorization are deferred to the storage layers backing the block store and ref manager.
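
One simple reading of "database-wide locking for updates" is a single lock guarding a mutable pointer to the current immutable version. The Python sketch below illustrates that idea only. The `DatabaseRef` class is hypothetical and is not the library's ref manager.

```python
import threading


class DatabaseRef:
    """Toy mutable pointer to the current immutable database version."""

    def __init__(self, root):
        self._root = root
        self._lock = threading.Lock()  # one lock for the whole database

    def read(self):
        # Readers take no lock: a published version is never modified.
        return self._root

    def update(self, fn):
        # Writers are serialized; each sees the result of the previous one.
        with self._lock:
            self._root = fn(self._root)
            return self._root


def increment(root):
    """Return a new version instead of mutating the old one."""
    return {**root, "count": root["count"] + 1}


ref = DatabaseRef({"count": 0})
snapshot = ref.read()


def worker():
    for _ in range(100):
        ref.update(increment)


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Serializing all writers this way is easy to reason about but limits write throughput, which is consistent with high-frequency concurrent writes being listed as a non-goal.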

License

This is free and unencumbered software released into the public domain. See the UNLICENSE file for more information.
