thrill / Thrill

Licence: other
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Projects that are alternatives to or similar to Thrill

Nakedtensor
Bare bone examples of machine learning in TensorFlow
Stars: ✭ 2,443 (+362.69%)
Mutual labels:  big-data, distributed-computing
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-71.21%)
Mutual labels:  big-data, distributed-computing
Moosefs
MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+94.13%)
Mutual labels:  big-data, distributed-computing
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-75.95%)
Mutual labels:  big-data, distributed-computing
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+782.95%)
Mutual labels:  big-data, distributed-computing
Selinon
An advanced distributed task flow management on top of Celery
Stars: ✭ 237 (-55.11%)
Mutual labels:  big-data, distributed-computing
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-71.59%)
Mutual labels:  big-data, distributed-computing
dislib
The Distributed Computing library for python implemented using PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-92.61%)
Mutual labels:  big-data, distributed-computing
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-86.36%)
Mutual labels:  big-data, distributed-computing
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-31.63%)
Mutual labels:  big-data, distributed-computing
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (-17.42%)
Mutual labels:  big-data
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+4075.76%)
Mutual labels:  big-data
Pgm Index
🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-5.49%)
Mutual labels:  big-data
Awesome Distributed Systems
Awesome list of distributed systems resources
Stars: ✭ 512 (-3.03%)
Mutual labels:  distributed-computing
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-19.32%)
Mutual labels:  big-data
Stream Framework
Stream Framework is a Python library, which allows you to build news feed, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+766.67%)
Mutual labels:  big-data
Listenbrainz Server
Server for the ListenBrainz project
Stars: ✭ 420 (-20.45%)
Mutual labels:  big-data
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-21.59%)
Mutual labels:  big-data
Opendata.cern.ch
Source code for the CERN Open Data portal
Stars: ✭ 411 (-22.16%)
Mutual labels:  big-data
Arkime
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+845.83%)
Mutual labels:  big-data

Thrill

Thrill is an EXPERIMENTAL C++ framework for algorithmic distributed Big Data batch computations on a cluster of machines. It is currently being designed and developed as a research project at Karlsruhe Institute of Technology and is in early testing. For more information on its goals and mission, see http://project-thrill.org.

For easy steps on getting started, refer to the Live Documentation.

License

Thrill is free software provided under the BSD 2-clause license.

If you use Thrill in an academic context or publication, please cite our paper:

@InProceedings{bingmann2016thrill,
  author =       {Timo Bingmann and Michael Axtmann and Emanuel J{\"{o}}bstl and Sebastian Lamm and Huyen Chau Nguyen and Alexander Noe and Sebastian Schlag and Matthias Stumpp and Tobias Sturm and Peter Sanders},
  title =        {{Thrill}: High-Performance Algorithmic Distributed Batch Data Processing with {C++}},
  booktitle =    {IEEE International Conference on Big Data},
  year =         2016,
  pages =        {172--183},
  month =        dec,
  organization = {IEEE},
  note =         {preprint arXiv:1608.05634},
  isbn =         {978-1-4673-9005-7},
}