🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Stars: ✭ 499 (-38.92%)

Mutual labels: big-data

Attic Apex Core

Mirror of Apache Apex core

Stars: ✭ 346 (-57.65%)

Mutual labels: big-data

Oozie

Mirror of Apache Oozie

Stars: ✭ 602 (-26.32%)

Mutual labels: big-data

Grouparoo

🦘 The Grouparoo Monorepo - open source customer data sync framework

Stars: ✭ 334 (-59.12%)

Mutual labels: big-data

Hazelcast

Open-source distributed computation and storage platform

Stars: ✭ 4,662 (+470.62%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+2598.65%)

Mutual labels: big-data

Uproot3

ROOT I/O in pure Python and NumPy.

Stars: ✭ 312 (-61.81%)

Mutual labels: big-data

Scanner

Efficient video analysis at scale

Stars: ✭ 569 (-30.35%)

Mutual labels: big-data

CORTX

CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.

Stars: ✭ 426 (-47.86%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (-17.26%)

Mutual labels: big-data

Datascience Ai Machinelearning Resources

Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.

Stars: ✭ 414 (-49.33%)

Mutual labels: big-data

Nipype

Workflows and interfaces for neuroimaging packages

Stars: ✭ 557 (-31.82%)

Mutual labels: big-data

CogComp NLP

CogComp's Natural Language Processing libraries and demos

Stars: ✭ 410 (-49.82%)

Mutual labels: big-data

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (+670.75%)

Mutual labels: big-data

Decentralized Internet

An SDK/library for decentralized web and distributed computing projects

Stars: ✭ 406 (-50.31%)

Mutual labels: big-data

Thrill

Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++

Stars: ✭ 528 (-35.37%)

Mutual labels: big-data

ORC

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (-52.39%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (-23.75%)

Mutual labels: big-data

Bigdl

Building Large-Scale AI Applications for Distributed Big Data

Stars: ✭ 3,813 (+366.71%)

Mutual labels: big-data

Beam

Apache Beam is a unified programming model for Batch and Streaming

Stars: ✭ 5,149 (+530.23%)

Mutual labels: big-data

Halodb

A fast, log structured key-value store.

Stars: ✭ 370 (-54.71%)

Mutual labels: big-data

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (-3.67%)

Mutual labels: big-data

Sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.

Stars: ✭ 362 (-55.69%)

Mutual labels: big-data

Magellan

Geo Spatial Data Analytics on Spark

Stars: ✭ 507 (-37.94%)

Mutual labels: big-data

Bigtop

Mirror of Apache Bigtop

Stars: ✭ 356 (-56.43%)

Mutual labels: big-data

H2O 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+592.29%)

Mutual labels: big-data

Devops Roadmap

DevOps methodology & roadmap for a DevOps developer in 2019. Interesting books to learn new technologies.

Stars: ✭ 349 (-57.28%)

Mutual labels: big-data

Stream Framework

Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology.

Stars: ✭ 4,576 (+460.1%)

Mutual labels: big-data

Stroom

Stroom is a highly scalable data storage, processing and analysis platform.

Stars: ✭ 344 (-57.89%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+706.36%)

Mutual labels: big-data

Ozone

Scalable, redundant, and distributed object store for Apache Hadoop