🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes

Stars: ✭ 499 (+644.78%)

Mutual labels: big-data

Warp

Convert and analyze large data sets at light speed, on Mac and iOS.

Stars: ✭ 62 (-7.46%)

Mutual labels: big-data

Fit Sne

Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)

Stars: ✭ 485 (+623.88%)

Mutual labels: big-data

Accumulo

Apache Accumulo

Stars: ✭ 857 (+1179.1%)

Mutual labels: big-data

Yauaa

Yet Another UserAgent Analyzer

Stars: ✭ 472 (+604.48%)

Mutual labels: flink

Traildb

TrailDB is an efficient tool for storing and querying series of events

Stars: ✭ 1,029 (+1435.82%)

Mutual labels: big-data

Hazelcast

Open-source distributed computation and storage platform

Stars: ✭ 4,662 (+6858.21%)

Mutual labels: big-data

Dataflowjavasdk

Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.

Stars: ✭ 854 (+1174.63%)

Mutual labels: big-data

Courses

Quizzes and assignments from Coursera courses

Stars: ✭ 454 (+577.61%)

Mutual labels: big-data

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-14.93%)

Mutual labels: big-data

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+32807.46%)

Mutual labels: big-data

Pretzel

JavaScript full-stack framework for Big Data visualisation and analysis

Stars: ✭ 26 (-61.19%)

Mutual labels: big-data

Circosjs

D3 library to build circular graphs

Stars: ✭ 436 (+550.75%)

Mutual labels: big-data

Couchdb Couch

Mirror of Apache CouchDB

Stars: ✭ 43 (-35.82%)

Mutual labels: big-data

Featran

A Scala feature transformation library for data science and machine learning

Stars: ✭ 420 (+526.87%)

Mutual labels: flink

Bandar Log

Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.

Stars: ✭ 19 (-71.64%)

Mutual labels: big-data

Flink Streaming Platform Web

A web platform for real-time stream computing based on Flink SQL

Stars: ✭ 416 (+520.9%)

Mutual labels: flink

Cloud Volume

Read and write Neuroglancer datasets programmatically.

Stars: ✭ 63 (-5.97%)

Mutual labels: big-data

Opendata.cern.ch

Source code for the CERN Open Data portal

Stars: ✭ 411 (+513.43%)

Mutual labels: big-data

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-92.54%)

Mutual labels: big-data

Mockneat

MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.

Stars: ✭ 410 (+511.94%)

Mutual labels: big-data

Egads

A Java package to automatically detect anomalies in large scale time-series data

Stars: ✭ 997 (+1388.06%)

Mutual labels: big-data

Kafka Connect Hdfs

Kafka Connect HDFS connector

Stars: ✭ 400 (+497.01%)

Mutual labels: big-data

Sqoop

Mirror of Apache Sqoop

Stars: ✭ 817 (+1119.4%)

Mutual labels: big-data

Ignite

Apache Ignite

Stars: ✭ 4,027 (+5910.45%)

Mutual labels: big-data

Pulsar Spark

When Apache Pulsar meets Apache Spark

Stars: ✭ 55 (-17.91%)

Mutual labels: flink

Flink Ai Extended

Stars: ✭ 377 (+462.69%)

Mutual labels: flink

Titanoboa

Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.

Stars: ✭ 787 (+1074.63%)

Mutual labels: big-data

Halodb

A fast, log structured key-value store.

Stars: ✭ 370 (+452.24%)

Mutual labels: big-data

Analysispreservation.cern.ch

Source code for the CERN Analysis Preservation portal

Stars: ✭ 37 (-44.78%)

Mutual labels: big-data

Flink Training Course

Flink video course in Chinese (continuously updated...)

Stars: ✭ 3,963 (+5814.93%)

Mutual labels: flink

Storm

Mirror of Apache Storm

Stars: ✭ 6,297 (+9298.51%)

Mutual labels: big-data

Verticapy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.

Stars: ✭ 59 (-11.94%)

Mutual labels: big-data

Vespa

The open big data serving engine. https://vespa.ai

Stars: ✭ 3,747 (+5492.54%)

Mutual labels: big-data

Cython

The most widely used Python to C compiler

Stars: ✭ 6,588 (+9732.84%)

Mutual labels: big-data

Attic Apex Core

Mirror of Apache Apex core

Stars: ✭ 346 (+416.42%)

Mutual labels: big-data

Metrics

Measure behavior of Java applications

Stars: ✭ 35 (-47.76%)

Mutual labels: big-data

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (+405.97%)

Mutual labels: big-data

Samza

Mirror of Apache Samza

Stars: ✭ 676 (+908.96%)

Mutual labels: big-data

Grouparoo

🦘 The Grouparoo Monorepo - open source customer data sync framework

Stars: ✭ 334 (+398.51%)

Mutual labels: big-data

Lifion Kinesis

A native Node.js producer and consumer library for Amazon Kinesis Data Streams

Stars: ✭ 54 (-19.4%)

Mutual labels: big-data

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+8341.79%)

Mutual labels: big-data

Sdc

Intel® Scalable Dataframe Compiler for Pandas*

Stars: ✭ 623 (+829.85%)

Mutual labels: big-data

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-2.99%)

Mutual labels: big-data

Spark Doc Zh

Chinese edition of the official Apache Spark documentation