Giraph
Mirror of Apache Giraph
Stars: ✭ 569 (+749.25%)
Oodt
Mirror of Apache OODT
Stars: ✭ 52 (-22.39%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+7817.91%)
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+47091.04%)
Ymcache
YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-13.43%)
Awesome Flink
😎 A curated list of amazingly awesome Flink and Flink ecosystem resources
Stars: ✭ 530 (+691.04%)
Phoenix
Mirror of Apache Phoenix
Stars: ✭ 867 (+1194.03%)
Arkime
Arkime (formerly Moloch) is an open source, large scale, full packet capturing, indexing, and database system.
Stars: ✭ 4,994 (+7353.73%)
Trck
Query engine for TrailDB
Stars: ✭ 48 (-28.36%)
Onlinestats.jl
Single-pass algorithms for statistics
Stars: ✭ 507 (+656.72%)
Sparkjni
A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-83.58%)
Pgm Index
🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (+644.78%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-7.46%)
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+623.88%)
Accumulo
Apache Accumulo
Stars: ✭ 857 (+1179.1%)
Yauaa
Yet Another UserAgent Analyzer
Stars: ✭ 472 (+604.48%)
Traildb
TrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (+1435.82%)
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+6858.21%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+1174.63%)
Courses
Quiz & Assignment of Coursera
Stars: ✭ 454 (+577.61%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+32807.46%)
Pretzel
Javascript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-61.19%)
Circosjs
d3 library to build circular graphs
Stars: ✭ 436 (+550.75%)
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+526.87%)
Bandar Log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-71.64%)
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-5.97%)
Opendata.cern.ch
Source code for the CERN Open Data portal
Stars: ✭ 411 (+513.43%)
Hadoop For Geoevent
ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-92.54%)
Mockneat
MockNeat is a Java 8+ library that facilitates the generation of arbitrary data for your applications.
Stars: ✭ 410 (+511.94%)
Egads
A Java package to automatically detect anomalies in large scale time-series data
Stars: ✭ 997 (+1388.06%)
Sqoop
Mirror of Apache Sqoop
Stars: ✭ 817 (+1119.4%)
Ignite
Apache Ignite
Stars: ✭ 4,027 (+5910.45%)
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-17.91%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+1074.63%)
Halodb
A fast, log structured key-value store.
Stars: ✭ 370 (+452.24%)
Storm
Mirror of Apache Storm
Stars: ✭ 6,297 (+9298.51%)
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-11.94%)
Vespa
The open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+5492.54%)
Cython
The most widely used Python to C compiler
Stars: ✭ 6,588 (+9732.84%)
Metrics
Measure behavior of Java applications
Stars: ✭ 35 (-47.76%)
Samza
Mirror of Apache Samza
Stars: ✭ 676 (+908.96%)
Grouparoo
🦘 The Grouparoo Monorepo - open source customer data sync framework
Stars: ✭ 334 (+398.51%)
Lifion Kinesis
A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-19.4%)
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+8341.79%)
Sdc
Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (+829.85%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-2.99%)
Attic Lens
Mirror of Apache Lens
Stars: ✭ 58 (-13.43%)
Macro ml
Course Website on Macroeconomic Analysis with Machine Learning and Big Data
Stars: ✭ 53 (-20.9%)
Qcportal
A client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-56.72%)