Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.07%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-99.58%)
Traildb: TrailDB is an efficient tool for storing and querying series of events.
Stars: ✭ 1,029 (-12.65%)
Spark: Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+2584.04%)
Macro ml: Course Website on Macroeconomic Analysis with Machine Learning and Big Data
Stars: ✭ 53 (-95.5%)
Autodl: Automated Deep Learning without ANY human intervention. 1st-place solution for the AutoDL challenge.
Stars: ✭ 854 (-27.5%)
Warp: Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-94.74%)
Rakam Api: 📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform.)
Stars: ✭ 772 (-34.47%)
Egads: A Java package to automatically detect anomalies in large-scale time-series data
Stars: ✭ 997 (-15.37%)
Qcportal: A client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-97.54%)
Kafka Streams: Equivalent to kafka-streams 🐙 for Node.js ✨🐢🚀✨
Stars: ✭ 613 (-47.96%)
Kibble 1: Apache Kibble - a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-95.42%)
Phoenix: Mirror of Apache Phoenix
Stars: ✭ 867 (-26.4%)
Cloud Volume: Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-94.65%)
Hazelcast Jet: Distributed Stream and Batch Processing
Stars: ✭ 855 (-27.42%)
Datumbox Framework: Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (-9.76%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-97.96%)
Carbondata: Mirror of Apache CarbonData
Stars: ✭ 1,158 (-1.7%)
Spark Movie Lens: An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-36.76%)
Verticapy: VerticaPy is a Python library that exposes scikit-learn-like functionality to conduct data science projects on data stored in Vertica, taking advantage of Vertica's speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-94.99%)
Data Science Career: Career resources repository for Data Science, Machine Learning, Big Data and Business Analytics
Stars: ✭ 630 (-46.52%)
Esper Tv: Esper instance for TV news analysis
Stars: ✭ 37 (-96.86%)
Skymap: High-throughput gene-to-knowledge mapping through massive integration of public sequencing data.
Stars: ✭ 29 (-97.54%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+380.14%)
Awesome Scalability: The Patterns of Scalable, Reliable, and Performant Large-Scale Systems
Stars: ✭ 36,688 (+3014.43%)
Rsparkling: RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-94.48%)
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-98.81%)
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-95.42%)
Dremio Oss: Dremio - the missing link in modern data
Stars: ✭ 862 (-26.83%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and PhoneGap
Stars: ✭ 69 (-94.14%)
Accumulo: Apache Accumulo
Stars: ✭ 857 (-27.25%)
Oodt: Mirror of Apache OODT
Stars: ✭ 52 (-95.59%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-27.5%)
Pretzel: JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-97.79%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-95.93%)
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-98.39%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (-30.65%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (-12.99%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM: distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-33.19%)
Nabhash: An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-94.74%)
Storm: Mirror of Apache Storm
Stars: ✭ 6,297 (+434.55%)
Attaca: Robust, distributed version control for large files.
Stars: ✭ 41 (-96.52%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+459.25%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (-42.61%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-47.11%)
Attic Lens: Mirror of Apache Lens
Stars: ✭ 58 (-95.08%)
Metrics: Measure the behavior of Java applications
Stars: ✭ 35 (-97.03%)
Appdocs: Application Performance Optimization Summary
Stars: ✭ 1,169 (-0.76%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-94.31%)
Ymcache: YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-95.08%)