Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-15.1%)
K8s Ingress Claim: An admission control policy that safeguards against accidental duplicate claiming of Hosts/Domains.
Stars: ✭ 14 (-99.11%)
Dremio Oss: Dremio - the missing link in modern data
Stars: ✭ 862 (-45.3%)
Vizuka: Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-93.65%)
Accumulo: Apache Accumulo
Stars: ✭ 857 (-45.62%)
Countly Sdk Cordova: Countly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-95.62%)
Dataflowjavasdk: Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-45.81%)
Reef: Mirror of Apache REEF
Stars: ✭ 92 (-94.16%)
Pretzel: JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-98.35%)
Bandar Log: Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 19 (-98.79%)
Sqoop: Mirror of Apache Sqoop
Stars: ✭ 817 (-48.16%)
Rsparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-95.88%)
Titanoboa: Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for the JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (-50.06%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-94.23%)
Storm: Mirror of Apache Storm
Stars: ✭ 6,297 (+299.56%)
Cython: The most widely used Python to C compiler
Stars: ✭ 6,588 (+318.02%)
Samza: Mirror of Apache Samza
Stars: ✭ 676 (-57.11%)
Nabhash: An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-96.07%)
Sdc: Intel® Scalable Dataframe Compiler for Pandas*
Stars: ✭ 623 (-60.47%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+258.88%)
Attic Lens: Mirror of Apache Lens
Stars: ✭ 58 (-96.32%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+249.81%)
Bigdataclass: Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-93.02%)
Scanner: Efficient video analysis at scale
Stars: ✭ 569 (-63.9%)
Nipype: Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-64.66%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-94.92%)
Thrill: Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (-66.5%)
Lifion Kinesis: A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-96.57%)
Beam: Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+226.71%)
Kudu: Mirror of Apache Kudu
Stars: ✭ 1,360 (-13.71%)
Magellan: Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (-67.83%)
Oodt: Mirror of Apache OODT
Stars: ✭ 52 (-96.7%)
Stream Framework: Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+190.36%)
Iotdb: Apache IoTDB
Stars: ✭ 1,221 (-22.53%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (-70.56%)
Trck: Query engine for TrailDB
Stars: ✭ 48 (-96.95%)
Courses: Quizzes and assignments from Coursera
Stars: ✭ 454 (-71.19%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+1298.98%)
Moosefs: MooseFS - Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (-34.96%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (-72.97%)
Datascience Ai Machinelearning Resources: Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-73.73%)
Attaca: Robust, distributed version control for large files.
Stars: ✭ 41 (-97.4%)
Cogcomp Nlp: CogComp's Natural Language Processing libraries and Demos:
Stars: ✭ 410 (-73.98%)
Logisland: Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-93.85%)
Genie: Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (-2.03%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-93.08%)
Maha: A framework for rapid reporting API development, with out-of-the-box support for high cardinality dimension lookups with Druid.
Stars: ✭ 101 (-93.59%)
Streamx: kafka-connect-s3: Ingest data from Kafka to Object Stores (S3)
Stars: ✭ 96 (-93.91%)
Labs: Research on distributed systems
Stars: ✭ 73 (-95.37%)