TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1762.2%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-61.38%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-39.02%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources and sinks is available.
Stars: ✭ 97 (-60.57%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1486.59%)
PachydermReproducible Data Science at Scale!
Stars: ✭ 5,305 (+2056.5%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+2199.19%)
Dremio OssDremio - the missing link in modern data
Stars: ✭ 862 (+250.41%)
Live log analyzer sparkSpark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-94.31%)
Delta ArchitectureStreaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-82.52%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+67.89%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+108.54%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+8862.6%)
Rakam Api📈 Collect customer event data from your apps. (Note that this project only includes the API collector, not the visualization platform)
Stars: ✭ 772 (+213.82%)
SparkjniA heterogeneous Apache Spark framework.
Stars: ✭ 11 (-95.53%)
SnappydataProject SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+304.47%)
Data Science CareerCareer resources repository for Data Science, Machine Learning, Big Data and Business Analytics
Stars: ✭ 630 (+156.1%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-67.89%)
Big Data🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-62.2%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+443.9%)
Cube.js📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+4771.14%)
OpenubaA robust and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-48.37%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+567.48%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+8089.84%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+1450%)
SparkleHaskell on Apache Spark.
Stars: ✭ 419 (+70.33%)
KyuubiKyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+47.56%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (+106.1%)
Pgm Index🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (+102.85%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+2141.06%)
MetorikkuA simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (+46.75%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-12.2%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+202.85%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-12.6%)
QcportalA client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-88.21%)
SparkApache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+12752.85%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+47.15%)
LabsResearch on distributed systems
Stars: ✭ 73 (-70.33%)
BigdataclassTwo-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-55.28%)
MahaA framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-58.94%)
FeastFeature Store for Machine Learning
Stars: ✭ 2,576 (+947.15%)
Countly Sdk CordovaCountly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-71.95%)
Sparkling GraphSparklingGraph provides an easy-to-use set of features that lets you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-43.5%)
OpaqueAn encrypted data analytics platform
Stars: ✭ 129 (-47.56%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-37.8%)
FiliEasily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL Databases.
Stars: ✭ 151 (-38.62%)
GeniA Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-38.21%)
Data Science Live BookAn open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-21.54%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-73.58%)