Vagrant Projects: Vagrant projects for various use cases with Spark, Zeppelin, IPython / Jupyter, SparkR
Stars: ✭ 34 (+88.89%)
columnify: Convert record-oriented data to columnar format.
Stars: ✭ 28 (+55.56%)
Spark Flamegraph: Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (+66.67%)
Spark.jl: Julia binding for Apache Spark
Stars: ✭ 153 (+750%)
Heracles: High performance HBase / Spark SQL engine
Stars: ✭ 27 (+50%)
Spark Tsne: Distributed t-SNE via Apache Spark
Stars: ✭ 151 (+738.89%)
Tedsds: Turbofan Engine Degradation Simulation Data Set example in Apache Spark
Stars: ✭ 14 (-22.22%)
Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+10094.44%)
avro-schema-generator: Library for generating Avro schema files (.avsc) based on DB table structure
Stars: ✭ 38 (+111.11%)
Mlfeature: Feature engineering toolkit for Spark MLlib.
Stars: ✭ 12 (-33.33%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+733.33%)
Sparkjni: A heterogeneous Apache Spark framework.
Stars: ✭ 11 (-38.89%)
avrow: Avrow is a pure Rust implementation of the Avro specification (https://avro.apache.org/docs/current/spec.html) with Serde support.
Stars: ✭ 27 (+50%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+4605.56%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+427.78%)
Mleap: MLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+6744.44%)
Chronicler: Scala toolchain for InfluxDB
Stars: ✭ 24 (+33.33%)
Digitrecognizer: Java Convolutional Neural Network example for Handwritten Digit Recognition
Stars: ✭ 23 (+27.78%)
parquet-extra: A collection of Apache Parquet add-on modules
Stars: ✭ 30 (+66.67%)
Nd4j: Fast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+9577.78%)
Sparkling Water: Sparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+4827.78%)
Rasterframes: Geospatial Raster support for Spark DataFrames
Stars: ✭ 142 (+688.89%)
Bigdataguide: Big data learning from scratch, including study videos and interview materials for each stage of learning big data
Stars: ✭ 817 (+4438.89%)
Scanns: A scalable nearest neighbor search library in Apache Spark
Stars: ✭ 190 (+955.56%)
Lehar: Visualize data using relative ordering
Stars: ✭ 81 (+350%)
Spark Redis: A connector for Spark that allows reading from and writing to a Redis cluster
Stars: ✭ 773 (+4194.44%)
Azure Event Hubs Spark: Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (+677.78%)
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+35777.78%)
Insulator: A client UI to inspect Kafka topics, consume, produce and much more
Stars: ✭ 53 (+194.44%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+4038.89%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (+672.22%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (+3883.33%)
Quicksql: A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+10016.67%)
smolder: HL7 Apache Spark Datasource
Stars: ✭ 33 (+83.33%)
Apache Spark Node: Node.js bindings for Apache Spark DataFrame APIs
Stars: ✭ 136 (+655.56%)
Freestyle: A cohesive & pragmatic framework of FP-centric Scala libraries
Stars: ✭ 627 (+3383.33%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+31322.22%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+30527.78%)
Spark Gbtlr: Hybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (+350%)
avro-parser-haskell: Language definition and parser for Avro IDL (.avdl) files.
Stars: ✭ 14 (-22.22%)
Js Spark: Distributed real-time computation system, a.k.a. distributed lodash
Stars: ✭ 187 (+938.89%)
Setl: A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+338.89%)
Docker Spark: 🚢 Docker image for Apache Spark
Stars: ✭ 78 (+333.33%)
Azuredatabricksbestpractices: Version 1 of Technical Best Practices for Azure Databricks, based on real-world customer and technical SME input
Stars: ✭ 186 (+933.33%)
Home: ApacheCN open source organization: announcements, introduction, members, events, and contact channels
Stars: ✭ 1,199 (+6561.11%)
Covid19Tracker: A Robinhood-style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (+261.11%)