Sparkling Water: Sparkling Water provides H2O functionality inside a Spark cluster
Goodreads etl pipeline: An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Spark Redis: A connector for Spark that allows reading from and writing to a Redis cluster
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
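Coding Now: Study notes, plus the eBooks and video resources I have gone through, and the blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.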
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Sparkctr: CTR prediction model based on Spark (LR, GBDT, DNN)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Hail: Scalable genomic data analysis.
Scriptis: Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Freestyle: A cohesive & pragmatic framework of FP centric Scala libraries
Dev Setup: macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Datafusion: DataFusion has now been donated to the Apache Arrow project
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Alluxio: Alluxio, data orchestration for analytics and machine learning in the cloud
Sparklearning: Learning Apache Spark, including code and data. Most parts can run locally.
Spark Daria: Essential Spark extensions and helper methods ✨😲
Justenoughscalaforspark: A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Lopq: Training of Locally Optimized Product Quantization (LOPQ) models for approximate nearest neighbor search of high dimensional data in Python and Spark.
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Cdap: An open source framework for building data analytic applications.
Magellan: Geo Spatial Data Analytics on Spark
Pointblank: Data validation and organization of metadata for data frames and database tables
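Pdf: Programming e-books, organized by category: C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more.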
Spark: Cross-platform real-time collaboration client optimized for business and organizations.
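Bdp Dataplatform: A data platform for big data ecosystem solutions, covering big data, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform ingestion, storage, computation, development, and application building.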
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Yanagishima: Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Dji Firmware Tools: Tools for handling firmwares of DJI products, with focus on quadcopters.
Moonbox: Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Featran: A Scala feature transformation library for data science and machine learning
Enterprise gateway: A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Spark Solr: Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Redash: Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Docker practice: Learn and understand Docker technologies, with real DevOps practice!
Bigdl: Building Large-Scale AI Applications for Distributed Big Data
Tensorflowonspark: TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.