Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+8527.27%)
Nd4j: Fast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+15736.36%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+8345.45%)
Dpark: Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+24154.55%)
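Bdp Dataplatform: Big data ecosystem solution and data platform: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, data platform collection, data platform storage, data platform computing, data platform development, and data platform application building.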
Stars: ✭ 456 (+4045.45%)
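Bigdata Interview: 🎯 🌟 [Big data interview questions] Big data interview questions collected from the web, shared together with the author's own answer summaries. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.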
Stars: ✭ 857 (+7690.91%)
Repository: Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+736.36%)
Data Science Ipython Notebooks: Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+200336.36%)
Cdap: An open source framework for building data analytic applications.
Stars: ✭ 509 (+4527.27%)
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+58609.09%)
Mathext: mathext implements basic elementary functions not included in the Go standard library [DEPRECATED]
Stars: ✭ 18 (+63.64%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+6672.73%)
Future: 🚀 R package: future: Unified Parallel and Distributed Processing in R for Everyone
Stars: ✭ 735 (+6581.82%)
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)
Stars: ✭ 25 (+127.27%)
Gush: Fast and distributed workflow runner using ActiveJob and Redis
Stars: ✭ 894 (+8027.27%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+6518.18%)
Casadi: CasADi is a symbolic framework for numeric optimization implementing automatic differentiation in forward and reverse modes on sparse matrix-valued computational graphs. It supports self-contained C-code generation and interfaces state-of-the-art codes such as SUNDIALS, IPOPT, etc. It can be used from C++, Python or Matlab/Octave.
Stars: ✭ 714 (+6390.91%)
Scriptis: Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+6227.27%)
Pbmcapply: Tracking the progress of mc*apply with a progress bar.
Stars: ✭ 25 (+127.27%)
Chronicler: Scala toolchain for InfluxDB
Stars: ✭ 24 (+118.18%)
Sparkling Water: Sparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+7963.64%)
Mfem: Lightweight, general, scalable C++ library for finite element methods
Stars: ✭ 667 (+5963.64%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+6945.45%)
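Coding Now: Study notes, plus e-books read, video resources, and blogs, websites, and tools that the author has collected and considers good. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.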
Stars: ✭ 750 (+6718.18%)
Sparkctr: CTR prediction model based on Spark (LR, GBDT, DNN)
Stars: ✭ 740 (+6627.27%)
Edge: Extreme-scale Discontinuous Galerkin Environment (EDGE)
Stars: ✭ 18 (+63.64%)
Cdhproject: Usage of the various Hadoop components, continuously updated
Stars: ✭ 733 (+6563.64%)
Dockerfiles: 50+ DockerHub public images for Docker & Kubernetes - Hadoop, Kafka, ZooKeeper, HBase, Cassandra, Solr, SolrCloud, Presto, Apache Drill, Nifi, Spark, Consul, Riak, TeamCity and DevOps tools built on the major Linux distros: Alpine, CentOS, Debian, Fedora, Ubuntu
Stars: ✭ 847 (+7600%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (+6418.18%)
Hail: Scalable genomic data analysis.
Stars: ✭ 706 (+6318.18%)
Reflow: A language and runtime for distributed, incremental data processing in the cloud
Stars: ✭ 706 (+6318.18%)
Ruptures: ruptures: change point detection in Python
Stars: ✭ 654 (+5845.45%)
Ocaml Odepack: Binding to the ODEPACK FORTRAN library
Stars: ✭ 6 (-45.45%)
Corral: 🐎 A serverless MapReduce framework written for AWS Lambda
Stars: ✭ 648 (+5790.91%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+5654.55%)
Bigdataguide: Big data learning: learn big data from scratch, including study videos for each learning stage and interview materials
Stars: ✭ 817 (+7327.27%)
Freestyle: A cohesive & pragmatic framework of FP centric Scala libraries
Stars: ✭ 627 (+5600%)
Vexcl: VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
Stars: ✭ 626 (+5590.91%)
Core: The core source repository for the Cherab project.
Stars: ✭ 26 (+136.36%)
Digitrecognizer: Java Convolutional Neural Network example for Hand Writing Digit Recognition
Stars: ✭ 23 (+109.09%)
Linfa: A Rust machine learning framework.
Stars: ✭ 812 (+7281.82%)
Dev Setup: macOS development environment setup: Easy-to-understand instructions with automated setup scripts for developer tools like Vim, Sublime Text, Bash, iTerm, Python data analysis, Spark, Hadoop MapReduce, AWS, Heroku, JavaScript web development, Android development, common data stores, and dev-based OS X defaults.
Stars: ✭ 5,590 (+50718.18%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+51318.18%)
Itk: Insight Toolkit (ITK) -- Official Repository. ITK builds on a proven, spatially-oriented architecture for processing, segmentation, and registration of scientific images in two, three, or more dimensions.
Stars: ✭ 801 (+7181.82%)
Datafusion: DataFusion has now been donated to the Apache Arrow project
Stars: ✭ 611 (+5454.55%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+50018.18%)