Recommenders: Best Practices on Recommendation Systems
Stars: ✭ 11,818 (+4743.44%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+198.36%)
Hail: Scalable genomic data analysis.
Stars: ✭ 706 (+189.34%)
Almond: A Scala kernel for Jupyter
Stars: ✭ 1,354 (+454.92%)
Scriptis: Scriptis is for interactive data analysis with script development (SQL, PySpark, HiveQL), task submission (Spark, Hive), UDF, function, and resource management, and intelligent diagnosis.
Stars: ✭ 696 (+185.25%)
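Technology Talk: A collection of knowledge on commonly used technology frameworks in the Java ecosystem, open-source middleware, system architecture, databases, architecture case studies from large companies, commonly used third-party libraries, project management, production troubleshooting, personal growth, and reflections.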
Stars: ✭ 12,136 (+4873.77%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-60.25%)
Freestyle: A cohesive & pragmatic framework of FP-centric Scala libraries
Stars: ✭ 627 (+156.97%)
Azure Event Hubs: ☁️ Cloud-scale telemetry ingestion from any stream of data with Azure Event Hubs
Stars: ✭ 233 (-4.51%)
H2o 3: H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+2218.03%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+2159.43%)
Rsparse: Fast and accurate machine learning on sparse matrices - matrix factorizations, regression, classification, top-N recommendations.
Stars: ✭ 145 (-40.57%)
Alluxio: Alluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+2104.51%)
Spark Py Notebooks: Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+448.36%)
Sparklearning: Learning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (+128.69%)
Spark: Firely's open source FHIR server
Stars: ✭ 174 (-28.69%)
Justenoughscalaforspark: A tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+120.49%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-62.3%)
Spark Authorizer: A Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-42.21%)
Cdap: An open source framework for building data analytic applications.
Stars: ✭ 509 (+108.61%)
Big Data: 🔧 Use dplyr to analyze Big Data 🐘
Stars: ✭ 93 (-61.89%)
Pointblank: Data validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+96.72%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (-15.98%)
Spark: Cross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (+93.03%)
Neu Review Rec: A Toolkit for Neural Review-based Recommendation models with PyTorch.
Stars: ✭ 92 (-62.3%)
Pytorch Fm: Factorization Machine models in PyTorch
Stars: ✭ 455 (+86.48%)
Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-43.03%)
Bigdataie: A collection of big data blogs, written test questions, tutorials, projects, and interview experiences.
Stars: ✭ 445 (+82.38%)
Deeplearning4j: Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code, and a Java-based math library on top of the core C++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+4931.56%)
Yanagishima: Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (+73.77%)
Spark Nlp Models: Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-63.93%)
Moonbox: Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (+73.77%)
Recotour: A tour through recommendation algorithms in Python [IN PROGRESS]
Stars: ✭ 140 (-42.62%)
Learningspark: Scala examples for learning to use Spark
Stars: ✭ 421 (+72.54%)
Deepicf: TensorFlow Implementation of Deep Item-based Collaborative Filtering Model for Top-N Recommendation
Stars: ✭ 86 (-64.75%)
Sparkle: Haskell on Apache Spark.
Stars: ✭ 419 (+71.72%)
Enterprise gateway: A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across distributed clusters such as Apache Spark, Kubernetes and others.
Stars: ✭ 412 (+68.85%)
Cuesheet: A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-64.75%)
Spark Solr: Tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.
Stars: ✭ 411 (+68.44%)
Sparkling Graph: SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-43.03%)
Tutorial: A summary of the Java full-stack knowledge architecture.
Stars: ✭ 407 (+66.8%)
Gatk: Official code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+310.66%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+61.07%)
Metarec: PyTorch Implementations For A Series Of Deep Learning-Based Recommendation Models (IN PROGRESS)
Stars: ✭ 120 (-50.82%)
Pixiedust: Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+309.02%)
Sparkmonitor: Monitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-36.89%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+568.03%)
Gluonrank: Ranking made easy
Stars: ✭ 39 (-84.02%)
Snappydata: Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Stars: ✭ 995 (+307.79%)