Data science blogs: A repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-66.9%)
Spark Tda: SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (-89.29%)
Codesearchnet: Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+228.1%)
Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+336.9%)
Awesome Ai Ml Dl: Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources on these topics.
Stars: ✭ 831 (+97.86%)
Datafusion: DataFusion has now been donated to the Apache Arrow project.
Stars: ✭ 611 (+45.48%)
Bigdataguide: Learn big data from scratch. Includes learning videos and interview materials for every stage of big data study.
Stars: ✭ 817 (+94.52%)
Agile data code 2: Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-1.67%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.
Stars: ✭ 2,084 (+396.19%)
Big Whale: Scheduling of offline tasks such as Spark and Flink jobs, and monitoring of real-time tasks.
Stars: ✭ 163 (-61.19%)
Alink: Alink is a machine learning algorithm platform based on Flink, developed by the PAI team of the Alibaba computing platform.
Stars: ✭ 2,936 (+599.05%)
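Bdp Dataplatform: A data platform for big data ecosystem solutions. A big data solution built on big data, data platform, microservices, machine learning, e-commerce, automated operations, DevOps, container deployment platform, data platform collection, data platform storage, data platform computing, data platform development, and data platform application building.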
Stars: ✭ 456 (+8.57%)
Sparklearning: Learning Apache Spark, including code and data. Most parts can run locally.
Stars: ✭ 558 (+32.86%)
Audioowl: Fast and simple music and audio analysis using RNN in Python 🕵️♀️ 🥁
Stars: ✭ 151 (-64.05%)
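Flink Learning: Flink learning blog. http://www.54tianzhisheng.cn/ Covers Flink introduction, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Includes learning examples for Flink Connector, Metrics, Library, DataStream API, Table API & SQL, plus write-ups of large-scale production Flink projects (PVUV, log storage, real-time deduplication of ten billion records, monitoring and alerting). You are welcome to support my column "Big Data Real-Time Computing Engine: Flink in Practice and Performance Optimization".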
Stars: ✭ 11,378 (+2609.05%)
Quicksql: A Flexible, Fast, Federated (3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+333.57%)
Waterdrop: Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+341.9%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+590.24%)
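Sparkstreaming: 💥 🚀 Wraps Spark Streaming to adjust the batch time dynamically (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps Spark Streaming 1.6 with Kafka 010 to support SSL.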
Stars: ✭ 179 (-57.38%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-95.24%)
Dataspherestudio: DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+184.52%)
Sk Dist: Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-38.1%)
neptune-client: 📒 Experiment tracking tool and model registry
Stars: ✭ 348 (-17.14%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-70%)
Automl alex: State-of-the-art Automated Machine Learning Python library for Tabular Data
Stars: ✭ 132 (-68.57%)
Hyperparameter hunter: Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (+54.29%)
Pycm: Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+156.19%)
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+513.33%)
God Of Bigdata: Focused on big data learning and interviews; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+1330.48%)
Athenax: SQL-based streaming analytics platform at scale
Stars: ✭ 1,178 (+180.48%)
Zeppelin: Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+1212.62%)
Scio: A Scala API for Apache Beam and Google Cloud Dataflow.
Stars: ✭ 2,247 (+435%)
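Bigdata Interview: 🎯 🌟 [Big data interview questions] Big data interview questions I have collected online, together with my own answer summaries. Currently covers interview question summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.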
Stars: ✭ 857 (+104.05%)
Pulsar Spark: When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-86.9%)
Home: ApacheCN open-source organization: announcements, introduction, members, events, and contact channels
Stars: ✭ 1,199 (+185.48%)
Repository: Personal knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-78.1%)
Hops Examples: Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-80%)
Pyspark Cheatsheet: 🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-74.29%)
Kamu Cli: Next generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-83.57%)
Datacompy: Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-65%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-77.38%)
Cloudflow: Cloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-33.81%)
Dataframe Js: A JavaScript library providing a new data structure for data scientists and developers
Stars: ✭ 376 (-10.48%)
Keypathkit: KeyPathKit is a library that provides the standard functions to manipulate data along with a call-syntax that relies on typed keypaths to make the call sites as short and clean as possible.
Stars: ✭ 376 (-10.48%)
Marmaray: Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (-1.43%)
React Spreadsheet: Simple, customizable yet performant spreadsheet for React
Stars: ✭ 393 (-6.43%)
Tensorflowonspark: TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+792.38%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (-6.43%)
Meza: A Python toolkit for processing tabular data
Stars: ✭ 374 (-10.95%)
Wedatasphere: WeDataSphere is a financial-level, one-stop open-source suitcase for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-11.43%)