OpenubaA robust, and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-37.44%)
Tf YarnTrain TensorFlow models on YARN in just a few lines of code!
Stars: ✭ 76 (-62.56%)
BigdataclassTwo-day workshop that covers how to use R to interact databases and Spark
Stars: ✭ 110 (-45.81%)
Job Model蚂蚁金服 - 国际事业群 - 前端 招聘
Stars: ✭ 110 (-45.81%)
LiftThe LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness in large scale machine learning workflows.
Stars: ✭ 127 (-37.44%)
Ds CheatsheetsList of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+4556.16%)
Spark TsneDistributed t-SNE via Apache Spark
Stars: ✭ 151 (-25.62%)
VolcanoA Cloud Native Batch System (Project under CNCF)
Stars: ✭ 2,114 (+941.38%)
Books技术书籍等
Stars: ✭ 110 (-45.81%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-1.48%)
Lpa DetectorOptimize and improve the Label propagation algorithm
Stars: ✭ 75 (-63.05%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-30.54%)
LabsResearch on distributed system
Stars: ✭ 73 (-64.04%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (+803.94%)
Algorithms📝 算法导论与JavaScript实现
Stars: ✭ 126 (-37.93%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+926.6%)
Countly Sdk CordovaCountly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-66.01%)
Spark Bigquery ConnectorBigQuery data source for Apache Spark: Read data from BigQuery into DataFrames, write DataFrames into BigQuery tables.
Stars: ✭ 126 (-37.93%)
AtsdAxibase Time Series Database Documentation
Stars: ✭ 68 (-66.5%)
AztkAZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-25.12%)
Fast MrmrAn improved implementation of the classical feature selection method: minimum Redundancy and Maximum Relevance (mRMR).
Stars: ✭ 67 (-67%)
Scala SamplesThere are pieces of scala code that explain Scala syntax and related things - like what you can do with all this
Stars: ✭ 125 (-38.42%)
RegistrySchema Registry
Stars: ✭ 184 (-9.36%)
Flinkstreamsql基于开源的flink,对其实时sql进行扩展;主要实现了流与维表的join,支持原生flink SQL所有的语法
Stars: ✭ 1,682 (+728.57%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-67.98%)
Android Interview QuestionsA repository containing interview questions on DS, Java & Android based on my experiences.
Stars: ✭ 125 (-38.42%)
AthenacliAthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
Stars: ✭ 151 (-25.62%)
JumbuneJumbune, an open source BigData APM & Data Quality Management Platform for Data Clouds. Enterprise feature offering is available at http://jumbune.com. More details of open source offering are at,
Stars: ✭ 64 (-68.47%)
Low Level Design PrimerDedicated Resources for the Low-Level System Design. Learn how to design and implement large-scale systems. Prep for the system design interview.
Stars: ✭ 2,706 (+1233%)
Leetcode In Swift My solutions to LeetCode problems written in Swift
Stars: ✭ 150 (-26.11%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-39.9%)
Fed Note我是Mokou, 📘 这里是写前端博客和备忘学习的地方。Vue3 源码解析连载中。喜欢请Star。
Stars: ✭ 180 (-11.33%)
Whylogs JavaProfile and monitor your ML data pipeline end-to-end
Stars: ✭ 164 (-19.21%)
RasterframesGeospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-30.05%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-46.31%)
Interviewguide《大厂面试指北》——包括Java基础、JVM、数据库、mysql、redis、计算机网络、算法、数据结构、操作系统、设计模式、系统设计、框架原理。最佳阅读地址:http://notfound9.github.io/interviewGuide/
Stars: ✭ 3,117 (+1435.47%)
Zemberek Nlp ServerZemberek Türkçe NLP Java Kütüphanesi üzerine REST Docker Sunucu
Stars: ✭ 60 (-70.44%)
DeequDeequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Stars: ✭ 2,020 (+895.07%)
Pyspark ExamplesCode examples on Apache Spark using python
Stars: ✭ 58 (-71.43%)
AvroApache Avro is a data serialization system.
Stars: ✭ 2,005 (+887.68%)
Awesome PulsarA curated list of Pulsar tools, integrations and resources.
Stars: ✭ 57 (-71.92%)
Parquet IndexSpark SQL index for Parquet tables
Stars: ✭ 109 (-46.31%)
Daudit🌲 Configuration flaws detector for Hadoop, MongoDB, MySQL, and more!
Stars: ✭ 108 (-46.8%)
InterviewsA list of fancy questions I've been asked during the interviews I had. Some of them I ask when interviewing people.
Stars: ✭ 140 (-31.03%)
Haproxy Configs80+ HAProxy Configs for Hadoop, Big Data, NoSQL, Docker, Elasticsearch, SolrCloud, HBase, MySQL, PostgreSQL, Apache Drill, Hive, Presto, Impala, Hue, ZooKeeper, SSH, RabbitMQ, Redis, Riak, Cloudera, OpenTSDB, InfluxDB, Prometheus, Kibana, Graphite, Rancher etc.
Stars: ✭ 106 (-47.78%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-46.8%)