experimentsCode examples for my blog posts
Stars: ✭ 21 (-80%)
FreestyleA cohesive & pragmatic framework of FP centric Scala libraries
Stars: ✭ 627 (+497.14%)
splinkImplementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+72.38%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-24.76%)
visualize-data-with-pythonA Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (-42.86%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+5286.67%)
v6.dooring.public可视化大屏解决方案, 提供一套可视化编辑引擎, 助力个人或企业轻松定制自己的可视化大屏应用.
Stars: ✭ 323 (+207.62%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (+5150.48%)
ETL-Starter-Kit📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL related work.
Stars: ✭ 21 (-80%)
bqvThe simplest tool to manage views of BigQuery.
Stars: ✭ 22 (-79.05%)
AlluxioAlluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+5022.86%)
vulknLove your Data. Love the Environment. Love VULKИ.
Stars: ✭ 43 (-59.05%)
SparklearningLearning Apache spark,including code and data .Most part can run local.
Stars: ✭ 558 (+431.43%)
UnROOT.jlNative Julia I/O package to work with CERN ROOT files
Stars: ✭ 52 (-50.48%)
HomeApacheCN 开源组织:公告、介绍、成员、活动、交流方式
Stars: ✭ 1,199 (+1041.9%)
cdsData syncing in golang for ClickHouse.
Stars: ✭ 839 (+699.05%)
JustenoughscalaforsparkA tutorial on the most important features and idioms of Scala that you need to use Spark's Scala APIs.
Stars: ✭ 538 (+412.38%)
meetups-archivosPpts, códigos y videos de las meetups, data science days, videollamadas y workshops. Data Science Research es una organización sin fines de lucro que busca difundir, descentralizar y difundir los conocimientos en Ciencia de Datos e Inteligencia Artificial en el Perú, dando oportunidades a nuevos talentos mediante MeetUps, Workshops y Semilleros …
Stars: ✭ 60 (-42.86%)
hadoopofficeHadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-46.67%)
SpartaReal Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+388.57%)
learning-sparkTidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-73.33%)
Biglassobiglasso: Extending Lasso Model Fitting to Big Data in R
Stars: ✭ 87 (-17.14%)
columnifyMake record oriented data to columnar format.
Stars: ✭ 28 (-73.33%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (+382.86%)
NotesThis is a learning note | Java基础,JVM,源码,大数据,面经
Stars: ✭ 69 (-34.29%)
Delta ArchitectureStreaming data changes to a Data Lake with Debezium and Delta Lake pipeline
Stars: ✭ 43 (-59.05%)
PointblankData validation and organization of metadata for data frames and database tables
Stars: ✭ 480 (+357.14%)
dockerfilesMulti docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-72.38%)
Spark StatesCustom state store providers for Apache Spark
Stars: ✭ 83 (-20.95%)
the-apache-ignite-bookAll code samples, scripts and more in-depth examples for The Apache Ignite Book. Include Apache Ignite 2.6 or above
Stars: ✭ 65 (-38.1%)
SparkCross-platform real-time collaboration client optimized for business and organizations.
Stars: ✭ 471 (+348.57%)
jhdfA pure Java HDF5 library
Stars: ✭ 83 (-20.95%)
GatkOfficial code repository for GATK versions 4 and up
Stars: ✭ 1,002 (+854.29%)
dt-sql-parserSQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+28.57%)
Bdp Dataplatform大数据生态解决方案数据平台:基于大数据、数据平台、微服务、机器学习、商城、自动化运维、DevOps、容器部署平台、数据平台采集、数据平台存储、数据平台计算、数据平台开发、数据平台应用搭建的大数据解决方案。
Stars: ✭ 456 (+334.29%)
greycatGreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (-0.95%)
lectures-hse-sparkМасштабируемое машинное обучение и анализ больших данных с Apache Spark
Stars: ✭ 20 (-80.95%)
TensorbaseTensorBase BE is building a high performance, cloud neutral bigdata warehouse for SMEs fully in Rust.
Stars: ✭ 440 (+319.05%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+850.48%)
PersonNotes个人笔记集中营,快糙猛的形式记录技术性Notes .. 📚☕️⌨️🎧
Stars: ✭ 61 (-41.9%)
DataspherestudioDataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+1038.1%)
DigitrecognizerJava Convolutional Neural Network example for Hand Writing Digit Recognition
Stars: ✭ 23 (-78.1%)
DetEditA graphical user interface for annotating and editing events detected in long-term acoustic monitoring data
Stars: ✭ 20 (-80.95%)
jigsaw-seed这是组件库 Jigsaw-七巧板(https://github.com/rdkmaster/jigsaw) 的种子工程,建议所有新增的app都以这个工程作为种子开始构建。
Stars: ✭ 17 (-83.81%)
KyloKylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+772.38%)
Book本项目收藏这些年来看过或者听过的一些不错的书籍,在整理文件时看见这些,发现删掉有点可惜,放着又太浪费空间,本着分享的原则,就把它们共享出来,一方面给需要的读者提供这些书籍,另一方面也是一种像知识库的积累吧
Stars: ✭ 47 (-55.24%)
10 Weeks10-weeks of technology exploration
Stars: ✭ 22 (-79.05%)