Amazon S3 Find and Forget is a solution for handling data erasure requests against data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR).

Stars: ✭ 115 (-20.14%)

Mutual labels: parquet

Bigdata Interview

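🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered online, with my own summarized answers. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.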

Stars: ✭ 857 (+495.14%)

Mutual labels: hadoop

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-40.28%)

Mutual labels: parquet

Hadoop Pot

A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.

Stars: ✭ 8 (-94.44%)

Mutual labels: hadoop

Hbaseclient

HBase client data management software

Stars: ✭ 135 (-6.25%)

Mutual labels: hadoop

Kylo

Kylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.

Stars: ✭ 916 (+536.11%)

Mutual labels: hadoop

Sparksql Protobuf

Read SparkSQL parquet file as RDD[Protobuf]

Stars: ✭ 82 (-43.06%)

Mutual labels: parquet

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-88.89%)

Mutual labels: parquet

Tensorflowonyarn

Support TensorFlow on YARN

Stars: ✭ 114 (-20.83%)

Mutual labels: hadoop

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-96.53%)

Mutual labels: hadoop

Camus

Mirror of Linkedin's Camus

Stars: ✭ 81 (-43.75%)

Mutual labels: hadoop

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-11.11%)

Mutual labels: hadoop

Oap

Optimized Analytics Package for Spark* Platform

Stars: ✭ 343 (+138.19%)

Mutual labels: parquet

Useractionanalyzeplatform

Big data platform for e-commerce user behavior analysis

Stars: ✭ 645 (+347.92%)

Mutual labels: hadoop

Docker Spark

🚢 Docker image for Apache Spark

Stars: ✭ 78 (-45.83%)

Mutual labels: hadoop

Javapdf

🍣 100 Java technical e-books in PDF (take pride in downloading and reading, not in merely liking and bookmarking)

Stars: ✭ 609 (+322.92%)

Mutual labels: hadoop

Xlearning Xdml

extremely distributed machine learning

Stars: ✭ 113 (-21.53%)

Mutual labels: hadoop

Dist Keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.

Stars: ✭ 613 (+325.69%)

Mutual labels: hadoop

Tf Yarn

Train TensorFlow models on YARN in just a few lines of code!

Stars: ✭ 76 (-47.22%)

Mutual labels: hadoop

Hadoop study

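Regularly updated documentation for commonly used big data components in the Hadoop ecosystem, with the focus in this order: Flink, Solr, Sparksql, ES, Scala, Kafka, Hbase/phoenix, Redis, Kerberos. (The project includes Hadoop mind maps, Evernote notes, simple Scala demos, common utility classes, and desensitized train code. Continuously updated!)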

Stars: ✭ 567 (+293.75%)

Mutual labels: hadoop

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-59.72%)

Mutual labels: parquet

Ytk Learn

Ytk-learn is a distributed machine learning library that implements most popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).

Stars: ✭ 337 (+134.03%)

Mutual labels: hadoop

Gis Tools For Hadoop

The GIS Tools for Hadoop are a collection of GIS tools for spatial analysis of big data.

Stars: ✭ 485 (+236.81%)

Mutual labels: hadoop

Docker Hadoop

Apache Hadoop docker image

Stars: ✭ 1,190 (+726.39%)

Mutual labels: hadoop

Pdf

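Programming e-books and books, covering C, C#, Docker, Elasticsearch, Git, Hadoop, HeadFirst, Java, Javascript, jvm, Kafka, Linux, Maven, MongoDB, MyBatis, MySQL, Netty, Nginx, Python, RabbitMQ, Redis, Scala, Solr, Spark, Spring, SpringBoot, SpringCloud, TCPIP, Tomcat, Zookeeper, artificial intelligence, big data, concurrent programming, databases, data mining, new interview questions, architecture design, algorithms, computer science, design patterns, software testing, refactoring and optimization, and more categories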

Stars: ✭ 12,009 (+8239.58%)

Mutual labels: hadoop

Introtohadoopandmr udacity course

🐘 Source code for assignments of Udacity course "Introduction to Hadoop and MapReduce"

Stars: ✭ 110 (-23.61%)

Mutual labels: hadoop

God Of Bigdata

Focused on big data study and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+4072.22%)

Mutual labels: hadoop

Hive Funnel Udf

Hive UDFs for funnel analysis

Stars: ✭ 72 (-50%)

Mutual labels: hadoop

Gather Deployment

Gathers scalable tensorflow and infrastructure deployment

Stars: ✭ 326 (+126.39%)

Mutual labels: hadoop

Src

A light-weight distributed stream computing framework for Golang

Stars: ✭ 67 (-53.47%)

Mutual labels: hadoop

Bigdata Notes

Getting-started guide to big data ⭐

Stars: ✭ 10,991 (+7532.64%)

Mutual labels: hadoop

Docker Spark Cluster

A Spark cluster setup running on Docker containers

Stars: ✭ 57 (-60.42%)

Mutual labels: hadoop

Cascading

Cascading is a feature-rich API for defining and executing complex, fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.

Stars: ✭ 318 (+120.83%)

Mutual labels: hadoop

Orc

Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Stars: ✭ 389 (+170.14%)

Mutual labels: hadoop

Parquet Index

Spark SQL index for Parquet tables

Stars: ✭ 109 (-24.31%)

Mutual labels: parquet

Ignite

Apache Ignite

Stars: ✭ 4,027 (+2696.53%)

Mutual labels: hadoop

Waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

Stars: ✭ 60 (-58.33%)

Mutual labels: hadoop

Choetl

ETL framework for .NET / C# (parser / writer for CSV, flat, XML, JSON, key-value, Parquet, YAML, and Avro formatted files)

Stars: ✭ 372 (+158.33%)

Mutual labels: parquet

Calcite Avatica

Mirror of Apache Calcite - Avatica

Stars: ✭ 130 (-9.72%)

Mutual labels: hadoop

Wedatasphere

WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!

Stars: ✭ 372 (+158.33%)

Mutual labels: hadoop

Likelike

An implementation of locality sensitive hashing with Hadoop

Stars: ✭ 58 (-59.72%)

Mutual labels: hadoop

Parquet Cpp

Apache Parquet

Stars: ✭ 339 (+135.42%)

Mutual labels: parquet

Bigdata Notebook

Stars: ✭ 100 (-30.56%)