SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lakes, for billions of files. The blob store has O(1) disk seek and cloud tiering. The Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, and Erasure Coding.

Stars: ✭ 13,380 (+1515.94%)

Mutual labels: hdfs

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-97.83%)

Mutual labels: hdfs

Elasticctr

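ElasticCTR, the PaddlePaddle elastic-computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system in a Kubernetes environment in one step. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports deep secondary development.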

Stars: ✭ 123 (-85.14%)

Mutual labels: hdfs

Devops Python Tools

80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.

Stars: ✭ 406 (-50.97%)

Mutual labels: hdfs

Bigdata Notes

A beginner's guide to big data ⭐

Stars: ✭ 10,991 (+1227.42%)

Mutual labels: hdfs

starlake

Starlake is a Spark-based on-premise and cloud ELT/ETL framework for batch and stream processing

Stars: ✭ 16 (-98.07%)

Mutual labels: hdfs

Camus

Mirror of LinkedIn's Camus

Stars: ✭ 81 (-90.22%)

Mutual labels: hdfs

fluent-plugin-webhdfs

Hadoop WebHDFS output plugin for Fluentd

Stars: ✭ 57 (-93.12%)

Mutual labels: hdfs

Rumble

⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Stars: ✭ 58 (-93%)

Mutual labels: hdfs

hive to es

A small tool for syncing Hive data warehouse data to Elasticsearch

Stars: ✭ 21 (-97.46%)

Mutual labels: hdfs

DataX-src

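DataX is a widely used offline data synchronization tool/platform for heterogeneous data. It provides efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.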

Stars: ✭ 21 (-97.46%)

Mutual labels: hdfs

Hdfs

A native go client for HDFS

Stars: ✭ 992 (+19.81%)

Mutual labels: hdfs

taller SparkR

SparkR workshop for the R users' conference (Jornadas de Usuarios de R)

Stars: ✭ 12 (-98.55%)

Mutual labels: hdfs

kafka-connect-fs

Kafka Connect FileSystem Connector

Stars: ✭ 107 (-87.08%)

Mutual labels: hdfs

Juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.

Stars: ✭ 4,262 (+414.73%)

Mutual labels: hdfs

Storagetapper

StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service

Stars: ✭ 232 (-71.98%)

Mutual labels: hdfs

fsbrowser

Fast desktop client for Hadoop Distributed File System

Stars: ✭ 27 (-96.74%)

Mutual labels: hdfs

Smart open

Utils for streaming large files (S3, HDFS, gzip, bz2...)

Stars: ✭ 2,306 (+178.5%)

Mutual labels: hdfs

God Of Bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+625.6%)

Mutual labels: hdfs

Dcos Commons

DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.

Stars: ✭ 162 (-80.43%)

Mutual labels: hdfs

aaocp

A big data project that analyzes user behavior logs

Stars: ✭ 53 (-93.6%)

Mutual labels: hdfs

Spark With Python

Fundamentals of Spark with Python (using PySpark), code examples

Stars: ✭ 150 (-81.88%)

Mutual labels: hdfs

bigkube

Minikube for big data with Scala and Spark

Stars: ✭ 16 (-98.07%)

Mutual labels: hdfs

Hsuntzu

HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark

Stars: ✭ 135 (-83.7%)

Mutual labels: hdfs

wasp

WASP is a framework for building complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze it through complex pipelines, this is the framework for you.

Stars: ✭ 19 (-97.71%)

Mutual labels: hdfs

Dynamometer

A tool for scale and performance testing of HDFS with a specific focus on the NameNode.

Stars: ✭ 122 (-85.27%)

Mutual labels: hdfs

Sparta

Real Time Analytics and Data Pipelines based on Spark Streaming

Stars: ✭ 513 (-38.04%)

Mutual labels: hdfs

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (+96.86%)

Mutual labels: hdfs

hbase-meta-repair

Repair hbase metadata table from hdfs.

Stars: ✭ 36 (-95.65%)

Mutual labels: hdfs

Repository

A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.

Stars: ✭ 92 (-88.89%)

Mutual labels: hdfs

leaflet heatmap

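A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After computing the data in parallel with Apache Spark, the heatmap is also rendered with Apache Spark, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for a good interactive experience. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .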

Stars: ✭ 13 (-98.43%)

Mutual labels: hdfs

Bigdata File Viewer

A cross-platform (Windows, Mac, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and AVRO. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.

Stars: ✭ 86 (-89.61%)

Mutual labels: hdfs

ucz-dfs

A distributed file system written in Rust.

Stars: ✭ 25 (-96.98%)

Mutual labels: hdfs

Tiledb Py

Python interface to the TileDB storage manager

Stars: ✭ 78 (-90.58%)

Mutual labels: hdfs

Kafka Connect Hdfs

Kafka Connect HDFS connector