Hdfs Shell: HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Stars: ✭ 117 (-85.87%)
skein: A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (-84.54%)
docker-hadoop: Docker image for main Apache Hadoop components (Yarn/Hdfs)
Stars: ✭ 59 (-92.87%)
Cloud Note: A distributed cloud note-taking application (modeled on an existing cloud notes product), with data stored in redis and hbase
Stars: ✭ 71 (-91.43%)
Wradlib: weather radar data processing - python package
Stars: ✭ 143 (-82.73%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-98.31%)
Wifi: A big data query and analysis system based on information captured over wifi
Stars: ✭ 93 (-88.77%)
teraslice: Scalable data processing pipelines in JavaScript
Stars: ✭ 48 (-94.2%)
Tiledb: The Universal Storage Engine
Stars: ✭ 1,072 (+29.47%)
ros hadoop: Hadoop splittable InputFormat for ROS. Process rosbag with Hadoop, Spark, and other HDFS-compatible systems.
Stars: ✭ 92 (-88.89%)
Hdfs: API and command line interface for HDFS
Stars: ✭ 209 (-74.76%)
Seaweedfs: SeaweedFS is a fast distributed storage system for blobs, objects, files, and data lake, for billions of files! Blob store has O(1) disk seek, cloud tiering. Filer supports Cloud Drive, cross-DC active-active replication, Kubernetes, POSIX FUSE mount, S3 API, S3 Gateway, Hadoop, WebDAV, encryption, Erasure Coding.
Stars: ✭ 13,380 (+1515.94%)
datasqueeze: Hadoop utility to compact small files
Stars: ✭ 18 (-97.83%)
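Elasticctr: ElasticCTR, the PaddlePaddle elastic computing recommendation system, is an open-source, enterprise-grade recommendation system solution based on Kubernetes. It combines high-accuracy CTR models continuously refined in Baidu's business scenarios, the large-scale distributed training capability of the open-source PaddlePaddle framework, and an industrial-grade elastic scheduling service for sparse parameters, helping users deploy a recommendation system in a Kubernetes environment with one click. It offers high performance, industrial-grade deployment, and an end-to-end experience, and as an open-source suite it supports in-depth secondary development.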
Stars: ✭ 123 (-85.14%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-50.97%)
starlake: Starlake is a Spark Based On Premise and Cloud ELT/ETL Framework for Batch & Stream Processing
Stars: ✭ 16 (-98.07%)
Camus: Mirror of Linkedin's Camus
Stars: ✭ 81 (-90.22%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-93%)
hive to es: A small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (-97.46%)
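DataX-src: DataX is a widely used offline data synchronization tool/platform for heterogeneous data, providing efficient data synchronization between heterogeneous data sources including MySQL, Oracle, SqlServer, Postgre, HDFS, Hive, ADS, HBase, OTS, ODPS, and others.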
Stars: ✭ 21 (-97.46%)
Hdfs: A native go client for HDFS
Stars: ✭ 992 (+19.81%)
taller SparkR: SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (-98.55%)
Juicefs: JuiceFS is a distributed POSIX file system built on top of Redis and S3.
Stars: ✭ 4,262 (+414.73%)
Storagetapper: StorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (-71.98%)
fsbrowser: Fast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (-96.74%)
Smart open: Utils for streaming large files (S3, HDFS, gzip, bz2...)
Stars: ✭ 2,306 (+178.5%)
God Of Bigdata: Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+625.6%)
Dcos Commons: DC/OS SDK is a collection of tools, libraries, and documentation for easy integration of technologies such as Kafka, Cassandra, HDFS, Spark, and TensorFlow with DC/OS.
Stars: ✭ 162 (-80.43%)
aaocp: A big data project for analyzing user behavior logs
Stars: ✭ 53 (-93.6%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-81.88%)
bigkube: Minikube for big data with Scala and Spark
Stars: ✭ 16 (-98.07%)
Hsuntzu: HDFS compress tar zip snappy gzip uncompress untar codec hadoop spark
Stars: ✭ 135 (-83.7%)
wasp: WASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest a huge amount of heterogeneous data and analyze it through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-97.71%)
Dynamometer: A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
Stars: ✭ 122 (-85.27%)
Sparta: Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (-38.04%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+96.86%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-88.89%)
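leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. The rendering is currently implemented with Apache Spark; either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well, the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .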
Stars: ✭ 13 (-98.43%)
Bigdata File Viewer: A cross-platform (Windows, Mac, Linux) desktop application to view common bigdata binary formats like Parquet, ORC, AVRO, etc. Supports local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-89.61%)
ucz-dfs: A distributed file system written in Rust.
Stars: ✭ 25 (-96.98%)
Tiledb Py: Python interface to the TileDB storage manager
Stars: ✭ 78 (-90.58%)
local-hashicorp-stack: Local Hashicorp Stack for DevOps Development without Hypervisor or Cloud
Stars: ✭ 23 (-97.22%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-97.58%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-99.4%)
Bigdata: 💎🔥 Big data learning notes
Stars: ✭ 488 (-41.06%)
py-hdfs-mount: Mount HDFS with fuse, works with kerberos!
Stars: ✭ 13 (-98.43%)