Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr, etc.
Stars: ✭ 406 (+1300%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+100%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+1255.17%)
Ibis: A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+5520.69%)
Oap: Optimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+1082.76%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-51.72%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+5562.07%)
fastdata-cluster: Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-31.03%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+217.24%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (-27.59%)
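Bigdata Interview: 🎯 🌟 [Big data interview questions] A collection of big-data interview questions gathered online, together with my own summarized answers. It currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/HBase/Kafka/ZooKeeper frameworks.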
Stars: ✭ 857 (+2855.17%)
Parquet Index: Spark SQL index for Parquet tables
Stars: ✭ 109 (+275.86%)
bigkube: Minikube for big data with Scala and Spark
Stars: ✭ 16 (-44.83%)
Bigdata File Viewer: A cross-platform (Windows, Mac, Linux) desktop application to view common big data binary formats such as Parquet, ORC, Avro, etc. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (+196.55%)
Schemer: Schema registry for CSV, TSV, JSON, Avro, and Parquet schemas. Supports schema inference and a GraphQL API.
Stars: ✭ 97 (+234.48%)
wasp: WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-34.48%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+417.24%)
Kyuubi: Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+1151.72%)
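leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap-drawing step is moved offline for computation and analysis. The data is computed in parallel with Apache Spark, the heatmap is then drawn with Apache Spark as well, and leafletjs loads the OpenStreetMap layer and the heatmap layer to provide good interactivity. With the current Apache Spark implementation of the drawing step, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because my algorithm is not well designed. The Apache Spark heatmap drawing and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .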
Stars: ✭ 13 (-55.17%)
God Of Bigdata: Focused on big data learning and interviews; start your path to big data mastery. Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+20617.24%)
Sparta: Real-time analytics and data pipelines based on Spark Streaming
Stars: ✭ 513 (+1668.97%)
Snakebite: A pure Python HDFS client
Stars: ✭ 828 (+2755.17%)
Impala Java Client: Java client to connect directly to Impala using Thrift
Stars: ✭ 26 (-10.34%)
Hadoop For Geoevent: ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.
Stars: ✭ 5 (-82.76%)
Bigdataguide: Learn big data from scratch. Includes videos for each stage of big data learning, plus interview materials.
Stars: ✭ 817 (+2717.24%)
Rpc proxy: A Thrift-based service registration and discovery framework
Stars: ✭ 13 (-55.17%)
Tiledb Vcf: Efficient variant-call data storage and retrieval library using the TileDB storage library.
Stars: ✭ 26 (-10.34%)
Zys: High-performance service framework based on Yaf or Swoole
Stars: ✭ 812 (+2700%)
Spark Swagger: Spark (http://sparkjava.com/) support for Swagger (https://swagger.io/)
Stars: ✭ 25 (-13.79%)
Goodreads etl pipeline: An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Stars: ✭ 793 (+2634.48%)
Spark Redis: A connector for Spark that allows reading from and writing to a Redis cluster
Stars: ✭ 773 (+2565.52%)
Urhox: Urho3D extension library
Stars: ✭ 13 (-55.17%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+3103.45%)
Sparklyr: R interface for Apache Spark
Stars: ✭ 775 (+2572.41%)
Angel: A Flexible and Powerful Parameter Server for large-scale machine learning
Stars: ✭ 6,458 (+22168.97%)
Chronicler: Scala toolchain for InfluxDB
Stars: ✭ 24 (-17.24%)
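Coding Now: Notes from my studies, along with e-books, video resources, and blogs, websites, and tools I have collected and found worthwhile. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.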
Stars: ✭ 750 (+2486.21%)
Spark Movie Lens: An online movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+2468.97%)
Sparkling Titanic: Training models with Apache Spark and PySpark for the Titanic Kaggle competition
Stars: ✭ 12 (-58.62%)
Sparkctr: CTR prediction model based on Spark (LR, GBDT, DNN)
Stars: ✭ 740 (+2451.72%)
Cdhproject: Usage of the various Hadoop components, continuously updated
Stars: ✭ 733 (+2427.59%)
Digitrecognizer: Java convolutional neural network example for handwritten digit recognition
Stars: ✭ 23 (-20.69%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+2410.34%)
Scrooge: A Thrift parser/generator
Stars: ✭ 724 (+2396.55%)
Heracles: High-performance HBase / Spark SQL engine
Stars: ✭ 27 (-6.9%)
Flint: A Time Series Library for Apache Spark
Stars: ✭ 878 (+2927.59%)
Mlfeature: Feature engineering toolkit for Spark MLlib.
Stars: ✭ 12 (-58.62%)
Cluster Pack: A library on top of either pex or conda-pack to make your Python code easily available on a cluster
Stars: ✭ 23 (-20.69%)
Frameless: Expressive types for Spark.
Stars: ✭ 717 (+2372.41%)
Hail: Scalable genomic data analysis.
Stars: ✭ 706 (+2334.48%)