leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (+160%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (+180%)
Hdfs ShellHDFS Shell is a HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+2240%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+2900%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+580%)
twitter-for-geoeventArcGIS GeoEvent Server sample Twitter connectors for sending and receiving tweets.
Stars: ✭ 21 (+320%)
CamusMirror of Linkedin's Camus
Stars: ✭ 81 (+1520%)
Bigdata Interview🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (+17040%)
God Of Bigdata专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+120060%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+4200%)
rastercuberastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (+200%)
hadoopofficeHadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (+1020%)
waspWASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (+280%)
aaocp一个对用户行为日志进行分析的大数据项目
Stars: ✭ 53 (+960%)
datasqueezeHadoop utility to compact small files
Stars: ✭ 18 (+260%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+14800%)
taller SparkRTaller SparkR para las Jornadas de Usuarios de R
Stars: ✭ 12 (+140%)
py-hdfs-mountMount HDFS with fuse, works with kerberos!
Stars: ✭ 13 (+160%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (+300%)
fsbrowserFast desktop client for Hadoop Distributed File System
Stars: ✭ 27 (+440%)
v6.dooring.public可视化大屏解决方案, 提供一套可视化编辑引擎, 助力个人或企业轻松定制自己的可视化大屏应用.
Stars: ✭ 323 (+6360%)
skeinA tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+2460%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (+540%)
meetups-archivosPpts, códigos y videos de las meetups, data science days, videollamadas y workshops. Data Science Research es una organización sin fines de lucro que busca difundir, descentralizar y difundir los conocimientos en Ciencia de Datos e Inteligencia Artificial en el Perú, dando oportunidades a nuevos talentos mediante MeetUps, Workshops y Semilleros …
Stars: ✭ 60 (+1100%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (+680%)
ros hadoopHadoop splittable InputFormat for ROS. Process rosbag with Hadoop Spark and other HDFS compatible systems.
Stars: ✭ 92 (+1740%)
clusterdockclusterdock is a framework for creating Docker-based container clusters
Stars: ✭ 26 (+420%)
flokkrDocumentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (+500%)
big-data-liteSamples to the Oracle Big Data Lite VM
Stars: ✭ 41 (+720%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (+680%)
gan deeplearning4jAutomatic feature engineering using Generative Adversarial Networks using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (+280%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+91520%)
CloudbreakA tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.
Stars: ✭ 301 (+5920%)
Uproot3ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+6140%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+2120%)
SplineData Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+6020%)
TezApache Tez
Stars: ✭ 313 (+6160%)
HiveApache Hive
Stars: ✭ 4,031 (+80520%)
VespaThe open big data serving engine. https://vespa.ai
Stars: ✭ 3,747 (+74840%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+76160%)
OrcApache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (+7680%)
OzoneScalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+6500%)
IgniteApache Ignite
Stars: ✭ 4,027 (+80440%)
Graphql WsCoherent, zero-dependency, lazy, simple, GraphQL over WebSocket Protocol compliant server and client.
Stars: ✭ 398 (+7860%)
CortxCORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+8420%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+8020%)
Circosjsd3 library to build circular graphs
Stars: ✭ 436 (+8620%)