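A simple visualization of Huzhou call data. Assuming the data volume is too large for a browser to draw the heatmap directly, the heatmap rendering step is moved to offline computation and analysis. The data is first processed in parallel with Apache Spark, Apache Spark then renders the heatmap, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented in Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git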

Stars: ✭ 13 (-78.33%)

Mutual labels: big-data, bigdata

NiFi-Rule-engine-processor

Drools processor for Apache NiFi

Stars: ✭ 34 (-43.33%)

Mutual labels: big-data, bigdata

Hadoop For Geoevent

ArcGIS GeoEvent Server sample Hadoop connector for storing GeoEvents in HDFS.

Stars: ✭ 5 (-91.67%)

Mutual labels: big-data, bigdata

Spark Py Notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

Stars: ✭ 1,338 (+2130%)

Mutual labels: big-data, bigdata

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

Stars: ✭ 43 (-28.33%)

Mutual labels: big-data, bigdata

awesome-coder-resources

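A refueling station on the programming road! [Continuously updated; stars welcome, come back often.] [Contents: programming/learning/reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more.]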

Stars: ✭ 54 (-10%)

Mutual labels: big-data, bigdata

scarf

Toolkit for highly memory efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas scale datasets with millions of cells on laptop.

Stars: ✭ 54 (-10%)

Mutual labels: big-data

spark-records

Bulletproof Apache Spark jobs with fast root cause analysis of failures.

Stars: ✭ 67 (+11.67%)

Mutual labels: big-data

nebula

A distributed block-based data storage and compute engine

Stars: ✭ 127 (+111.67%)

Mutual labels: big-data

talks

Talks, presentations, workshops.

Stars: ✭ 28 (-53.33%)

Mutual labels: workshops

columnify

Convert record-oriented data to columnar format.

Stars: ✭ 28 (-53.33%)

Mutual labels: bigdata

beekeeper

Service for automatically managing and cleaning up unreferenced data

Stars: ✭ 43 (-28.33%)

Mutual labels: big-data

datacatalog-tag-manager

Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources -- currently supports the CSV file format

Stars: ✭ 17 (-71.67%)

Mutual labels: bigdata

Notes

Learning notes | Java fundamentals, JVM, source code, big data, interview experiences

Stars: ✭ 69 (+15%)

Mutual labels: bigdata

sparkucx

A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer

Stars: ✭ 32 (-46.67%)

Mutual labels: big-data

RemoteShuffleService

Celeborn provides an elastic and high-performance service for shuffle and spilled data.

Stars: ✭ 262 (+336.67%)

Mutual labels: big-data

IoT-system-PLC-data-to-InfluxDB

This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack is completely open source. InfluxDB is used for data storage, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-56.67%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+3833.33%)

Mutual labels: big-data

insightedge

InsightEdge Core

Stars: ✭ 22 (-63.33%)

Mutual labels: big-data

hadoopoffice

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

Stars: ✭ 56 (-6.67%)

Mutual labels: bigdata

terraform-aws-kinesis-firehose

This code creates a Kinesis Firehose in AWS to send CloudWatch log data to S3.

Stars: ✭ 25 (-58.33%)

Mutual labels: big-data

workshops

Bioconnector Workshops

Stars: ✭ 24 (-60%)

Mutual labels: workshops

coolplayflink

Flink: Stateful Computations over Data Streams

Stars: ✭ 14 (-76.67%)

Mutual labels: bigdata

spark-root

Apache Spark Data Source for ROOT File Format

Stars: ✭ 28 (-53.33%)

Mutual labels: big-data

dxram

A distributed in-memory key-value storage for billions of small objects.

Stars: ✭ 25 (-58.33%)

Mutual labels: big-data

cloudberry

Big Data Visualization

Stars: ✭ 89 (+48.33%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+155%)

Mutual labels: big-data

nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

Stars: ✭ 8,196 (+13560%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Stars: ✭ 1,173 (+1855%)

Mutual labels: big-data

rastercube

rastercube is a python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)

Stars: ✭ 15 (-75%)

Mutual labels: big-data

Real Time Social Media Mining

DevOps pipeline for Real Time Social/Web Mining

Stars: ✭ 22 (-63.33%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-16.67%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ nodejs kafka connect connector for Google BigQuery

Stars: ✭ 17 (-71.67%)

Mutual labels: big-data

LoL-Match-Prediction

Win probability predictions for League of Legends matches using neural networks

Stars: ✭ 34 (-43.33%)

Mutual labels: big-data

incubator-liminal

Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.

Stars: ✭ 117 (+95%)

Mutual labels: big-data

learning-spark

Tidy up Spark and Hadoop tutorials.

Stars: ✭ 28 (-53.33%)

Mutual labels: bigdata

airavata-django-portal

Mirror of Apache Airavata Django Portal

Stars: ✭ 20 (-66.67%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (+153.33%)

Mutual labels: big-data

airavata-php-gateway

Mirror of Apache Airavata PHP Gateway

Stars: ✭ 15 (-75%)

Mutual labels: big-data

datalake-etl-pipeline

Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations

Stars: ✭ 39 (-35%)

Mutual labels: big-data

hnn

The Human Neocortical Neurosolver (HNN) is a software tool that gives researchers/clinicians the ability to develop/test hypotheses on circuit mechanisms underlying EEG/MEG data.

Stars: ✭ 62 (+3.33%)

Mutual labels: neuronal-network

1-60 of 561 similar projects
