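A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation. Apache Spark first processes the data in parallel and then draws the heatmap; leafletjs then loads an OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently implemented with Apache Spark, but the parallel computation is slower than a single-machine run, either because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .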

Stars: ✭ 13 (-77.59%)

Mutual labels: spark, hdfs

stock-market-scraper

Scrapes historical stock market data from Yahoo Finance (https://finance.yahoo.com/)

Stars: ✭ 110 (+89.66%)

Mutual labels: query, csv

confluent-spark-avro

Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.

Stars: ✭ 18 (-68.97%)

Mutual labels: spark, avro

dbd

dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases.

Stars: ✭ 30 (-48.28%)

Mutual labels: csv, parquet

Json2csv

Command-line tool to convert JSON to CSV

Stars: ✭ 742 (+1179.31%)

Mutual labels: json, csv

Dataproofer

A proofreader for your data

Stars: ✭ 628 (+982.76%)

Mutual labels: csv, data-science

Nano Sql

Universal database layer for the client, server & mobile devices. It's like Lego for databases.

Stars: ✭ 717 (+1136.21%)

Mutual labels: json, csv

Dasel

Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.

Stars: ✭ 759 (+1208.62%)

Mutual labels: json, query

Gcs Tools

GCS support for avro-tools, parquet-tools and protobuf

Stars: ✭ 57 (-1.72%)

Mutual labels: avro, parquet

Quicklib

Quick development library (AutoMapper, LinQ, IOC Dependency Injection, MemoryCache, Scheduled tasks, Config, Serializers, etc) with crossplatform support for Delphi/Firemonkey (Windows,Linux,OSX/IOS/Android) and freepascal (Windows/Linux).

Stars: ✭ 274 (+372.41%)

Mutual labels: azure, json

Loaders.gl

Loaders for big data visualization.

Stars: ✭ 272 (+368.97%)

Mutual labels: json, csv

swiss-army knife for data

Stars: ✭ 275 (+374.14%)

Mutual labels: json, csv

Divolte Collector

Stars: ✭ 264 (+355.17%)

Mutual labels: avro, hdfs

Http Rpc

Lightweight REST for Java

Stars: ✭ 298 (+413.79%)

Mutual labels: json, csv

Preql

An interpreted relational query language that compiles to SQL.

Stars: ✭ 257 (+343.1%)

Mutual labels: data-science, query

Sqawk

Like Awk but with SQL and table joins

Stars: ✭ 263 (+353.45%)

Mutual labels: json, csv

Csv Parser

A modern C++ library for reading, writing, and analyzing CSV (and similar) files.

Stars: ✭ 359 (+518.97%)

Mutual labels: json, csv

Nodb

NoDB isn't a database.. but it sort of looks like one.

Stars: ✭ 353 (+508.62%)

Mutual labels: s3, json

Artificial Adversary

🗣️ Tool to generate adversarial text examples and test machine learning models against them

Stars: ✭ 348 (+500%)

Mutual labels: data-science, text

Stream Parser

⚡ PHP7 / Laravel Multi-format Streaming Parser

Stars: ✭ 391 (+574.14%)

Mutual labels: json, csv

Kalulu

Uganda Elections Tools and Resources

Stars: ✭ 24 (-58.62%)

Mutual labels: json, csv

Tiledb Vcf

Efficient variant-call data storage and retrieval library using the TileDB storage library.

Stars: ✭ 26 (-55.17%)

Mutual labels: data-science, spark

Bigdata Interview

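🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from around the web, together with the author's own answer summaries. Currently covers interview knowledge for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.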

Stars: ✭ 857 (+1377.59%)

Mutual labels: spark, hdfs

Rio

A Swiss-Army Knife for Data I/O

Stars: ✭ 467 (+705.17%)

Mutual labels: csv, data-science

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+37913.79%)

Mutual labels: data-science, spark

Datasette

An open source multi-tool for exploring and publishing data

Stars: ✭ 5,640 (+9624.14%)

Mutual labels: json, csv

Python Ml Course

Introductory course on Machine Learning with Python

Stars: ✭ 442 (+662.07%)

Mutual labels: data-science, svm

Trdsql

CLI tool that can execute SQL queries on CSV, LTSV, JSON and TBLN. Can output to various formats.

Stars: ✭ 593 (+922.41%)

Mutual labels: json, csv

Sqlitebiter

A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.

Stars: ✭ 601 (+936.21%)

Mutual labels: json, csv

God Of Bigdata

Focused on big data learning and interview preparation; the road to big data mastery starts here. Flink/Spark/Hadoop/Hbase/Hive...

Stars: ✭ 6,008 (+10258.62%)

Mutual labels: spark, hdfs

Pyspark Example Project

Example project implementing best practices for PySpark ETL jobs and applications.

Stars: ✭ 633 (+991.38%)

Mutual labels: data-science, spark

Fsharp.data

F# Data: Library for Data Access

Stars: ✭ 631 (+987.93%)

Mutual labels: json, csv

Pmacct

pmacct is a small set of multi-purpose passive network monitoring tools [NetFlow IPFIX sFlow libpcap BGP BMP RPKI IGP Streaming Telemetry].

Stars: ✭ 677 (+1067.24%)

Mutual labels: json, avro

Boltons

🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Stars: ✭ 5,671 (+9677.59%)

Mutual labels: json, data-science

Rows

A common, beautiful interface to tabular data, no matter the format

Stars: ✭ 739 (+1174.14%)

Mutual labels: csv, data-science

Kafka Storm Starter

Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.

Stars: ✭ 728 (+1155.17%)

Mutual labels: spark, avro

Portabletext

Portable Text is a JSON based rich text specification for modern content editing platforms.

Stars: ✭ 759 (+1208.62%)

Mutual labels: json, text

Pytablewriter

pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.

Stars: ✭ 422 (+627.59%)

Mutual labels: json, csv

Cluster Pack

A library on top of either pex or conda-pack to make your Python code easily available on a cluster

Stars: ✭ 23 (-60.34%)

Mutual labels: s3, hdfs

Yandex Big Data Engineering

Stars: ✭ 17 (-70.69%)

Mutual labels: spark, hdfs

Parquet Generator

Parquet file generator

Stars: ✭ 16 (-72.41%)

Mutual labels: spark, parquet

Data Forge Ts

The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.

Stars: ✭ 967 (+1567.24%)

Mutual labels: json, csv

S3proxy

Access other storage backends via the S3 API

Stars: ✭ 952 (+1541.38%)

Mutual labels: azure, s3

Clevercsv

CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

Stars: ✭ 887 (+1429.31%)

Mutual labels: csv, data-science

Learning Spark

Learn Spark from scratch; big data learning

Stars: ✭ 37 (-36.21%)

Mutual labels: spark, hdfs

Snappydata

Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster

Stars: ✭ 995 (+1615.52%)

Mutual labels: spark, scale

Optimus

🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark