Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+1561.54%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (+7.69%)
Bigdata Playground: A complete example of a big data application using Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (+1261.54%)
Esper Tv: Esper instance for TV news analysis
Stars: ✭ 37 (+184.62%)
Tera: An Internet-Scale Database.
Stars: ✭ 1,846 (+14100%)
bftkv: A distributed key-value store that tolerates Byzantine faults.
Stars: ✭ 27 (+107.69%)
Dvid: Distributed, Versioned, Image-oriented Dataservice
Stars: ✭ 174 (+1238.46%)
Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+12530.77%)
Janusgraph: JanusGraph, an open-source, distributed graph database
Stars: ✭ 4,277 (+32800%)
yildiz: 🦄🌟 Graph database layer on top of Google Bigtable
Stars: ✭ 24 (+84.62%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (+3469.23%)
emulator-tools: Google Cloud Bigtable and Pub/Sub emulator tools to make development a breeze
Stars: ✭ 16 (+23.08%)
vxquery: Mirror of Apache VXQuery
Stars: ✭ 19 (+46.15%)
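leaflet heatmap: A simple visualization of Huzhou phone-call data. Assuming the data volume is too large for the browser to draw the heatmap directly, the heatmap rendering step is moved offline for computation and analysis. Apache Spark first computes over the data in parallel and then draws the heatmap; leafletjs then loads an OpenStreetMap layer and the heatmap layer to give good interactivity. Rendering is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, perhaps because Apache Spark is not well suited to this kind of computation or because the algorithm is not well designed. The Apache Spark heatmap rendering and computation code is here: https://github.com/yuanzhaokang/ParallelizeHeatmap.git .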
Stars: ✭ 13 (+0%)
alluxio-py: Alluxio Python client - Access Any Data Source with Python
Stars: ✭ 18 (+38.46%)
lens: Mirror of Apache Lens
Stars: ✭ 57 (+338.46%)
elara: Elara DB is an easy-to-use, lightweight key-value database that can also be used as a fast in-memory cache. Manipulate data structures in memory, encrypt database files, and export data. 🎯
Stars: ✭ 93 (+615.38%)
pypar: Efficient and scalable parallelism using the Message Passing Interface (MPI) to handle big data and computationally intensive problems.
Stars: ✭ 66 (+407.69%)
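litemall-dw: A big data project built on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computation, and user profiling. The big data platform runs on CDH 6.3.2 (already scripted with vagrant+ansible) and also includes Azkaban workflows.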
Stars: ✭ 36 (+176.92%)
server: The ViUR application development framework - legacy version 2.x for Python 2.7
Stars: ✭ 12 (-7.69%)
kubernetes-vault-example: Placeholder for training material related to TA usage of Vault for securing Kubernetes apps.
Stars: ✭ 16 (+23.08%)
arc gcs: Provides an Arc backend for Google Cloud Storage
Stars: ✭ 48 (+269.23%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (+630.77%)
ibmpairs: Open-source tools for interacting with IBM PAIRS
Stars: ✭ 23 (+76.92%)
swordfish: Open-source distributed workflow scheduling tools that also support streaming tasks.
Stars: ✭ 35 (+169.23%)
tempdb: Key-value store for temporary items 📝
Stars: ✭ 16 (+23.08%)
keva: Low-latency in-memory key-value store, Redis drop-in alternative
Stars: ✭ 76 (+484.62%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+600%)
hotmap: WebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (+0%)
pipeline: OONI data processing pipeline
Stars: ✭ 36 (+176.92%)
egis: Egis, a handy Ruby interface for AWS Athena
Stars: ✭ 38 (+192.31%)
chapar: A framework for verification of causal consistency for distributed key-value stores and their clients in Coq [maintainer=@palmskog]
Stars: ✭ 29 (+123.08%)
b52: b52 is a fast, experimental key/value database with support for the memcache protocol.
Stars: ✭ 20 (+53.85%)
pytorch kmeans: Implementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (+192.31%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+753.85%)
big-sorter: Java library that sorts very large files of records by splitting them into smaller sorted files and merging those files
Stars: ✭ 49 (+276.92%)
nodejs-talent: Node.js client for Google Cloud Talent Solution. Transform your job search and candidate matching capabilities with Cloud Talent Solution.
Stars: ✭ 21 (+61.54%)
asdf-gcloud: ☁️ GCloud CLI (Google Cloud SDK) plugin for the asdf version manager. Pin gcloud versions for each project!
Stars: ✭ 24 (+84.62%)
pyspark-cheatsheet: PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+784.62%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, and Docker
Stars: ✭ 34 (+161.54%)
hyper-engine: Python library for Bayesian hyperparameter optimization
Stars: ✭ 80 (+515.38%)
secrets-init: Minimalistic init system for containers with AWS/GCP secrets support
Stars: ✭ 114 (+776.92%)
restme: Template to bootstrap a fully functional, multi-region REST service on GCP with a developer release pipeline.
Stars: ✭ 19 (+46.15%)
yuzhouwan: Code Library for My Blog
Stars: ✭ 39 (+200%)
big-data-lite: Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (+215.38%)
iris: Distributed streaming key-value storage
Stars: ✭ 55 (+323.08%)
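The big-sorter entry describes the classic external merge sort: sort chunks small enough to fit in memory, spill each sorted chunk ("run") to disk, then merge all runs in one pass. The following is a minimal Python sketch of that technique, not big-sorter's Java API. It assumes newline-terminated text records compared as plain strings; the names `external_sort`, `_write_run`, and `chunk_size` are hypothetical and chosen only for illustration.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_lines: List[str], tmp_dir: str) -> str:
    """Write one sorted chunk (a "run") to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines: Iterable[str], chunk_size: int, tmp_dir: str) -> Iterator[str]:
    """Hypothetical sketch: sort newline-terminated records that may not fit in memory.

    Assumption: every record ends with "\n" and records compare as plain strings.
    Phase 1 holds at most `chunk_size` records in memory, sorts them, and
    spills each sorted chunk to its own temporary file.
    Phase 2 performs a k-way merge of all runs with a heap (heapq.merge).
    """
    run_paths: List[str] = []
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            run_paths.append(_write_run(sorted(chunk), tmp_dir))
            chunk = []
    if chunk:  # flush the final, possibly smaller, chunk
        run_paths.append(_write_run(sorted(chunk), tmp_dir))

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge lazily merges already-sorted iterables, reading one
        # line at a time from each run file.
        yield from heapq.merge(*files)
    finally:
        # Close and delete the run files once the merge is finished.
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

Memory use is bounded by `chunk_size` records in phase 1 and by one buffered line per run in phase 2, which is why the approach scales to files much larger than RAM. For example, `list(external_sort(["5\n", "3\n", "9\n", "1\n"], chunk_size=2, tmp_dir=d))` produces two runs and merges them into sorted order.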