SylphStream computing platform for bigdata
Stars: ✭ 362 (-97.96%)
ZeppelinWeb-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Stars: ✭ 5,513 (-68.99%)
Flink ShadedApache Flink shaded artifacts repository
Stars: ✭ 67 (-99.62%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-99.89%)
alluxio-pyAlluxio Python client - Access Any Data Source with Python
Stars: ✭ 18 (-99.9%)
bigstatsrR package for statistical tools with big matrices stored on disk.
Stars: ✭ 139 (-99.22%)
litemall-dw基于开源Litemall电商项目的大数据项目,包含前端埋点(openresty+lua)、后端埋点;数据仓库(五层)、实时计算和用户画像。大数据平台采用CDH6.3.2(已使用vagrant+ansible脚本化),同时也包含了Azkaban的workflow。
Stars: ✭ 36 (-99.8%)
ibmpairsopen source tools for interaction with IBM PAIRS:
Stars: ✭ 23 (-99.87%)
pyspark-cheatsheetPySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (-99.35%)
big-data-liteSamples to the Oracle Big Data Lite VM
Stars: ✭ 41 (-99.77%)
vxqueryMirror of Apache VXQuery
Stars: ✭ 19 (-99.89%)
pipelineOONI data processing pipeline
Stars: ✭ 36 (-99.8%)
flink-k8s-operatorAn example of building kubernetes operator (Flink) using Abstract operator's framework
Stars: ✭ 28 (-99.84%)
Alchemy给flink开发的web系统。支持页面上定义udf,进行sql和jar任务的提交;支持source、sink、job的管理;可以管理openshift上的flink集群
Stars: ✭ 264 (-98.52%)
pyparEfficient and scalable parallelism using the message passing interface (MPI) to handle big data and highly computational problems.
Stars: ✭ 66 (-99.63%)
LarkMidTableLarkMidTable 是一站式开源的数据中台,实现中台的 基础建设,数据治理,数据开发,监控告警,数据服务,数据的可视化,实现高效赋能数据前台并提供数据服务的产品。
Stars: ✭ 873 (-95.09%)
hyper-enginePython library for Bayesian hyper-parameters optimization
Stars: ✭ 80 (-99.55%)
spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-99.49%)
mriyaReal-time ETL developed by Flink, data from MySQL to Greenplum. Use canal to parse the MySQL binlog, put it into kafka, use Flink to consume kafka and assemble the data into Greenplum, and more data sources and target sources will be added in the future.
Stars: ✭ 65 (-99.63%)
falconMirror of Apache Falcon
Stars: ✭ 95 (-99.47%)
FlinkTutorialFlinkTutorial 专注大数据Flink流试处理技术。从基础入门、概念、原理、实战、性能调优、源码解析等内容,使用Java开发,同时含有Scala部分核心代码。欢迎关注我的博客及github。
Stars: ✭ 46 (-99.74%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-99.92%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-99.47%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-99.38%)
Knowage ServerKnowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (-98.45%)
hotmapWebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-99.93%)
egisEgis - a handy Ruby interface for AWS Athena
Stars: ✭ 38 (-99.79%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-98.55%)
pytorch kmeansImplementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-99.79%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-99.93%)
big-sorterJava library that sorts very large files of records by splitting into smaller sorted files and merging
Stars: ✭ 49 (-99.72%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-74.24%)
big dataA collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-99.81%)
lensMirror of Apache Lens
Stars: ✭ 57 (-99.68%)
bftkvA distributed key-value storage that's tolerant to Byzantine fault.
Stars: ✭ 27 (-99.85%)
bandar-logMonitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-99.89%)
2018-flink-forward-chinaFlink Forward China 2018 第一届记录,视频记录 | 文档记录 | 不仅仅是流计算 | More than streaming
Stars: ✭ 25 (-99.86%)
flink-crawlerContinuous scalable web crawler built on top of Flink and crawler-commons
Stars: ✭ 48 (-99.73%)
v6.dooring.public可视化大屏解决方案, 提供一套可视化编辑引擎, 助力个人或企业轻松定制自己的可视化大屏应用.
Stars: ✭ 323 (-98.18%)
fb scraperFBLYZE is a Facebook scraping system and analysis system.
Stars: ✭ 61 (-99.66%)
mmtf-workshop-2018Structural Bioinformatics Training Workshop & Hackathon 2018
Stars: ✭ 50 (-99.72%)
couchdb-mangoMirror of Apache CouchDB Mango
Stars: ✭ 34 (-99.81%)
np-flinkflink详细学习实践
Stars: ✭ 26 (-99.85%)
Oie ResourcesA curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-98.41%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-98.44%)