Flink ShadedApache Flink shaded artifacts repository
Stars: ✭ 67 (-98.78%)
HelicalinsightHelical Insight software is the world’s first Open Source Business Intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (-96.12%)
Couchdb DockerSemi-official Apache CouchDB Docker images
Stars: ✭ 194 (-96.48%)
Spark Movie LensAn on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (-86.49%)
TraildbTrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (-81.34%)
SparkjniA heterogeneous Apache Spark framework.
Stars: ✭ 11 (-99.8%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-99%)
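Bigdata Interview🎯 🌟[Big data interview questions] A collection of big data interview questions gathered from around the web, together with my own summarized answers. Currently covers interview questions and knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks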
Stars: ✭ 857 (-84.45%)
MagellanGeo Spatial Data Analytics on Spark
Stars: ✭ 507 (-90.8%)
ArangojsThe official ArangoDB JavaScript driver.
Stars: ✭ 503 (-90.88%)
Hops ExamplesExamples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-98.48%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.57%)
Pgm Index🏅State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-90.95%)
DataspherestudioDataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (-78.32%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-70.22%)
FeastFeature Store for Machine Learning
Stars: ✭ 2,576 (-53.27%)
Sparkling GraphSparklingGraph provides an easy-to-use set of features that give you the ability to process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-97.48%)
HadoopcryptoledgerHadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (-97.71%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-97.22%)
Big WhaleScheduling of offline tasks such as Spark and Flink, and monitoring of real-time tasks
Stars: ✭ 163 (-97.04%)
PhoenixMirror of Apache Phoenix
Stars: ✭ 867 (-84.27%)
HyperspaceAn open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-95.54%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-96.08%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-96.1%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-99.64%)
dockerfilesMulti docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-99.47%)
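leaflet heatmapA simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation and analysis. After the data is computed in parallel with Apache Spark, Apache Spark is used again to draw the heatmap, and leafletjs then loads the OpenStreetMap layer and the heatmap layer to achieve good interactivity. The rendering is currently implemented with Apache Spark; either Apache Spark is not well suited to this kind of computation or I did not design the algorithm well, because the parallel computation is slower than single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .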
Stars: ✭ 13 (-99.76%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (-47.42%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-95.34%)
Bedquilt CoreA JSON document store on PostgreSQL
Stars: ✭ 256 (-95.36%)
CloudflowCloudflow enables users to quickly develop, orchestrate, and operate distributed streaming applications on Kubernetes.
Stars: ✭ 278 (-94.96%)
CrateCrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (-40.98%)
ConcourseDistributed database warehouse for transactions, search and analytics across time.
Stars: ✭ 310 (-94.38%)
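Sparkstreaming💥 🚀 Wraps sparkstreaming to dynamically adjust batch time (computation runs as soon as data arrives); 🚀 supports adding and removing topics at runtime; 🚀 wraps sparkstreaming 1.6 - kafka 010 to support SSL.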
Stars: ✭ 179 (-96.75%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-93.43%)
SylphStream computing platform for bigdata
Stars: ✭ 362 (-93.43%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (-30.84%)
KacheA simple in-memory cache written in Go
Stars: ✭ 349 (-93.67%)
OrcApache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (-92.94%)
IgniteApache Ignite
Stars: ✭ 4,027 (-26.95%)
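Bdp DataplatformA data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, an online store, automated operations, DevOps, a container deployment platform, data platform collection, data platform storage, data platform computing, data platform development, and data platform application building.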
Stars: ✭ 456 (-91.73%)
RavendbACID Document Database
Stars: ✭ 2,870 (-47.94%)
Rxdb🔄 A client side, offline-first, reactive database for JavaScript Applications
Stars: ✭ 16,670 (+202.38%)
GeopysparkGeoTrellis for PySpark
Stars: ✭ 167 (-96.97%)
FeatranA Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (-92.38%)
SleekdbPure PHP NoSQL database with no dependency. Flat file, JSON based document database.
Stars: ✭ 450 (-91.84%)
OrientdbOrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries. OrientDB Community Edition is Open Source using a liberal Apache 2 license.
Stars: ✭ 4,394 (-20.3%)