Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-51.85%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-72.71%)
Repository个人学习知识库涉及到数据仓库建模、实时计算、大数据、Java、算法等。
Stars: ✭ 92 (-82.07%)
Spark.NET for Apache® Spark™ makes Apache Spark™ easily accessible to .NET developers.
Stars: ✭ 1,721 (+235.48%)
StreamlineStreamLine - Streaming Analytics
Stars: ✭ 151 (-70.57%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-57.89%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-19.49%)
Flink Learningflink learning blog. http://www.54tianzhisheng.cn/ 含 Flink 入门、概念、原理、实战、性能调优、源码解析等内容。涉及 Flink Connector、Metrics、Library、DataStream API、Table API & SQL 等内容的学习案例,还有 Flink 落地应用的大型项目案例(PVUV、日志存储、百亿数据实时去重、监控告警)分享。欢迎大家支持我的专栏《大数据实时计算引擎 Flink 实战与性能优化》
Stars: ✭ 11,378 (+2117.93%)
prostoProsto is a data processing toolkit radically changing how data is processed by heavily relying on functions and operations with functions - an alternative to map-reduce and join-groupby
Stars: ✭ 54 (-89.47%)
Bigdata Interview🎯 🌟[大数据面试题]分享自己在网络上收集的大数据相关的面试题以及自己的答案总结.目前包含Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper框架的面试题知识总结
Stars: ✭ 857 (+67.06%)
LogislandScalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink being in the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready to use processors, data sources and sinks are available.
Stars: ✭ 97 (-81.09%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-70.76%)
Smart openUtils for streaming large files (S3, HDFS, gzip, bz2...)
Stars: ✭ 2,306 (+349.51%)
MobiusC# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+81.09%)
HydraA real-time data replication platform that "unbundles" the receiving, transforming, and transport of data streams.
Stars: ✭ 68 (-86.74%)
God Of Bigdata专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+1071.15%)
Every Single Day I TldrA daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-51.46%)
Stepfunctions2processingConfiguration with AWS step functions and lambdas which initiates processing from activity state
Stars: ✭ 90 (-82.46%)
Iopipe Js CoreObserve and develop serverless apps with confidence on AWS Lambda with Tracing, Metrics, Profiling, Monitoring, and more.
Stars: ✭ 123 (-76.02%)
WhatsmarsJava生态研究(Spring Boot + Redis + Dubbo + RocketMQ + Elasticsearch)🔥🔥🔥🔥🔥
Stars: ✭ 1,389 (+170.76%)
StoragetapperStorageTapper is a scalable realtime MySQL change data streaming, logical backup and logical replication service
Stars: ✭ 232 (-54.78%)
Serverless AnalyticsTrack website visitors with Serverless Analytics using Kinesis, Lambda, and TypeScript.
Stars: ✭ 219 (-57.31%)
FlogoProject Flogo is an open source ecosystem of opinionated event-driven capabilities to simplify building efficient & modern serverless functions, microservices & edge apps.
Stars: ✭ 1,891 (+268.62%)
waspWASP is a framework to build complex real time big data applications. It relies on a kind of Kappa/Lambda architecture mainly leveraging Kafka and Spark. If you need to ingest huge amount of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-96.3%)
trafficMassively real-time traffic streaming application
Stars: ✭ 25 (-95.13%)
TributaryStreaming reactive and dataflow graphs in Python
Stars: ✭ 231 (-54.97%)
matrixoneHyperconverged cloud-edge native database
Stars: ✭ 1,057 (+106.04%)
wink-statisticsFast & numerically stable statistical analysis
Stars: ✭ 36 (-92.98%)
duckdbDuckDB is an in-process SQL OLAP Database Management System
Stars: ✭ 4,707 (+817.54%)
fastdata-clusterFast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-96.1%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-81.48%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-97.47%)
MStreamAnomaly Detection on Time-Evolving Streams in Real-time. Detecting intrusions (DoS and DDoS attacks), frauds, fake rating anomalies.
Stars: ✭ 68 (-86.74%)
transform-hubFlexible and efficient data processing engine and an evolution of the popular Scramjet Framework based on node.js. Our Transform Hub was designed specifically for data processing and has its own unique algorithms included.
Stars: ✭ 38 (-92.59%)
TogetherStreamA social and synchronized streaming experience
Stars: ✭ 16 (-96.88%)
beneathBeneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-87.33%)
bigkubeMinikube for big data with Scala and Spark
Stars: ✭ 16 (-96.88%)
MaterializeMaterialize lets you ask questions of your live data, which it answers and then maintains for you as your data continue to change. The moment you need a refreshed answer, you can get it in milliseconds. Materialize is designed to help you interactively explore your streaming data, perform data warehousing analytics against live relational data, or just increase the freshness and reduce the load of your dashboard and monitoring tasks.
Stars: ✭ 3,341 (+551.27%)
transitMassively real-time city transit streaming application
Stars: ✭ 20 (-96.1%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-97.27%)
CdapAn open source framework for building data analytic applications.
Stars: ✭ 509 (-0.78%)
ZatZeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-40.94%)
DagsterAn orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+699.03%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+660.82%)
CrateCrateDB is a distributed SQL database that makes it simple to store and analyze
massive amounts of data in real-time.
Stars: ✭ 3,254 (+534.31%)
Coolplayspark酷玩 Spark: Spark 源代码解析、Spark 类库等
Stars: ✭ 3,318 (+546.78%)
PerspectiveA data visualization and analytics component, especially well-suited for large and/or streaming datasets.
Stars: ✭ 3,989 (+677.58%)