spark-acidACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-67.27%)
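BookThis project collects good books I have read or heard about over the years. I came across them while organizing my files: deleting them seemed a pity, and keeping them wasted too much space, so in the spirit of sharing I am making them available, both to provide these books to readers who need them and to build up something like a knowledge base.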
Stars: ✭ 47 (-83.09%)
blogblog entries
Stars: ✭ 39 (-85.97%)
Sk DistDistributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-6.47%)
mriyaReal-time ETL built with Flink, moving data from MySQL to Greenplum. It uses canal to parse the MySQL binlog and write it to Kafka, then uses Flink to consume from Kafka and load the data into Greenplum. More data sources and targets will be added in the future.
Stars: ✭ 65 (-76.62%)
nvimhost-scala♦️ nvim host plugin provider and API client library in Scala
Stars: ✭ 19 (-93.17%)
np-flinkDetailed Flink study and practice
Stars: ✭ 26 (-90.65%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-91.01%)
yuzhouwanCode Library for My Blog
Stars: ✭ 39 (-85.97%)
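AlchemyA web system built for Flink. Supports defining UDFs in the web page and submitting SQL and JAR jobs; supports management of sources, sinks, and jobs; can manage Flink clusters on OpenShift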
Stars: ✭ 264 (-5.04%)
spark-utillow-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-94.24%)
bigdata-funA complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-94.96%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-65.83%)
typebusFramework for building distributed microservices in Scala with akka-streams and Kafka
Stars: ✭ 14 (-94.96%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-90.65%)
SuccinctEnabling queries on compressed data.
Stars: ✭ 257 (-7.55%)
swordfishOpen-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (-87.41%)
smolderHL7 Apache Spark Datasource
Stars: ✭ 33 (-88.13%)
Spark-ArResources for Spark AR
Stars: ✭ 43 (-84.53%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-91.01%)
splinkImplementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (-34.89%)
spark-stringmetricSpark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (-81.65%)
DatavecETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-2.16%)
HelkThe Hunting ELK
Stars: ✭ 3,097 (+1014.03%)
ReactiveReactive: Examples of the most famous reactive libraries that you can find in the market.
Stars: ✭ 256 (-7.91%)
daf-kyloKylo integration with PDND (previously DAF).
Stars: ✭ 20 (-92.81%)
spark-demosCollection of different demo applications using Apache Spark
Stars: ✭ 15 (-94.6%)
visualize-data-with-pythonA Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (-78.42%)
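litemall-dwA big data project based on the open-source Litemall e-commerce project, including front-end event tracking (openresty+lua) and back-end event tracking, a five-layer data warehouse, real-time computing, and user profiling. The big data platform uses CDH 6.3.2 (scripted with vagrant+ansible) and also includes Azkaban workflows.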
Stars: ✭ 36 (-87.05%)
TheAkkaWayAkka Chinese Book / What should be included in it?
Stars: ✭ 19 (-93.17%)
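LarkMidTableLarkMidTable is a one-stop open-source data middle platform covering platform infrastructure, data governance, data development, monitoring and alerting, data services, and data visualization. It aims to efficiently empower the data front end and provide data services.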
Stars: ✭ 873 (+214.03%)
atomic-storeAtomic event store for Scala/Akka
Stars: ✭ 17 (-93.88%)
dllibdllib is a distributed deep learning library running on Apache Spark
Stars: ✭ 32 (-88.49%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-60.07%)
icicleIcicle Streaming Query Language
Stars: ✭ 16 (-94.24%)
powerapi-scalaPowerAPI is a middleware toolkit for building software-defined power meters
Stars: ✭ 70 (-74.82%)
AusweisBotTelegram bot to generate self-authorizations for moving around during the COVID-19 pandemic in France
Stars: ✭ 13 (-95.32%)
2018-flink-forward-chinaRecords of the first Flink Forward China (2018): video records | document records | More than streaming
Stars: ✭ 25 (-91.01%)
Big Data Rosetta CodeCode snippets for solving common big data problems in various platforms. Inspired by Rosetta Code
Stars: ✭ 254 (-8.63%)
spark-streaming-visualizeSimple demonstration of how to build a complex real time machine learning visualization tool.
Stars: ✭ 16 (-94.24%)
transitMassively real-time city transit streaming application
Stars: ✭ 20 (-92.81%)
fb scraperFBLYZE is a Facebook scraping system and analysis system.
Stars: ✭ 61 (-78.06%)
tpch-sparkTPC-H queries in Apache Spark SQL using native DataFrames API
Stars: ✭ 63 (-77.34%)
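FlinkTutorialFlinkTutorial focuses on Flink stream processing for big data. It covers getting started, concepts, principles, hands-on practice, performance tuning, source code analysis, and more. Developed in Java, with some core code in Scala. Feel free to follow my blog and GitHub.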
Stars: ✭ 46 (-83.45%)
richflowA Node.js and JavaScript synchronous data pipeline processing, data sharing and stream processing library. Actionable & Transformable Pipeline data processing.
Stars: ✭ 17 (-93.88%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+784.53%)
akka-periscopeAkka plugin to collect various data about actors
Stars: ✭ 16 (-94.24%)