
SharpRay / spark-druid-connector

Licence: other
A library for querying Druid data sources with Apache Spark

Programming Languages

scala
5932 projects

Projects that are alternatives to or similar to spark-druid-connector

kafka-connect-http
Kafka Connect connector that enables Change Data Capture from JSON/HTTP APIs into Kafka.
Stars: ✭ 81 (+305%)
Mutual labels:  connector
loopback-connector-cassandra
Cassandra connector for the LoopBack framework.
Stars: ✭ 13 (-35%)
Mutual labels:  connector
twitter-for-geoevent
ArcGIS GeoEvent Server sample Twitter connectors for sending and receiving tweets.
Stars: ✭ 21 (+5%)
Mutual labels:  connector
tarantool.ex
Tarantool client library for Elixir projects
Stars: ✭ 26 (+30%)
Mutual labels:  connector
loopback-connector-arangodb
LoopBack connector for ArangoDB
Stars: ✭ 20 (+0%)
Mutual labels:  connector
spring-boot-examples
Code repository for the "Spring Boot Series" articles; stars and bookmarks welcome.
Stars: ✭ 52 (+160%)
Mutual labels:  druid
FiveSecondRule
This is an addon for World of Warcraft Classic. It tracks the so-called "5-second rule" (5SR): the time that must elapse after spending mana before mana regeneration resumes.
Stars: ✭ 40 (+100%)
Mutual labels:  druid
cc-s
A backend foundation based on Spring Boot, Druid, MyBatis, and MySQL.
Stars: ✭ 22 (+10%)
Mutual labels:  druid
RuoYi-fast
🎉 (RuoYi) official repository. A SpringBoot-based permission management system that is easy to read and understand, with a clean and attractive UI. Its core stack is Spring, MyBatis, and Shiro, with no other heavy dependencies; it runs out of the box.
Stars: ✭ 117 (+485%)
Mutual labels:  druid
loopback-connector-firestore
Firebase Firestore connector for the LoopBack framework.
Stars: ✭ 32 (+60%)
Mutual labels:  connector
tongyimall
A clone of the Xiaomi Mall user side: a Vue + SpringBoot project with separated front end and back end, including the home portal, product categories, home carousel, product display, shopping cart, and address management. The admin side lives in another repository.
Stars: ✭ 55 (+175%)
Mutual labels:  druid
ilp-connector
Reference implementation of an Interledger connector.
Stars: ✭ 131 (+555%)
Mutual labels:  connector
springboot-chapter
🚀 Spring Boot 2.0 basic tutorials: mainstream framework integrations with hands-on learning examples.
Stars: ✭ 23 (+15%)
Mutual labels:  druid
spring-boot-demo
A SpringBoot demo integrating unified exception handling, Swagger, Druid, MyBatis, Redis, and Mongo.
Stars: ✭ 21 (+5%)
Mutual labels:  druid
SpringBootIntegration
A SpringBoot integration learning project (SpringBoot Integration).
Stars: ✭ 20 (+0%)
Mutual labels:  druid
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (+235%)
Mutual labels:  sparksql
deepl-api-connector
Connector library for the deepl.com REST translation API
Stars: ✭ 12 (-40%)
Mutual labels:  connector
camunda-bpm-mail
Mail connectors for Camunda Platform 7
Stars: ✭ 64 (+220%)
Mutual labels:  connector
learn-java-demo
A Java learning demo.
Stars: ✭ 17 (-15%)
Mutual labels:  druid
yiying-parent
An online movie platform built on a distributed microservice architecture. The stack includes SpringBoot, SpringCloud, Nacos, Dubbo, MyBatis-Plus, and Druid. Developed with separated front end and back end, it implements custom video upload, decoding, storage, and on-demand playback.
Stars: ✭ 48 (+140%)
Mutual labels:  druid

spark-druid-connector

A library for querying Druid data sources with Apache Spark.

Compatibility

This library is compatible with Spark 2.x and Druid 0.9.0+.

Usage

Compile

sbt clean assembly

Using with spark-shell

bin/spark-shell --jars spark-druid-connector-assembly-0.1.0-SNAPSHOT.jar

In spark-shell, a temporary view can be created like this:

val df = spark.read.format("org.rzlabs.druid").
  option("druidDatasource", "ds1").
  option("zkHost", "localhost:2181").
  option("hyperUniqueColumnInfo", """[{"column":"city", "hllMetric": "unique_city"}]""").load
df.createOrReplaceTempView("ds")
spark.sql("select time, sum(event) from ds group by time").show

Or you can create a Hive table:

spark.sql("""
  create table ds1 using org.rzlabs.druid options (
    druidDatasource "ds1",
    zkHost "localhost:2181",
    hyperUniqueColumnInfo "[{\"column\": \"city\", \"hllMetric\": \"unique_city\"}]"
  )
""")

Options

| option | required | default value | description |
| ------ | -------- | ------------- | ----------- |
| druidDatasource | yes | none | The data source name in Druid |
| zkHost | no | localhost | The ZooKeeper server Druid uses, e.g., localhost:2181 |
| zkSessionTimeout | no | 30000 | ZooKeeper server connection timeout (ms) |
| zkEnableCompression | no | true | Whether to enable compression of ZooKeeper data |
| zkDruidPath | no | /druid | The Druid metadata root path in ZooKeeper |
| zkQualifyDiscoveryNames | no | true | |
| queryGranularity | no | all | The query granularity of the Druid datasource |
| maxConnectionsPerRoute | no | 20 | The maximum simultaneous live connections per Druid server |
| maxConnections | no | 100 | The maximum simultaneous live connections to the Druid cluster |
| loadMetadataFromAllSegments | no | true | Whether to fetch metadata from all available segments |
| debugTransformations | no | false | Whether to log debug information about the transformations |
| timeZoneId | no | UTC | The time zone ID used in queries |
| useV2GroupByEngine | no | false | Whether to use the v2 groupBy engine |
| useSmile | no | true | Whether to use the Smile binary format for data exchanged between the client and Druid servers |
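Spark hands these options to the data source as a plain string-to-string map, with unset options falling back to the defaults above. A minimal sketch of that resolution (the option names and defaults come from the table; the merge logic itself is illustrative, not the connector's actual code):

```scala
object OptionDefaults {
  // Defaults from the options table above. All values are strings,
  // because Spark passes data source options as Map[String, String].
  val defaults: Map[String, String] = Map(
    "zkHost" -> "localhost",
    "zkSessionTimeout" -> "30000",
    "zkEnableCompression" -> "true",
    "zkDruidPath" -> "/druid",
    "queryGranularity" -> "all",
    "maxConnectionsPerRoute" -> "20",
    "maxConnections" -> "100",
    "loadMetadataFromAllSegments" -> "true",
    "debugTransformations" -> "false",
    "timeZoneId" -> "UTC",
    "useV2GroupByEngine" -> "false",
    "useSmile" -> "true"
  )

  // User-supplied options override the defaults; druidDatasource has
  // no default and therefore must always be supplied.
  def resolve(userOptions: Map[String, String]): Map[String, String] = {
    require(userOptions.contains("druidDatasource"),
      "druidDatasource is required")
    defaults ++ userOptions
  }
}

val resolved = OptionDefaults.resolve(
  Map("druidDatasource" -> "ds1", "zkHost" -> "localhost:2181"))
// zkHost is overridden; the remaining options keep their defaults.
```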

Major features

Currently

  • Create tables directly in Spark without requiring a base table.
  • Push down Aggregate, Project, and Filter operators, transforming them into Druid GROUPBY and SCAN queries accordingly.
  • Support the majority of primitive filter specs, aggregation specs, and extraction functions.
  • Lightweight datasource metadata updating.
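
As a rough illustration of the pushdown point above, the Druid query type chosen depends on whether the Spark plan contains an aggregate (the names below are illustrative, not the connector's internal API):

```scala
// Illustrative sketch only: which Druid query type a pushed-down
// Spark plan maps to, per the feature list above.
sealed trait DruidQuerySpec
case object GroupByQuerySpec extends DruidQuerySpec // Aggregate pushed down
case object ScanQuerySpec extends DruidQuerySpec    // Project & Filter only

def targetQuerySpec(hasAggregate: Boolean): DruidQuerySpec =
  if (hasAggregate) GroupByQuerySpec else ScanQuerySpec

// "select time, sum(event) from ds group by time" -> GROUPBY query
val agg = targetQuerySpec(hasAggregate = true)
// "select time, event from ds where ..."          -> SCAN query
val scan = targetQuerySpec(hasAggregate = false)
```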

In the future

  • Support the Join operator.
  • Support pushing down Limit and Having operators.
  • Support more primitive specs and extraction functions.
  • Support more Druid query specs according to query details.
  • Support datasource creation and metadata lookup.
  • ...