All Projects → yahoo → Hive Funnel Udf

yahoo / Hive Funnel Udf

Licence: apache-2.0
Hive UDFs for funnel analysis

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Hive Funnel Udf

Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+6262.5%)
Mutual labels:  analytics, hadoop, hive
Addax
Addax is an open source universal ETL tool that supports most of those RDBMS and NoSQLs on the planet, helping you transfer data from any one place to another.
Stars: ✭ 615 (+754.17%)
Mutual labels:  hive, hadoop
swordfish
Open-source distribute workflow schedule tools, also support streaming task.
Stars: ✭ 35 (-51.39%)
Mutual labels:  hive, hadoop
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+404.17%)
Mutual labels:  analytics, hive
TitanDataOperationSystem
最好的大数据项目。《Titan数据运营系统》,本项目是一个全栈闭环系统,我们有用作数据可视化的web系统,然后用flume-kafaka-flume进行日志的读取,在hive设计数仓,编写spark代码进行数仓表之间的转化以及ads层表到mysql的迁移,使用azkaban进行定时任务的调度,使用技术:Java/Scala语言,Hadoop、Spark、Hive、Kafka、Flume、Azkaban、SpringBoot,Bootstrap, Echart等;
Stars: ✭ 62 (-13.89%)
Mutual labels:  hive, hadoop
GooglePlay-Web-Crawler
Mapreduce project by Hadoop, Nutch, AWS EMR, Pig, Tez, Hive
Stars: ✭ 18 (-75%)
Mutual labels:  hive, hadoop
BigData-News
基于Spark2.2新闻网大数据实时系统项目
Stars: ✭ 36 (-50%)
Mutual labels:  hive, hadoop
web-click-flow
网站点击流离线日志分析
Stars: ✭ 14 (-80.56%)
Mutual labels:  hive, hadoop
Hive
Apache Hive
Stars: ✭ 4,031 (+5498.61%)
Mutual labels:  hadoop, hive
Big data architect skills
一个大数据架构师应该掌握的技能
Stars: ✭ 400 (+455.56%)
Mutual labels:  analytics, hadoop
God Of Bigdata
专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+8244.44%)
Mutual labels:  hadoop, hive
EngineeringTeam
와이빅타 엔지니어링팀의 자료를 정리해두는 곳입니다.
Stars: ✭ 41 (-43.06%)
Mutual labels:  hive, hadoop
cloud
云计算之hadoop、hive、hue、oozie、sqoop、hbase、zookeeper环境搭建及配置文件
Stars: ✭ 48 (-33.33%)
Mutual labels:  hive, hadoop
TIL
Today I Learned
Stars: ✭ 43 (-40.28%)
Mutual labels:  hive, hadoop
cobra-policytool
Manage Apache Atlas and Ranger configuration for your Hadoop environment.
Stars: ✭ 16 (-77.78%)
Mutual labels:  hive, hadoop
Bigdataguide
大数据学习,从零开始学习大数据,包含大数据学习各阶段学习视频、面试资料
Stars: ✭ 817 (+1034.72%)
Mutual labels:  hadoop, hive
aaocp
一个对用户行为日志进行分析的大数据项目
Stars: ✭ 53 (-26.39%)
Mutual labels:  hive, hadoop
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-66.67%)
Mutual labels:  hive, hadoop
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+416.67%)
Mutual labels:  hadoop, hive
Bigdata
💎🔥大数据学习笔记
Stars: ✭ 488 (+577.78%)
Mutual labels:  hadoop, hive

Hive Funnel Analysis UDFs

Build Status Coverage Status Apache License 2.0

Funnel analysis is a method for tracking user conversion rates across actions. This enables detection of actions causing high user fallout.

These Hive UDFs enables funnel analysis to be performed simply and easily on any Hive table.

Table of Contents

Requirements

Maven is required to build the funnel UDFs.

How to build

There is a provided Makefile with all the build targets.

Build JAR

make jar

This creates a funnel.jar in the target/ directory.

Register JAR with Hive

To use the funnel UDFs, you need to register it with Hive.

With temporary functions:

ADD JAR funnel.jar;
CREATE TEMPORARY FUNCTION funnel            AS 'com.yahoo.hive.udf.funnel.Funnel';
CREATE TEMPORARY FUNCTION funnel_merge      AS 'com.yahoo.hive.udf.funnel.Merge';
CREATE TEMPORARY FUNCTION funnel_conversion AS 'com.yahoo.hive.udf.funnel.Conversion';
CREATE TEMPORARY FUNCTION funnel_fallout    AS 'com.yahoo.hive.udf.funnel.Fallout';

With permenant functions you need to put the JAR on HDFS, and it will be registered with a database (you have to replace DATABASE and PATH_TO_JAR with your values):

CREATE FUNCTION DATABASE.funnel            AS 'com.yahoo.hive.udf.funnel.Funnel'  USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';
CREATE FUNCTION DATABASE.funnel_merge      AS 'com.yahoo.hive.udf.funnel.Merge'   USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';
CREATE FUNCTION DATABASE.funnel_conversion AS 'com.yahoo.hive.udf.funnel.Conversion' USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';
CREATE FUNCTION DATABASE.funnel_fallout    AS 'com.yahoo.hive.udf.funnel.Fallout' USING JAR 'hdfs:///PATH_TO_JAR/funnel.jar';

How to use

There are four funnel UDFs provided: funnel, funnel_merge, funnel_conversion, funnel_fallout.

The funnel UDF outputs an array of longs showing conversion rates across the provided funnel steps.

The funnel_merge UDF merges multiple arrays of longs by adding them together.

The funnel_conversion UDF takes a raw count funnel result and converts it to the conversion rate.

The funnel_fallout UDF takes a raw count funnel result and converts it to the fallout rate.

There is no need to sort the data on timestamp, the UDF will take care of it. If there is a collision in the timestamps, it then sorts on the action column.

funnel

funnel(action_column, timestamp_column, array(funnel_1_a, funnel_1_b), array(funnel_2), ...)

  • Builds a funnel report applied to the action_column, sorted by the timestamp_column.
  • The funnel steps are arrays of the same type as the action column. This allows for multiple matches to move to the next funnel.
    • For example, funnel_1 could be array('register_button', 'facebook_invite_register'). The funnel will match the first occurence of either of these actions and proceed to the next funnel.
    • Or, funnel_1 could just be array('register_button').
  • You can have an arbitrary number of funnels.
  • The timestamp_column can be of any comparable type (Strings, Integers, Dates, etc).

funnel_merge

funnel_merge(funnel_column)

  • Merges funnels. Use with funnel UDF.

funnel_conversion

funnel_conversion(funnel_column)

  • Converts the result of a funnel_merge to a conversion rate. Use with funnel and funnel_merge UDF.
  • For example, a result from funnel_merge could look like [245, 110, 54, 13]. This is result is in raw counts. If we pass this through funnel_conversion then it would look like [1.0, 0.44, 0.49, 0.24].

funnel_fallout

funnel_fallout(funnel_column)

  • Converts the result of a funnel_merge to a fallout rate. Use with funnel and funnel_merge UDF.
  • For example, a result from funnel_merge could look like [245, 110, 54, 13]. This is result is in raw counts. If we pass this through funnel_fallout then it would look like [0.0, 0.55, 0.50, 0.75].

Security

Older versions of Hive have known security issues. Keep the following issues in mind when deciding what Hive version to use when building the UDFs. Use the following steps to mitigate these issues, or update to Hive 2.3.4 to avoid all issues at once.

CVE-2018-11777

Description

In Apache Hive 2.3.3, 3.1.0 and earlier, local resources on HiveServer2 machines are not properly protected against malicious user if ranger, sentry or sql standard authorizer is not in use.

Resolution

Update pom.xml to use Hive 2.3.4.

CVE-2018-1284

Description

In Apache Hive 0.6.0 to 2.3.2, malicious user might use any xpath UDFs (xpath/xpath_string/xpath_boolean/xpath_number/xpath_double/xpath_float/xpath_long/xpath_int/xpath_short) to expose the content of a file on the machine running HiveServer2 owned by HiveServer2 user (usually hive) if hive.server2.enable.doAs=false.

Resolution

Update pom.xml to use Hive 2.3.3 or do not set hive.server2.enable.doAs to false.

CVE-2015-7521

Description

The authorization framework in Apache Hive 1.0.0, 1.0.1, 1.1.0, 1.1.1, 1.2.0 and 1.2.1, on clusters protected by Ranger and SqlStdHiveAuthorization, allows attackers to bypass intended parent table access restrictions via unspecified partition-level operations.

Resolution

Update pom.xml to use Hive 1.2.2.

Examples

Assume a table user_data:

action timestamp user_id gender
signup_page 100 1 f
confirm_button 200 1 f
submit_button 300 1 f
signup_page 200 2 m
submit_button 400 2 m
signup_page 100 3 f
confirm_button 200 3 f
decline 200 3 f
... ... ... ...

Simple funnel

SELECT funnel_merge(funnel)
FROM (SELECT funnel(action, timestamp, array('signup_page', 'email_signup'),
                                       array('confirm_button'),
                                       array('submit_button')) AS funnel
      FROM user_data
      GROUP BY user_id) t1;

Result: [3, 2, 1]

Simple funnel with conversion rate

SELECT funnel_conversion(funnel_merge(funnel))
FROM (SELECT funnel(action, timestamp, array('signup_page'),
                                       array('confirm_button'),
                                       array('submit_button')) AS funnel
      FROM user_data
      GROUP BY user_id) t1;

Result: [1.0, 0.66, 0.5]

Funnel with multiple groups

SELECT gender, funnel_merge(funnel)
FROM (SELECT gender,
             funnel(action, timestamp, array('signup_page'),
                                       array('confirm_button'),
                                       array('submit_button')) AS funnel
      FROM table
      GROUP BY user_id, gender) t1
GROUP BY gender;

Result: m: [1, 0, 0], f: [2, 2, 1]

Multiple parallel funnels

SELECT funnel_merge(funnel1), funnel_merge(funnel2)
FROM (SELECT funnel(action, timestamp, array('signup_page'),
                                       array('confirm_button'),
                                       array('submit_button')) AS funnel1
             funnel(action, timestamp, array('signup_page'),
                                       array('decline')) AS funnel2
      FROM table
      GROUP BY user_id) t1;

Result: [3, 2, 1] [3, 1]

Contributors

Josh Walters, [email protected]

License

Apache License, Version 2.0

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].