All Projects → yaooqinn → Spark Authorizer

yaooqinn / Spark Authorizer

Licence: apache-2.0
A Spark SQL extension which provides SQL Standard Authorization for Apache Spark

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to Spark Authorizer

Quicksql
A Flexible, Fast, Federated(3F) SQL Analysis Middleware for Multiple Data Sources
Stars: ✭ 1,821 (+1191.49%)
Mutual labels:  spark, hive
Apache Spark Hands On
Educational notes,Hands on problems w/ solutions for hadoop ecosystem
Stars: ✭ 74 (-47.52%)
Mutual labels:  spark, hive
Bigdataguide
大数据学习,从零开始学习大数据,包含大数据学习各阶段学习视频、面试资料
Stars: ✭ 817 (+479.43%)
Mutual labels:  spark, hive
God Of Bigdata
专注大数据学习面试,大数据成神之路开启。Flink/Spark/Hadoop/Hbase/Hive...
Stars: ✭ 6,008 (+4160.99%)
Mutual labels:  spark, hive
Repository
个人学习知识库涉及到数据仓库建模、实时计算、大数据、Java、算法等。
Stars: ✭ 92 (-34.75%)
Mutual labels:  spark, hive
Bdp Dataplatform
大数据生态解决方案数据平台:基于大数据、数据平台、微服务、机器学习、商城、自动化运维、DevOps、容器部署平台、数据平台采集、数据平台存储、数据平台计算、数据平台开发、数据平台应用搭建的大数据解决方案。
Stars: ✭ 456 (+223.4%)
Mutual labels:  spark, hive
Luigi Warehouse
A luigi powered analytics / warehouse stack
Stars: ✭ 72 (-48.94%)
Mutual labels:  spark, hive
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+157.45%)
Mutual labels:  spark, hive
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-40.43%)
Mutual labels:  spark, hive
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-41.84%)
Mutual labels:  spark, hive
Yanagishima
Web UI for Trino, Presto, Hive, Elasticsearch, SparkSQL
Stars: ✭ 424 (+200.71%)
Mutual labels:  spark, hive
Cube.js
📊 Cube — Open-Source Analytics API for Building Data Apps
Stars: ✭ 11,983 (+8398.58%)
Mutual labels:  spark, hive
Moonbox
Moonbox is a DVtaaS (Data Virtualization as a Service) Platform
Stars: ✭ 424 (+200.71%)
Mutual labels:  spark, hive
Scriptis
Scriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+393.62%)
Mutual labels:  spark, hive
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+163.83%)
Mutual labels:  spark, hive
Szt Bigdata
深圳地铁大数据客流分析系统🚇🚄🌟
Stars: ✭ 826 (+485.82%)
Mutual labels:  spark, hive
BigData-News
基于Spark2.2新闻网大数据实时系统项目
Stars: ✭ 36 (-74.47%)
Mutual labels:  spark, hive
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+1643.97%)
Mutual labels:  spark, hive
Dataspherestudio
DataSphereStudio is a one stop data application development& management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+747.52%)
Mutual labels:  spark, hive
Bigdata Notes
大数据入门指南 ⭐
Stars: ✭ 10,991 (+7695.04%)
Mutual labels:  spark, hive

Spark Authorizer Build Status HitCount

Spark Authorizer provides you with SQL Standard Based Authorization for Apache Spark™ as same as SQL Standard Based Hive Authorization. While you are using Spark SQL or Dataset/DataFrame API to load data from tables embedded with Apache Hive™ metastore, this library provides row/column level fine-grained access controls by Apache Ranger™ or Hive SQL Standard Based Authorization.

Security is one of fundamental features for enterprise adoption. Apache Ranger™ offers many security plugins for many Hadoop ecosystem components, such as HDFS, Hive, HBase, Solr and Sqoop2. However, Apache Spark™ is not counted in yet. When a secured HDFS cluster is used as a data warehouse accessed by various users and groups via different applications wrote by Spark and Hive, it is very difficult to guarantee data management in a consistent way. Apache Spark users visit data warehouse only with Storage based access controls offered by HDFS. This library shares Ranger Hive plugin with Hive to help Spark talking to Ranger Admin.

Please refer to ACL Management for Spark SQL to see what spark-authorizer supports.

Quick Start

Step 1. Install Spark Authorizer

Include this package in your Spark Applications using:

spark-shell, pyspark, or spark-submit

> $SPARK_HOME/bin/spark-shell --packages yaooqinn:spark-authorizer:2.1.1

sbt

If you use the sbt-spark-package plugin, in your sbt build file, add:

spDependencies += "yaooqinn/spark-authorizer:2.1.1"

Otherwise,

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "yaooqinn" % "spark-authorizer" % "2.1.1"

Maven

In your pom.xml, add:

<dependencies>
  <!-- list of dependencies -->
  <dependency>
    <groupId>yaooqinn</groupId>
    <artifactId>spark-authorizer</artifactId>
    <version>2.1.1</version>
  </dependency>
</dependencies>
<repositories>
  <!-- list of other repositories -->
  <repository>
    <id>SparkPackagesRepo</id>
    <url>http://dl.bintray.com/spark-packages/maven</url>
  </repository>
</repositories>

Manully

If you Building Spark Authorizer manully, you can deploy via:

cp target/spark-authorizer-<version>.jar $SPARK_HOME/jars

Step 2. Install & Configure Ranger Hive Plugin

Please refer to Install Ranger Hive Plugin For Apache Spark to learn how to deploy the plugin jars to Apache Spark and set Ranger/Hive configurations.

Step 3. Enable Spark Authorizer

In $SPARK_HOME/conf/spark-defaults.conf, add:

spark.sql.extensions=org.apache.ranger.authorization.spark.authorizer.RangerSparkSQLExtension

NOTE spark.sql.extensions is only supported by Spark 2.2.x and later, for Spark 2.1.x please use Version: 1.1.3.spark2.1 and check the previous doc.

Interactive Spark Shell

The easiest way to start using Spark is through the Scala shell:

bin/spark-shell --master yarn --proxy-user hzyaoqin

Suffer for the Authorization Pain

We create a ranger policy as below: ranger-policy-details

Check Privilege with some simple cases.

Show databases

scala> spark.sql("show databases").show
+--------------+
|  databaseName|
+--------------+
|       default|
| spark_test_db|
| tpcds_10g_ext|
+--------------+

Switch database

scala> spark.sql("use spark_test_db").show
17/12/08 17:06:17 ERROR optimizer.Authorizer:
+===============================+
|Spark SQL Authorization Failure|
|-------------------------------|
|Permission denied: user [hzyaoqin] does not have [USE] privilege on [spark_test_db]
|-------------------------------|
|Spark SQL Authorization Failure|
+===============================+

Oops...

scala> spark.sql("use tpcds_10g_ext").show
++
||
++
++

LOL...

Select

scala> spark.sql("select cp_type from catalog_page limit 1").show
17/12/08 17:09:58 ERROR optimizer.Authorizer:
+===============================+
|Spark SQL Authorization Failure|
|-------------------------------|
|Permission denied: user [hzyaoqin] does not have [SELECT] privilege on [tpcds_10g_ext/catalog_page/cp_type]
|-------------------------------|
|Spark SQL Authorization Failure|
+===============================+

Oops...

scala> spark.sql("select * from call_center limit 1").show
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+
|cc_call_center_sk|cc_call_center_id|cc_rec_start_date|cc_rec_end_date|cc_closed_date_sk|cc_open_date_sk| cc_name|cc_class|cc_employees|cc_sq_ft|cc_hours| cc_manager|cc_mkt_id|        cc_mkt_class|         cc_mkt_desc|cc_market_manager|cc_division|cc_division_name|cc_company|cc_company_name|cc_street_number|cc_street_name|cc_street_type|cc_suite_number|cc_city|        cc_county|cc_state|cc_zip|   cc_country|cc_gmt_offset|cc_tax_percentage|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+
|                1| AAAAAAAABAAAAAAA|       1998-01-01|           null|             null|        2450952|NY Metro|   large|           2|    1138| 8AM-4PM|Bob Belcher|        6|More than other a...|Shared others cou...|      Julius Tran|          3|             pri|         6|          cally|             730|      Ash Hill|     Boulevard|        Suite 0| Midway|Williamson County|      TN| 31904|United States|        -5.00|             0.11|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+--------+--------+------------+--------+--------+-----------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-------+-----------------+--------+------+-------------+-------------+-----------------+

LOL...

Dataset/DataFrame

scala> spark.read.table("catalog_page").limit(1).collect
17/12/11 14:46:33 ERROR optimizer.Authorizer:
+===============================+
|Spark SQL Authorization Failure|
|-------------------------------|
|Permission denied: user [hzyaoqin] does not have [SELECT] privilege on [tpcds_10g_ext/catalog_page/cp_catalog_page_sk,cp_catalog_page_id,cp_promo_id,cp_start_date_sk,cp_end_date_sk,cp_department,cp_catalog_number,cp_catalog_page_number,cp_description,cp_type]
|-------------------------------|
|Spark SQL Authorization Failure|
+===============================+

Oops...

scala> spark.read.table("call_center").limit(1).collect
res3: Array[org.apache.spark.sql.Row] = Array([1,AAAAAAAABAAAAAAA,1998-01-01,null,null,2450952,NY Metro,large,2,1138,8AM-4PM,Bob Belcher,6,More than other authori,Shared others could not count fully dollars. New members ca,Julius Tran,3,pri,6,cally,730,Ash Hill,Boulevard,Suite 0,Midway,Williamson County,TN,31904,United States,-5.00,0.11])

LOL...


Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].