
arsenvlad / docker-presto-adls-wasb

Licence: MIT License
Example of a single node Presto with Azure Data Lake Store (ADLS) and Azure Storage Blob (WASB) access via Hive metastore

Programming languages: Dockerfile, shell


Example of a single node Presto with Azure Data Lake Store (ADLS) and Azure Blob Storage (WASB)

Start local Hive metastore and Presto containers

Clone this repo

git clone https://github.com/arsenvlad/docker-presto-adls-wasb

Run Hive and Presto containers using config specified in env.conf.private

docker-compose up --build

In a separate terminal window, list currently running containers

docker ps

Connect to Hive bash

In a separate terminal window, open interactive tty bash on the Hive container

docker exec -it dockerprestoadlswasb_hive_1 bash
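The container name used above is generated by Docker Compose from the project directory and the service name, so it may differ on your machine (for example, newer Compose releases join the parts with "-" rather than "_"). A minimal sketch for looking the name up instead of hard-coding it; the helper name pick_container is hypothetical and not part of this repo:

```shell
# Hypothetical helper: read container names on stdin and print the first one
# that ends in <separator><service><separator><number>, where the separator
# is "_" or "-".
pick_container() {
  service=$1
  grep -E "[_-]${service}[_-][0-9]+\$" | head -n 1
}
```

With the containers running, it could be used as: docker exec -it "$(docker ps --format '{{.Names}}' | pick_container hive)" bash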

In the Hive container bash session, open Hive CLI pointing to itself as an external metastore. If you get an error saying "Name node is in safe mode", wait for a few minutes and try again.

hive --hiveconf hive.metastore.uris=thrift://localhost:9083
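The safe-mode advice above amounts to retrying until the NameNode is ready. A minimal sketch of that wait as a generic retry helper; the function name retry, the attempt count, and the delay are illustrative assumptions, not part of this repo:

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 1 if COMMAND still fails on attempt number MAX_ATTEMPTS.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

Inside the Hive container, one way to use it is to probe the metastore non-interactively before opening the CLI, for example: retry 10 30 hive --hiveconf hive.metastore.uris=thrift://localhost:9083 -e 'show tables;'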

Create table using Azure Storage Blobs (change the storage account name and container name to yours)

create table wasbtable1 (id int, name varchar(255)) row format delimited fields terminated by ',' stored as textfile location 'wasb://<container>@<storage-account>.blob.core.windows.net/wasbtable1';

Create table using Azure Data Lake Store (change the ADLS account name to yours)

create table adltable1 (id int, name varchar(255)) row format delimited fields terminated by ',' stored as textfile location 'adl://avdatalake1.azuredatalakestore.net/adltable1';

Confirm you can see the tables

show tables;

Connect to Presto bash

In a separate terminal window, open interactive tty bash on the Presto container

docker exec -it dockerprestoadlswasb_presto_1 bash

Presto is configured as a single node with the Hive connector, as described in /etc/motd

Use Presto CLI to connect to the running Presto server

/opt/presto/presto --server http://localhost:8080

List schemas in the Hive catalog

show schemas from hive;

List tables in the default schema of the Hive catalog

show tables from hive.default;

Insert data into the tables

insert into hive.default.wasbtable1 (id, name) values (1,'1');
insert into hive.default.wasbtable1 (id, name) select id, name from hive.default.wasbtable1 union all select id, name from hive.default.wasbtable1 union all select id, name from hive.default.wasbtable1;

insert into hive.default.adltable1 (id, name) values (1,'1');
insert into hive.default.adltable1 (id, name) select id, name from hive.default.adltable1 union all select id, name from hive.default.adltable1 union all select id, name from hive.default.adltable1;
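Each self-referencing insert above selects the table's current rows three times and appends them, so every run leaves the table with four times as many rows. A small sketch of that arithmetic, useful for checking a count(*) result; the helper name expected_rows is hypothetical:

```shell
# Expected row count after the initial single-row insert followed by RUNS
# executions of the self-referencing insert. Each run adds 3x the current
# rows, so the total is multiplied by 4 every time.
expected_rows() {
  runs=$1
  rows=1
  i=0
  while [ "$i" -lt "$runs" ]; do
    rows=$((rows * 4))
    i=$((i + 1))
  done
  echo "$rows"
}
```

After running the statements above once, expected_rows 1 gives the count to compare against the result of select count(*) from hive.default.wasbtable1; in the Presto CLI.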

Select from the table

select * from hive.default.adltable1;

When using with HDInsight

NOTE: To access the Azure HDInsight Hive Thrift Service, your Docker host VM must be in the same network as the cluster.

To find the URIs of the HDInsight Hive Thrift Service (i.e. the value of hive.metastore.uris), SSH into the HDInsight cluster and run this grep command:

echo $(grep -n1 "hive.metastore.uri" /etc/hive/conf/hive-site.xml | grep -o "<value>.*/value>" | sed 's:<value>::g' | sed 's:</value>::g')
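The same lookup can be wrapped in a small reusable function. This is a sketch, not the repo's method: the name get_hive_property is hypothetical, and like the command above it assumes the <value> element sits on the line directly after the matching <name> element:

```shell
# Hypothetical helper: print the <value> of a property from a Hadoop-style
# XML config file. Assumes <value> is on the line right after <name>.
get_hive_property() {
  name=$1
  file=$2
  # -F matches the name literally; -A1 also prints the following line.
  grep -F -A1 "<name>${name}</name>" "$file" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

On the HDInsight cluster this could be called as: get_hive_property hive.metastore.uris /etc/hive/conf/hive-site.xml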

Presto with Azure Data Services

See azure-data-services.md for an example showing how to configure Presto connectors to Azure Data Services to query and join data from Azure CosmosDB (using MongoDB API), Azure SQL Database, Azure MySQL, Azure PostgreSQL and store the joined results in Azure Blob Storage.
