
Teradata / presto

License: Apache-2.0
Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data

Projects that are alternatives of or similar to presto

LogAnalyzeHelper
A cleaning program for a forum log analysis system (includes an IP rules library, UDF development, MapReduce programs, and log data)
Stars: ✭ 33 (-63.74%)
Mutual labels:  hadoop
skein
A tool and library for easily deploying applications on Apache YARN
Stars: ✭ 128 (+40.66%)
Mutual labels:  hadoop
sparkucx
A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-64.84%)
Mutual labels:  hadoop
disk
A distributed network-disk (cloud storage) system built on Hadoop, HBase, and Spring Boot
Stars: ✭ 53 (-41.76%)
Mutual labels:  hadoop
disq
A library for manipulating bioinformatics sequencing formats in Apache Spark
Stars: ✭ 29 (-68.13%)
Mutual labels:  hadoop
oci-cloudera
Terraform module to deploy Cloudera on Oracle Cloud Infrastructure (OCI)
Stars: ✭ 20 (-78.02%)
Mutual labels:  hadoop
qs-hadoop
Learning materials for the big data ecosystem
Stars: ✭ 18 (-80.22%)
Mutual labels:  hadoop
liquibase-impala
Liquibase extension to add Impala Database support
Stars: ✭ 23 (-74.73%)
Mutual labels:  hadoop
xxhadoop
Data analysis using Hadoop, Spark, Storm, Elasticsearch, machine learning, etc. These are my daily notes, code, and demos. Don't fork, just star!
Stars: ✭ 37 (-59.34%)
Mutual labels:  hadoop
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-38.46%)
Mutual labels:  hadoop
pyspark-ML-in-Colab
Pyspark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-64.84%)
Mutual labels:  hadoop
corc
An ORC File Scheme for the Cascading data processing platform.
Stars: ✭ 14 (-84.62%)
Mutual labels:  hadoop
learning-spark
Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-69.23%)
Mutual labels:  hadoop
big-data-exploration
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
Stars: ✭ 43 (-52.75%)
Mutual labels:  hadoop
memex-gate
General Architecture for Text Engineering
Stars: ✭ 47 (-48.35%)
Mutual labels:  hadoop
datalake-etl-pipeline
Simplified ETL process on Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-57.14%)
Mutual labels:  hadoop
jmx_exporter-cloudera-hadoop
Prometheus jmx_exporter configurations for Cloudera Hadoop
Stars: ✭ 33 (-63.74%)
Mutual labels:  hadoop
hadoop-ecosystem
Visualizations of the Hadoop Ecosystem
Stars: ✭ 20 (-78.02%)
Mutual labels:  hadoop
hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-81.32%)
Mutual labels:  hadoop
rastercube
rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-83.52%)
Mutual labels:  hadoop

Presto

Presto is a distributed SQL query engine for big data.

See the User Manual for deployment instructions and end user documentation.

Requirements

  • Mac OS X or Linux
  • Java 8 Update 92 or higher (8u92+), 64-bit
  • Maven 3.3.9+ (for building)
  • Python 2.4+ (for running with the launcher script)

Building Presto

Presto is a standard Maven project. Simply run the following command from the project root directory:

./mvnw clean install

On the first build, Maven will download all the dependencies from the internet and cache them in the local repository (~/.m2/repository), which can take a considerable amount of time. Subsequent builds will be faster.
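
Once the dependency cache is warm, you can optionally run Maven in offline mode so that everything resolves from the local repository (-o is a standard Maven flag, not specific to Presto):

./mvnw -o clean install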

Presto has a comprehensive set of unit tests that can take several minutes to run. You can disable the tests when building:

./mvnw clean install -DskipTests

Running Presto in your IDE

Overview

After building Presto for the first time, you can load the project into your IDE and run the server. We recommend using IntelliJ IDEA. Because Presto is a standard Maven project, you can import it into your IDE using the root pom.xml file. In IntelliJ, choose Open Project from the Quick Start box or choose Open from the File menu and select the root pom.xml file.

After opening the project in IntelliJ, double check that the Java SDK is properly configured for the project:

  • Open the File menu and select Project Structure
  • In the SDKs section, ensure that a 1.8 JDK is selected (create one if none exist)
  • In the Project section, ensure the Project language level is set to 8.0, as Presto makes use of several Java 8 language features

Presto comes with a sample configuration that should work out of the box for development. Use the following options to create a run configuration:

  • Main Class: com.facebook.presto.server.PrestoServer
  • VM Options: -ea -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -Xmx2G -Dconfig=etc/config.properties -Dlog.levels-file=etc/log.properties
  • Working directory: $MODULE_DIR$
  • Use classpath of module: presto-main

The working directory should be the presto-main subdirectory. In IntelliJ, using $MODULE_DIR$ accomplishes this automatically.
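
For reference, the development configuration lives under presto-main/etc. The config.properties there looks roughly like the following single-node setup (illustrative; consult the checked-in file for the authoritative values):

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080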

Additionally, the Hive plugin must be configured with the location of your Hive metastore Thrift service. Add the following to the list of VM options, replacing localhost:9083 with the correct host and port (or keep the value below as a placeholder if you do not have a Hive metastore):

-Dhive.metastore.uri=thrift://localhost:9083

Using SOCKS for Hive or HDFS

If your Hive metastore or HDFS cluster is not directly accessible from your local machine, you can use SSH port forwarding to access it. Set up a dynamic SOCKS proxy with SSH listening on local port 1080:

ssh -v -N -D 1080 server

Then add the following to the list of VM options:

-Dhive.metastore.thrift.client.socks-proxy=localhost:1080

Running the CLI

Start the CLI to connect to the server and run SQL queries:

presto-cli/target/presto-cli-*-executable.jar
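
The CLI also accepts flags for the server, catalog, and schema. For example, assuming the development server is running locally on the default port 8080:

presto-cli/target/presto-cli-*-executable.jar --server localhost:8080 --catalog hive --schema default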

Run a query to see the nodes in the cluster:

SELECT * FROM system.runtime.nodes;

In the sample configuration, the Hive connector is mounted in the hive catalog, so you can run the following query to show the tables in the Hive database default:

SHOW TABLES FROM hive.default;
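
You can likewise list the schemas available in the hive catalog:

SHOW SCHEMAS FROM hive;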

Developers

We recommend you use IntelliJ as your IDE. The code style template for the project can be found in the codestyle repository, along with our general programming and Java guidelines. In addition to those, you should adhere to the following:

  • Alphabetize sections in the documentation source files (both in table of contents files and other regular documentation files). In general, alphabetize methods/variables/sections if such ordering already exists in the surrounding code.
  • When appropriate, use the Java 8 stream API. However, note that the stream implementation does not perform well, so avoid using it in inner loops or other performance-sensitive sections.
  • Categorize errors when throwing exceptions. For example, PrestoException takes an error code as an argument: PrestoException(HIVE_TOO_MANY_OPEN_PARTITIONS). This categorization lets you generate reports so you can monitor the frequency of various failures (see the sketch after this list).
  • Ensure that all files have the appropriate license header; you can generate the license by running mvn license:format.
  • Consider using String formatting (printf-style formatting using the Java Formatter class): format("Session property %s is invalid: %s", name, value) (note that format() should always be statically imported). Sometimes, if you only need to append something, consider using the + operator.
  • Avoid using the ternary operator except for trivial expressions.
  • Use an assertion from Airlift's Assertions class if there is one that covers your case rather than writing the assertion by hand. Over time we may move over to more fluent assertions like AssertJ.
  • When writing a Git commit message, follow these guidelines.
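
To illustrate the error-categorization and formatting guidelines above, here is a minimal sketch (assuming presto-spi and the Hive connector are on the classpath; the partition-limit check and its variable names are hypothetical):

import com.facebook.presto.spi.PrestoException;

import static com.facebook.presto.hive.HiveErrorCode.HIVE_TOO_MANY_OPEN_PARTITIONS;
import static java.lang.String.format;

public final class PartitionLimitExample
{
    // Hypothetical helper: rejects writes once too many partitions are open
    static void checkOpenPartitions(int openPartitions, int maxOpenPartitions)
    {
        if (openPartitions > maxOpenPartitions) {
            // The error code categorizes this failure so its frequency can be
            // monitored in generated reports; format() is statically imported
            throw new PrestoException(HIVE_TOO_MANY_OPEN_PARTITIONS,
                    format("Too many open partitions: %s (limit %s)", openPartitions, maxOpenPartitions));
        }
    }
}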