Automattic / Php Thrift Sql

Licence: gpl-2.0
A PHP library for connecting to Hive or Impala over Thrift

Projects that are alternatives of or similar to Php Thrift Sql

Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+4181.31%)
Mutual labels:  sql, hive, database
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+239.25%)
Mutual labels:  thrift, sql, hive
Hive
Apache Hive
Stars: ✭ 4,031 (+3667.29%)
Mutual labels:  sql, hive, database
Electrocrud
Database CRUD Application Built on Electron | MySQL, Postgres, SQLite
Stars: ✭ 1,267 (+1084.11%)
Mutual labels:  sql, database
Snowflake Jdbc
Snowflake JDBC Driver
Stars: ✭ 83 (-22.43%)
Mutual labels:  sql, database
Evolutility Server Node
Model-driven REST or GraphQL backend for CRUD and more, written in Javascript, using Node.js, Express, and PostgreSQL.
Stars: ✭ 84 (-21.5%)
Mutual labels:  sql, database
Nodejs Spanner
Node.js client for Google Cloud Spanner: the world’s first fully managed relational database service to offer both strong consistency and horizontal scalability.
Stars: ✭ 80 (-25.23%)
Mutual labels:  sql, database
Nymph
Data objects for JavaScript and PHP.
Stars: ✭ 97 (-9.35%)
Mutual labels:  sql, database
Qtl
A friendly and lightweight C++ database library for MySQL, PostgreSQL, SQLite and ODBC.
Stars: ✭ 92 (-14.02%)
Mutual labels:  sql, database
Griddb
GridDB is a next-generation open source database that makes time series IoT and big data fast and easy.
Stars: ✭ 1,587 (+1383.18%)
Mutual labels:  sql, database
Defql
Create elixir functions with SQL as a body.
Stars: ✭ 100 (-6.54%)
Mutual labels:  sql, database
Cs Books
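More than 1,000 classic computer science books, personal notes, and the resources referenced in the author's articles published on various platforms. Book topics include C/C++, Java, Python, Go, data structures and algorithms, operating systems, backend architecture, computer systems fundamentals, databases, computer networks, design patterns, front-end development, assembly, and interview write-ups from campus and experienced-hire recruiting.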
Stars: ✭ 1,215 (+1035.51%)
Mutual labels:  sql, database
Clickhouse Go
Golang driver for ClickHouse
Stars: ✭ 1,234 (+1053.27%)
Mutual labels:  sql, database
Graphjin
GraphJin - Build APIs in 5 minutes with GraphQL. An instant GraphQL to SQL compiler.
Stars: ✭ 1,264 (+1081.31%)
Mutual labels:  sql, database
Deveeldb
DeveelDB is a complete SQL database system, primarily developed for .NET/Mono frameworks
Stars: ✭ 80 (-25.23%)
Mutual labels:  sql, database
Toydb
Distributed SQL database in Rust, written as a learning project
Stars: ✭ 1,329 (+1142.06%)
Mutual labels:  sql, database
Node Mysql Utilities
Query builder for node-mysql with introspection, etc.
Stars: ✭ 98 (-8.41%)
Mutual labels:  sql, database
Maha
A framework for rapid reporting API development; with out of the box support for high cardinality dimension lookups with druid.
Stars: ✭ 101 (-5.61%)
Mutual labels:  sql, hive
Monetdblite
MonetDB reconfigured as a library
Stars: ✭ 107 (+0%)
Mutual labels:  sql, database
Laravel Log To Db
Custom Laravel and Lumen 5.6+ Log channel handler that can store log events to SQL or MongoDB databases. Uses Laravel/Monolog native logging functionality.
Stars: ✭ 76 (-28.97%)
Mutual labels:  sql, database

PHP ThriftSQL

The ThriftSQL.phar archive aims to provide access to SQL-on-Hadoop frameworks for PHP. It bundles Thrift and various service packages together and exposes a common interface for running queries over the various frameworks.

Currently the following engines are supported:

  • Hive -- Over the HiveServer2 Thrift interface. SASL is enabled by default, so a username and password must be provided; this can be turned off with the setSasl() method before calling connect().
  • Impala -- Over the Impala Service Thrift interface which extends the Beeswax protocol.

Version Compatibility

This library is currently compiled against the Thrift definitions of the following database versions:

Using the compiler and base PHP classes of:

Usage Example

The recommended way to use this library is to fetch results from Hive/Impala via the memory-efficient iterator, which keeps the connection open and scrolls through the results a few rows at a time. This lets large result sets be processed one record at a time, minimizing PHP's memory consumption.

// Load this lib
require_once __DIR__ . '/ThriftSQL.phar';

// Try out a Hive query via iterator object
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$hiveTables = $hive
  ->connect()
  ->getIterator( 'SHOW TABLES' );

// Try out an Impala query via iterator object
$impala = new \ThriftSQL\Impala( 'impala.host.local' );
$impalaTables = $impala
  ->connect()
  ->setOption( 'MEM_LIMIT', '2gb' ) // optionally set some query options
  ->getIterator( 'SHOW TABLES' );

// Execute the Hive query and iterate over the result set
foreach( $hiveTables as $rowNum => $row ) {
  print_r( $row );
}

// Execute the Impala query and iterate over the result set
foreach( $impalaTables as $rowNum => $row ) {
  print_r( $row );
}

// Don't forget to close socket connection once you're done with it
$hive->disconnect();
$impala->disconnect();
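
To illustrate why scrolling a few rows at a time keeps memory use low, here is a minimal, self-contained sketch of the general pattern using a PHP generator. It is not ThriftSQL's actual implementation: fetchBatch() and scrollRows() are hypothetical names, and an in-memory array stands in for the server-side cursor.

```php
<?php
// Hypothetical stand-in for a server-side cursor: returns up to $size rows
// starting at $offset, or an empty array once the data is exhausted.
function fetchBatch( array $table, int $offset, int $size ): array {
  return array_slice( $table, $offset, $size );
}

// Yields rows one at a time while holding at most one batch in memory.
function scrollRows( array $table, int $batchSize ): Generator {
  $offset = 0;
  while ( true ) {
    $batch = fetchBatch( $table, $offset, $batchSize );
    if ( empty( $batch ) ) {
      return; // nothing left to fetch
    }
    foreach ( $batch as $row ) {
      yield $row;
    }
    $offset += count( $batch );
  }
}

// Made-up data; a real client would fetch each batch over the network
$fakeTable = [ [ 'a' ], [ 'b' ], [ 'c' ], [ 'd' ], [ 'e' ] ];
foreach ( scrollRows( $fakeTable, 2 ) as $rowNum => $row ) {
  echo "$rowNum: {$row[0]}\n";
}
```

Only one batch of rows is held by the generator at any moment, which is the property that makes this style of iteration suitable for large result sets.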

The downside of the memory-efficient iterator is that the result set can only be iterated over once. If a second foreach is run on the same iterator object, an exception is thrown by default. This prevents the same query from executing on Hive/Impala again, since results are not cached within the PHP client. This safeguard can be turned off, but be aware that iterating over the same iterator object again may produce different results because the query is rerun.

Consider the following example:

// Connect to hive and get a rerun-able iterator
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$results = $hive
  ->connect()
  ->getIterator( 'SELECT UNIX_TIMESTAMP()' )
  ->allowRerun( true );

// Execute the Hive query and get results
foreach( $results as $rowNum => $row ) {
  echo "Hive server time is: {$row[0]}\n";
}

sleep(3);

// Execute the Hive query a second time
foreach( $results as $rowNum => $row ) {
  echo "Hive server time is: {$row[0]}\n";
}

Which will output something like:

Hive server time is: 1517875200
Hive server time is: 1517875203
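
The single-pass guard described above can be sketched in plain PHP as follows. This is only an illustration of the idea, not the library's code: the OneShotIterator class, the callable it wraps, and the use of LogicException are all assumptions made for the example.

```php
<?php
// Hypothetical class for illustration; not ThriftSQL's actual implementation.
class OneShotIterator implements IteratorAggregate {
  private $runQuery;           // callable that "executes" the query and returns rows
  private $allowRerun = false; // reruns are refused unless explicitly enabled
  private $hasRun = false;

  public function __construct( callable $runQuery ) {
    $this->runQuery = $runQuery;
  }

  public function allowRerun( bool $allow ): self {
    $this->allowRerun = $allow;
    return $this;
  }

  // PHP calls this at the start of every foreach over the object.
  public function getIterator(): Iterator {
    if ( $this->hasRun && ! $this->allowRerun ) {
      throw new LogicException( 'Query already ran; call allowRerun( true ) to run it again.' );
    }
    $this->hasRun = true;
    return new ArrayIterator( ( $this->runQuery )() );
  }
}

$calls = 0;
$results = new OneShotIterator( function () use ( &$calls ) {
  $calls++;
  return [ [ "run #$calls" ] ];
} );

foreach ( $results as $row ) {
  echo $row[0], "\n";
}

try {
  foreach ( $results as $row ) {
    echo $row[0], "\n";
  }
} catch ( LogicException $e ) {
  echo "Second pass blocked\n";
}

$results->allowRerun( true );
foreach ( $results as $row ) {
  echo $row[0], "\n"; // the callable runs again, so the row can differ
}
```

Because nothing is cached, every permitted pass calls the query callable again, which is why a rerun can return different rows.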

If the result set is small and it is easier to load all of it into PHP memory, the queryAndFetchAll() method can be used instead. It returns the full result set as a plain, numerically indexed multidimensional array.

// Try out a small Hive query
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$hiveTables = $hive
  ->connect()
  ->queryAndFetchAll( 'SHOW TABLES' );
$hive->disconnect();

// Print out the cached results
print_r( $hiveTables );

// Try out a small Impala query
$impala = new \ThriftSQL\Impala( 'impala.host.local' );
$impalaTables = $impala
  ->connect()
  ->queryAndFetchAll( 'SHOW TABLES' );
$impala->disconnect();

// Print out the cached results
print_r( $impalaTables );
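
As a small illustration of working with a numerically indexed result array of that shape, the sketch below maps each row onto caller-supplied column names. The sample rows and the column names are made up for the example, and nothing here calls the library itself.

```php
<?php
// Sample data in the shape described above: a list of rows, each a
// numerically indexed list of column values. The values are invented.
$rows = [
  [ 'events', 1200 ],
  [ 'users', 45 ],
];

// Column names are an assumption; the caller supplies them here.
$columns = [ 'table_name', 'row_count' ];

// Build associative rows so fields can be read by name
$named = array_map(
  function ( array $row ) use ( $columns ) {
    return array_combine( $columns, $row );
  },
  $rows
);

echo $rows[1][0], "\n";            // numeric access: second row, first column
echo $named[1]['row_count'], "\n"; // named access to the same row
```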

Developing & Contributing

In order to rebuild this library you will need Composer to install dev dependencies and Apache Thrift to compile client libraries from the Thrift interface definition files.

Once dev tools are installed, make sure you get all git submodules:

$ git submodule init
$ git submodule update

And then the phar can be rebuilt using make:

$ make clean && make phar

NOTE: If you get a BadMethodCallException, it may be caused by any of the reasons listed in the PHP documentation, or by a low soft limit on open file descriptors, since Phar::compressFiles() keeps all files open until it writes the compressed phar.
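
When tracking down such an exception, a quick diagnostic along these lines may help. It is a rough sketch rather than part of the build: pharBuildDiagnostics() is a hypothetical helper, the settings it reports are the ones commonly relevant to writing and compressing phar archives, and the open-files limits are only available where PHP's POSIX extension is loaded.

```php
<?php
// Collects settings that commonly affect phar builds. Which of them, if any,
// is the culprit on a given machine still has to be judged by the reader.
function pharBuildDiagnostics(): array {
  $report = [
    // Phar archives cannot be written while phar.readonly is enabled
    'phar.readonly' => ini_get( 'phar.readonly' ),
    'zlib loaded'   => extension_loaded( 'zlib' ),
    'bz2 loaded'    => extension_loaded( 'bz2' ),
  ];

  // posix_getrlimit() needs the POSIX extension (not available on Windows)
  if ( function_exists( 'posix_getrlimit' ) ) {
    $limits = posix_getrlimit();
    $report['open files (soft)'] = $limits['soft openfiles'] ?? 'unknown';
    $report['open files (hard)'] = $limits['hard openfiles'] ?? 'unknown';
  }

  return $report;
}

print_r( pharBuildDiagnostics() );
```

If the soft limit on open files turns out to be lower than the number of files going into the phar, it can usually be raised for the current shell with ulimit -n before running make again.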
