
spoddutur / Cloud Based Sql Engine Using Spark

Cloud-based SQL engine using Apache Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.

Programming Languages

java

Projects that are alternatives to or similar to Cloud Based Sql Engine Using Spark

Flintrock
A command-line tool for launching Apache Spark clusters.
Stars: ✭ 568 (+1793.33%)
Mutual labels:  apache-spark
Mycat2
MySQL proxy using Java NIO, based on sharding SQL and Calcite; simple and fast
Stars: ✭ 750 (+2400%)
Mutual labels:  jdbc
Live log analyzer spark
Spark application for analyzing Apache access logs and detecting anomalies, along with a Medium article.
Stars: ✭ 14 (-53.33%)
Mutual labels:  apache-spark
Hibernate Springboot
Collection of best practices for Java persistence performance in Spring Boot applications
Stars: ✭ 589 (+1863.33%)
Mutual labels:  jdbc
Hasor
Hasor is a Java-based development framework. Unlike other frameworks, it has its own complete ecosystem while still integrating smoothly with existing technology stacks. It includes an IoC/AOP container framework, a web framework, a JDBC framework, the RSF distributed RPC framework, the DataQL engine, and more.
Stars: ✭ 713 (+2276.67%)
Mutual labels:  jdbc
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+2543.33%)
Mutual labels:  apache-spark
Openscoring
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (+1686.67%)
Mutual labels:  apache-spark
Datahacksummit 2017
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Stars: ✭ 30 (+0%)
Mutual labels:  apache-spark
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+2326.67%)
Mutual labels:  apache-spark
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+2996.67%)
Mutual labels:  apache-spark
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (+1943.33%)
Mutual labels:  apache-spark
Kafka Connect Jdbc
Kafka Connect connector for JDBC-compatible databases
Stars: ✭ 698 (+2226.67%)
Mutual labels:  jdbc
Myjdbc Rainbow
JPA-style lightweight ORM API for mapping objects to databases
Stars: ✭ 23 (-23.33%)
Mutual labels:  jdbc
Jailer
Database Subsetting and Relational Data Browsing Tool.
Stars: ✭ 576 (+1820%)
Mutual labels:  jdbc
Spark Streaming Monitoring With Lightning
Plot live stats as graphs from an Apache Spark application using Lightning-viz
Stars: ✭ 15 (-50%)
Mutual labels:  apache-spark
Streaming Readings
Papers and readings on streaming systems
Stars: ✭ 554 (+1746.67%)
Mutual labels:  apache-spark
Sparklyr
R interface for Apache Spark
Stars: ✭ 775 (+2483.33%)
Mutual labels:  apache-spark
Spark Flamegraph
Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (+0%)
Mutual labels:  apache-spark
Spark
Apache Spark - A unified analytics engine for large-scale data processing
Stars: ✭ 31,618 (+105293.33%)
Mutual labels:  jdbc
Pgjdbc
PostgreSQL JDBC Driver
Stars: ✭ 925 (+2983.33%)
Mutual labels:  jdbc

Spark as Cloud-Based SQL Engine

This project shows how to use Apache Spark as a cloud-based SQL engine and expose your big data as a JDBC/ODBC data source via the Spark Thrift Server.

1. Central Idea

Traditional relational database engines hit scalability limits with big data, and a couple of SQL-on-Hadoop frameworks evolved in response: Hive, Cloudera Impala, Presto, etc. These frameworks are essentially cloud-based solutions, and they all come with their own advantages and limitations. This project demos how Spark SQL comes across as one more SQL-on-Hadoop framework.

2. Architecture

The following picture illustrates how Apache Spark can be used as a SQL-on-Hadoop framework to serve your big data as a JDBC/ODBC data source via the Spark Thrift Server:

  • Data from multiple sources can be pushed into Spark and then exposed as SQL tables.
  • These tables are then made accessible as a JDBC/ODBC data source via the Spark Thrift Server.
  • Multiple clients like the Beeline CLI, JDBC/ODBC applications, or BI tools like Tableau connect to the Spark Thrift Server.
  • Once a connection is established, the Thrift Server contacts the Spark SQL engine to access Hive or Spark temp tables and run the SQL queries on the Spark framework.
  • The Spark Thrift Server works much like HiveServer2's Thrift interface, with one key difference: HiveServer2 submits SQL queries as Hive MapReduce jobs, whereas the Spark Thrift Server runs them on the Spark SQL engine, using the full capabilities of Spark. A minimal sketch of this server-side flow is given below.
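
To make the flow concrete, here is a minimal Scala sketch of the server side. It is not the repo's MainApp.scala itself: it assumes the sample data/input.json from the data folder and the records table registered in section 4.1, that the spark-hive-thriftserver module is on the classpath, and the object name SqlEngineSketch is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object SqlEngineSketch {

  def main(args: Array[String]): Unit = {
    // Start a SparkSession with Hive support so the Thrift Server
    // can see the tables we register.
    val spark = SparkSession.builder()
      .appName("cloud-based-sql-engine-using-spark")
      .enableHiveSupport()
      .getOrCreate()

    // Push data from any Spark-readable source and expose it as a SQL table.
    val df = spark.read.json("data/input.json")
    df.createOrReplaceTempView("records")

    // Start the Thrift Server inside this application so that JDBC/ODBC
    // clients (Beeline, Tableau, ...) can query the registered view.
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Keep the session (and hence the temp view) alive.
    while (true) Thread.sleep(10000)
  }
}
```

Starting the Thrift Server with startWithContext is what lets clients see temp views registered in the same SparkSession.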

To know more about this topic, please refer to my blog here, where I describe the concept in detail.

3. Structure of the project:

  • data: Contains the input JSON used in MainApp to register sample data with Spark SQL.
  • src/main/java/MainApp.scala: Spark 2.1 implementation that starts a SparkSession and registers data from input.json with Spark SQL. (To keep the Spark session alive, it runs a continuous while-loop.)
  • src/test/java/TestThriftClient.java: Java class that demos how to connect to the Thrift Server as a JDBC source and query the registered data.

4. How to run this project?

This project demos two things:

  • 4.1. How to register data with Spark SQL
  • 4.2. How to query the registered data via the Spark Thrift Server, using Beeline and JDBC

4.1 How to register data with Spark SQL

  • Download this project.
  • Build it: mvn clean install
  • Run MainApp: spark-submit --class MainApp cloud-based-sql-engine-using-spark.jar. That's it!
  • It will register some sample data in a records table with Spark SQL.

4.2 How to query registered data via the Spark Thrift Server using Beeline and JDBC?

For this, first connect to the Spark Thrift Server. Once the connection is established, you can, just like with HiveServer2, access Hive or Spark temp tables and run SQL queries on the Spark framework. I'll show two ways to do this:

  1. Beeline: Perhaps the simplest way is to use the beeline command-line tool provided in Spark's bin folder.

`$> beeline`
Beeline version 2.1.1-amzn-0 by Apache Hive

// Connect to the Spark thrift server..
`beeline> !connect jdbc:hive2://localhost:10000`
Connecting to jdbc:hive2://localhost:10000
Enter username for jdbc:hive2://localhost:10000:
Enter password for jdbc:hive2://localhost:10000:

// Run your SQL queries and access the data..
`jdbc:hive2://localhost:10000> show tables;`

  2. Java JDBC: Please refer to this project's test folder, where I've shared a Java example, TestThriftClient.java, that demos the same; a minimal sketch of such a client follows below.
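
For completeness, here is a minimal JDBC client sketch in Scala (the repo's own example is the Java class TestThriftClient.java). It assumes the Thrift Server from section 4.1 is listening on localhost:10000 with the records table registered, that the hive-jdbc driver is on the classpath, and that ThriftClientSketch is a made-up name for illustration.

```scala
import java.sql.DriverManager

object ThriftClientSketch {

  def main(args: Array[String]): Unit = {
    // The Hive JDBC driver handles the jdbc:hive2:// protocol
    // (the same endpoint Beeline connected to above).
    Class.forName("org.apache.hive.jdbc.HiveDriver")

    // Empty username/password, as in the Beeline session above.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
    try {
      val stmt = conn.createStatement()
      val rs   = stmt.executeQuery("SELECT * FROM records")
      val cols = rs.getMetaData.getColumnCount

      // Print each row as a comma-separated line.
      while (rs.next()) {
        println((1 to cols).map(i => rs.getString(i)).mkString(", "))
      }
    } finally {
      conn.close() // also closes the statement and result set
    }
  }
}
```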

5. Requirements

  • Spark 2.1.0, Java 1.8 and Scala 2.11

6. References:

  • A complete guide and references for this project are covered in my blog here.