pivotal-legacy / sql_magic

License: Apache-2.0
Magic functions for using Jupyter Notebook with Apache Spark and a variety of SQL databases.

Projects that are alternatives to or similar to sql_magic

Recommendersystems pydata 2016
Stars: ✭ 166 (-0.6%)
Mutual labels:  jupyter-notebook
Dst
Deformable Style Transfer (ECCV 2020)
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Keras Seq 2 Seq Signal Prediction
An implementation of a sequence to sequence neural network using an encoder-decoder
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Miniradiotelescope
Arduino and python code for a mini-radio telescope.
Stars: ✭ 165 (-1.2%)
Mutual labels:  jupyter-notebook
Hacking Civico
A course on open data and civic hacking, building the capacity of citizens and public servants to use open data.
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Aws Neuron Sdk
Powering Amazon custom machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Geospatial data with python
Introduction to Geospatial Data with Python
Stars: ✭ 166 (-0.6%)
Mutual labels:  jupyter-notebook
Relation Classification Using Bidirectional Lstm Tree
TensorFlow Implementation of the paper "End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures" and "Classifying Relations via Long Short Term Memory Networks along Shortest Dependency Paths" for classifying relations
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Lisrd
Local Invariance Selection at Runtime for Descriptors (LISRD)
Stars: ✭ 166 (-0.6%)
Mutual labels:  jupyter-notebook
Mine Mutual Information Neural Estimation
A pytorch implementation of MINE(Mutual Information Neural Estimation)
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Neuralimagesynthesis
Code to reproduce the results from the paper "Controlling Perceptual Factors in Neural Style Transfer" (https://arxiv.org/abs/1611.07865).
Stars: ✭ 165 (-1.2%)
Mutual labels:  jupyter-notebook
Tensorflow Tutorial
Practical tutorials and labs for TensorFlow used by Nvidia, FFN, CNN, RNN, Kaggle, AE
Stars: ✭ 1,954 (+1070.06%)
Mutual labels:  jupyter-notebook
Bookrepo
Stars: ✭ 166 (-0.6%)
Mutual labels:  jupyter-notebook
Hunter
A threat hunting / data analysis environment based on Python, Pandas, PySpark and Jupyter Notebook.
Stars: ✭ 166 (-0.6%)
Mutual labels:  jupyter-notebook
Python Crash Course
Python Crash Course
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Azure Cosmosdb Spark
Apache Spark Connector for Azure Cosmos DB
Stars: ✭ 165 (-1.2%)
Mutual labels:  jupyter-notebook
Wwcodedatascience
Content, Code & Resources for WWCodeDataScience Events https://www.womenwhocode.com/datascience
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Pytorch Retraining
Transfer Learning Shootout for PyTorch's model zoo (torchvision)
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook
Rnn For Human Activity Recognition Using 2d Pose Input
Activity Recognition from 2D pose using an LSTM RNN
Stars: ✭ 165 (-1.2%)
Mutual labels:  jupyter-notebook
Cnn Exposed
🕵️‍♂️ Interpreting Convolutional Neural Network (CNN) Results.
Stars: ✭ 167 (+0%)
Mutual labels:  jupyter-notebook

sql_magic

sql_magic is a Jupyter magic for writing SQL to interact with Spark (or Hive) and relational databases. Query results are saved directly to a Pandas dataframe.

%%read_sql df_result
SELECT *
FROM table_name
WHERE age < {threshold}

The sql_magic library expands upon existing libraries such as ipython-sql with the following features:

  • Support for both Apache Spark and relational database connections simultaneously
  • Asynchronous execution (useful for long queries)
  • Browser notifications for query completion

See the included Jupyter notebook for examples and API usage.

Installation

pip install sql_magic

Usage: Execute SQL on a relational database

Relational databases can be accessed using SQLAlchemy or libraries implementing the Python DB-API 2.0 specification (e.g., psycopg2, sqlite3).

# create SQLAlchemy engine for postgres
from sqlalchemy import create_engine
postgres_engine = create_engine('postgresql://{user}:{password}@{host}:5432/{database}'.format(**connect_credentials))
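
Any DB-API 2.0 connection object can be used in place of a SQLAlchemy engine. For instance, a local SQLite database (a minimal sketch; the file name is illustrative):

# a DB-API 2.0 connection object works the same way as an engine
import sqlite3
sqlite_conn = sqlite3.connect('example.db')  # illustrative file name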

The sql_magic library is loaded using the %load_ext IPython extension syntax and pointed at the connection object as follows:

%load_ext sql_magic
%config SQL.conn_name = 'postgres_engine'

Python variables can be directly referenced in the SQL query using the string formatting syntax:

# variables for use in SQL query
table_name = 'titanic'
cols = ','.join(['age','sex','fare'])

SQL code is executed with the %read_sql cell magic. A browser notification containing the execution time and result dimensions will automatically appear once the query is finished.

%%read_sql df_result
SELECT {cols}
FROM {table_name}
WHERE age < 10
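
The target variable (df_result above) is created in the notebook namespace as a Pandas dataframe, so it can be inspected immediately (a minimal sketch):

df_result.head()    # first rows of the query result
df_result.shape     # (n_rows, n_columns)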

SQL syntax is colored inside Jupyter:

A browser notification is displayed upon query completion.

Queries can be run against additional connection objects (Spark, Hive, or relational database connections) with the -c or --connection flag:

# sql_magic supports any library implementing the Python DB-API 2.0 specification
import psycopg2
conn2 = psycopg2.connect(**connect_credentials)

%%read_sql df_result -c conn2
SELECT {cols}
FROM {table_name}
WHERE age < 10

The code can be executed asynchronously using the -a flag. Asynchronous execution is particularly useful for running long queries in the background without blocking the IPython kernel.

%%read_sql df_result -a
SELECT * FROM {table_name}

Since results are automatically saved as a Pandas dataframe, we can easily visualize them using Pandas' built-in plotting routines:

df_result.plot('age', 'fare', kind='scatter')

Multi-line SQL statements are also supported:

%%read_sql
DROP TABLE IF EXISTS table123;
CREATE TABLE table123
AS
SELECT *
FROM table456;

Finally, line magic syntax is also available:

result = %read_sql SELECT * FROM table123;
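
The line magic returns the dataframe directly, so it can be assigned and used inline like any other Python expression (a minimal sketch, assuming the line magic performs the same {variable} substitution as the cell magic):

limit = 10  # Python variable referenced in the query
preview = %read_sql SELECT * FROM table123 LIMIT {limit}
preview.head()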

Using sql_magic with Spark or Hive

The syntax for connecting with Spark is the same as above; simply point the connection object to a SparkSession, SQLContext, or HiveContext object:

# Spark 2.0+: point the connection at the SparkSession (available as `spark`)
%config SQL.conn_name = 'spark'

# spark 1.6 and before
from pyspark.sql import HiveContext  # or SQLContext
hive_context = HiveContext(sc)
%config SQL.conn_name = 'hive_context'
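
Once the connection points at a Spark object, queries run exactly as before (a minimal sketch; the table name is hypothetical):

%%read_sql df_spark
SELECT count(*) AS n_rows
FROM my_hive_table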

Configuration

Both browser notifications and displaying results to standard out are enabled by default. Either of these can be temporarily disabled with the -n and -d flags, respectively. They can also be disabled using the %config magic function.

Flags

Notifications and auto-display can be temporarily disabled with flags:

positional arguments:
  table_name

optional arguments:
  -h, --help     show this help message and exit
  -n, --notify   Toggle option for notifying query result
  -a, --async    Run query in separate thread. Please be cautious when
                 assigning result to a variable
  -d, --display  Toggle option for outputting query result
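
For example, both the browser notification and the automatic display can be suppressed for a single query (a sketch reusing the variables defined earlier):

%%read_sql df_result -n -d
SELECT {cols}
FROM {table_name}
WHERE age < 10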

Default values

Notifications and auto-display can be disabled by default using %config. If this is done for either option, the flags above will temporarily enable these features.

SQL options
-------------
SQL.conn_name=<Unicode>
    Current: u'conn'
    Object name for accessing computing resource environment
SQL.notify_result=<Bool>
    Current: True
    Display a browser notification when a query completes
SQL.output_result=<Bool>
    Current: True
    Output query result to stdout
%config SQL.notify_result = False  # disable browser notifications
%config SQL.output_result = False  # disable output to stdout

That’s it! Give sql_magic a try and let us know what you think. Please submit a pull request for any improvements or bug fixes.

Acknowledgements

Thank you to Scott Hajek, Greg Tam, and Srivatsan Ramanujam, along with the rest of the Pivotal Data Science team, for their help in developing this library. Thank you to Lia and Jackie Ho for help with the diagram. This library was inspired by and aided by the work of the ipython-sql library.
