All Projects → ianmcook → Tidyquery

ianmcook / Tidyquery

Licence: apache-2.0
Query R data frames with SQL

Programming Languages

r
7636 projects

Projects that are alternatives of or similar to Tidyquery

Moderndive book
Statistical Inference via Data Science: A ModernDive into R and the Tidyverse
Stars: ✭ 527 (+281.88%)
Mutual labels:  tidyverse, dplyr
Fluentpdo
A PHP SQL query builder using PDO
Stars: ✭ 783 (+467.39%)
Mutual labels:  sql, query
Tidyexplain
🤹‍♀ Animations of tidyverse verbs using R, the tidyverse, and gganimate
Stars: ✭ 558 (+304.35%)
Mutual labels:  sql, dplyr
Walkable
A Clojure(script) SQL library for building APIs: Datomic® (GraphQL-ish) pull syntax, data driven configuration, dynamic filtering with relations in mind
Stars: ✭ 384 (+178.26%)
Mutual labels:  sql, query
Askxml
Run SQL statements on XML documents
Stars: ✭ 79 (-42.75%)
Mutual labels:  sql, query
Tidylog
Tidylog provides feedback about dplyr and tidyr operations. It provides wrapper functions for the most common functions, such as filter, mutate, select, and group_by, and provides detailed output for joins.
Stars: ✭ 428 (+210.14%)
Mutual labels:  tidyverse, dplyr
Tidyquant
Bringing financial analysis to the tidyverse
Stars: ✭ 635 (+360.14%)
Mutual labels:  tidyverse, dplyr
Loukoum
A simple SQL Query Builder
Stars: ✭ 305 (+121.01%)
Mutual labels:  sql, query
Purescript Selda
A type-safe, high-level SQL library for PureScript
Stars: ✭ 72 (-47.83%)
Mutual labels:  sql, query
Pypika
PyPika is a python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.
Stars: ✭ 1,111 (+705.07%)
Mutual labels:  sql, query
Timetk
A toolkit for working with time series in R
Stars: ✭ 371 (+168.84%)
Mutual labels:  tidyverse, dplyr
Graphquery
GraphQuery is a query language and execution engine tied to any backend service.
Stars: ✭ 112 (-18.84%)
Mutual labels:  sql, query
Baby squeel
🐷 An expressive query DSL for Active Record 4 and 5
Stars: ✭ 362 (+162.32%)
Mutual labels:  sql, query
Qone
Next-generation web query language, extend .NET LINQ for javascript.
Stars: ✭ 463 (+235.51%)
Mutual labels:  sql, query
Tidy
Tidy up your data with JavaScript, inspired by dplyr and the tidyverse
Stars: ✭ 307 (+122.46%)
Mutual labels:  tidyverse, dplyr
Siuba
Python library for using dplyr like syntax with pandas and SQL
Stars: ✭ 605 (+338.41%)
Mutual labels:  sql, dplyr
Roapi
Create full-fledged APIs for static datasets without writing a single line of code.
Stars: ✭ 253 (+83.33%)
Mutual labels:  sql, query
Preql
An interpreted relational query language that compiles to SQL.
Stars: ✭ 257 (+86.23%)
Mutual labels:  sql, query
Node Jl Sql Api
SQL for JS objects streams
Stars: ✭ 19 (-86.23%)
Mutual labels:  sql, query
Elasticsearch
Use SQL statements to query elasticsearch
Stars: ✭ 98 (-28.99%)
Mutual labels:  sql, query

tidyquery

CRAN status Travis build status AppVeyor build status Codecov test coverage

tidyquery runs SQL queries on R data frames.

It uses queryparser to translate SQL queries into R expressions, then it uses dplyr to evaluate these expressions and return results. tidyquery does not load data frames into a database; it queries them in place.

For an introduction to tidyquery and queryparser, watch the recording of the talk “Bridging the Gap between SQL and R” from rstudio::conf(2020).

Installation

Install the released version of tidyquery from CRAN with:

install.packages("tidyquery")

Or install the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("ianmcook/tidyquery")

Usage

tidyquery exports two functions: query() and show_dplyr().

Using query()

To run a SQL query on an R data frame, call the function query(), passing a SELECT statement enclosed in quotes as the first argument. The table names in the FROM clause should match the names of data frames in your current R session:

library(tidyquery)
library(nycflights13)

query(
" SELECT origin, dest,
    COUNT(flight) AS num_flts,
    round(SUM(seats)) AS num_seats,
    round(AVG(arr_delay)) AS avg_delay
  FROM flights f LEFT OUTER JOIN planes p
    ON f.tailnum = p.tailnum
  WHERE distance BETWEEN 200 AND 300
    AND air_time IS NOT NULL
  GROUP BY origin, dest
  HAVING num_flts > 3000
  ORDER BY num_seats DESC, avg_delay ASC
  LIMIT 2;"
)
#> # A tibble: 2 x 5
#>   origin dest  num_flts num_seats avg_delay
#>   <chr>  <chr>    <int>     <dbl>     <dbl>
#> 1 LGA    DCA       4468    712643         6
#> 2 EWR    BOS       5247    611192         5

Alternatively, for single-table queries, you can pass a data frame as the first argument and a SELECT statement as the second argument, omitting the FROM clause. This allows query() to function like a dplyr verb:

library(dplyr)

airports %>%
  query("SELECT name, lat, lon ORDER BY lat DESC LIMIT 5")
#> # A tibble: 5 x 3
#>   name                                         lat    lon
#>   <chr>                                      <dbl>  <dbl>
#> 1 Dillant Hopkins Airport                     72.3   42.9
#> 2 Wiley Post Will Rogers Mem                  71.3 -157. 
#> 3 Wainwright Airport                          70.6 -160. 
#> 4 Wainwright As                               70.6 -160. 
#> 5 Atqasuk Edward Burnell Sr Memorial Airport  70.5 -157.

You can chain dplyr verbs before and after query():

planes %>%
  filter(engine == "Turbo-fan") %>%
  query("SELECT manufacturer AS maker, COUNT(*) AS num_planes GROUP BY maker") %>%
  arrange(desc(num_planes)) %>%
  head(5)
#> # A tibble: 5 x 2
#>   maker            num_planes
#>   <chr>                 <int>
#> 1 BOEING                 1276
#> 2 BOMBARDIER INC          368
#> 3 AIRBUS                  331
#> 4 EMBRAER                 298
#> 5 AIRBUS INDUSTRIE        270

In the SELECT statement, the names of data frames and columns are case-sensitive (like in R) but keywords and function names are case-insensitive (like in SQL).

In addition to R data frames and tibbles (tbl_df objects), query() can be used to query other data frame-like objects, including:

  • dtplyr_step objects created with dtplyr, a data.table backend for dplyr
  • tbl_sql objects created with dbplyr or a dbplyr backend package, enabling you to write SQL which is translated to dplyr then translated back to SQL and run in a database 🤪

Using show_dplyr()

tidyquery works by generating dplyr code. To print the dplyr code instead of running it, use show_dplyr():

show_dplyr(
" SELECT manufacturer, 
    COUNT(*) AS num_planes
  FROM planes
  WHERE engine = 'Turbo-fan'
  GROUP BY manufacturer
  ORDER BY num_planes DESC;"
)
#> planes %>%
#>   filter(engine == "Turbo-fan") %>%
#>   group_by(manufacturer) %>%
#>   summarise(num_planes = dplyr::n()) %>%
#>   ungroup() %>%
#>   arrange(dplyr::desc(num_planes))

Current Limitations

tidyquery is subject to the current limitations of the queryparser package. Please see the Current Limitations section of the queryparser README on CRAN or GitHub.

tidyquery also has the following additional limitations:

  • Joins involving three or more tables are not supported.
  • Because joins in dplyr currently work in a fundamentally different way than joins in SQL, some other types of join queries are not supported. Examples of unsupported join queries include non-equijoin queries and outer join queries with qualified references to the join column(s). Planned changes in dplyr will enable future versions of tidyquery to support more types of joins.
  • In the code printed by show_dplyr(), calls to functions with more than five arguments might be truncated, with arguments after the fifth replaced with .... This is caused by a current limitation of the rlang package that is expected to be resolved in a future version.

Related Work

The sqldf package (CRAN, GitHub) runs SQL queries on R data frames by transparently setting up a database, loading data from R data frames into the database, running SQL queries in the database, and returning results as R data frames.

The duckdb package (CRAN, GitHub) includes the function duckdb_register() which registers an R data frame as a virtual table in a DuckDB database, enabling you to run SQL queries on the data frame with DBI::dbGetQuery().

The dbplyr package (CRAN, GitHub) is like tidyquery in reverse: it converts dplyr code into SQL, allowing you to use dplyr to work with data in a database.

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].