seancarmody / ngramr

Licence: MIT license
R package to query the Google Ngram Viewer

ngramr - R package to query the Google Ngram Viewer

The Google Books Ngram Viewer allows you to enter a list of phrases and then displays a graph showing how often the phrases have occurred over time in a large corpus of books (e.g., "British English", "English Fiction", "French"). The current corpus, produced in 2019, contains almost two trillion words for English alone.

The underlying data is hidden in the web page, embedded in some JavaScript. This package extracts the data and provides it in the form of an R data frame. Early versions of the code were adapted from a handy Python script available from Culturomics, written by Jean-Baptiste Michel. The code has been comprehensively redeveloped since then.

Note that in September 2022 the format of the corpus codes changed (e.g. "eng_2019" became "en-GB-2019"). The old codes are available in the corpuses dataset.
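As a minimal sketch, the dataset can be inspected once the package is installed. This assumes corpuses can be loaded with data() and does not assume any particular column names; str() simply prints whatever the dataset contains.

```r
# Only runs if ngramr is already installed (see "Installing").
if (requireNamespace("ngramr", quietly = TRUE)) {
  # Load the corpuses dataset shipped with the package into the workspace
  data("corpuses", package = "ngramr")
  # Show its structure, including the old and new corpus codes
  str(corpuses)
}
```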

Installing

This package requires R version 4.0.0 or higher. If you are using an older version of R you will be prompted to upgrade when you try to install the package, so you may as well upgrade now!

The official release of ngramr is available on CRAN. To install from CRAN, use the following command:

install.packages("ngramr")

If you have any problems installing the package on macOS, try installing from source:

install.packages("ngramr", type="source")

If you have the devtools package installed, you can install the latest stable version of this package directly from GitHub:

library(devtools)
install_github("seancarmody/ngramr")
library(ngramr)

and if you are feeling a little more adventurous, you can install the development version:

install_github("seancarmody/ngramr", ref = "develop")

although it may not always work.

If the latest release has broken some of your old code, you can install an older version, for example:

install_github("seancarmody/ngramr", ref = "v1.9.0")

Note, though, that many releases fix problems that arise when Google changes the format of the Ngram Viewer website, so older versions generally no longer work.

If you are behind a proxy, install_github may not work for you. Instead of fiddling around with the RCurl proxy settings, you can download the latest ZIP archive and use install_local instead.
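A minimal sketch of the ZIP route, assuming devtools is installed and the archive has already been downloaded by hand. The path below is a placeholder, not a location the package defines.

```r
# Placeholder path (an assumption): wherever the browser saved the ZIP archive.
zip_path <- path.expand("~/Downloads/ngramr-master.zip")

if (file.exists(zip_path)) {
  # install_local() accepts a local directory or a compressed archive
  devtools::install_local(zip_path)
} else {
  message("ZIP archive not found at ", zip_path)
}
```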

Examples

Here is an example of how to use the ngram function:

library(ngramr)
library(ggplot2)
ng <- ngram(c("hacker", "programmer"), year_start = 1950)
ggplot(ng, aes(x = Year, y = Frequency, colour = Phrase)) +
  geom_line()

The result is a ggplot2 line graph of the following form:

Ngram Chart
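Since the data arrives as an ordinary data frame, it can also be summarised without plotting. The sketch below assumes the Year, Phrase and Frequency columns used in the plot above. The peak_year helper is hypothetical and not part of ngramr.

```r
# Hypothetical helper (not an ngramr function): for each phrase, return the
# row with the highest frequency. Assumes a non-empty data frame with the
# columns Year, Phrase and Frequency.
peak_year <- function(ng) {
  stopifnot(all(c("Year", "Phrase", "Frequency") %in% names(ng)))
  # Group the rows by phrase
  by_phrase <- split(ng, as.character(ng$Phrase))
  # Within each group, keep the first row with the maximum frequency
  peaks <- lapply(by_phrase, function(d) {
    d[which.max(d$Frequency), c("Phrase", "Year", "Frequency")]
  })
  out <- do.call(rbind, peaks)
  rownames(out) <- NULL
  out
}
```

Called as peak_year(ng) on the result of the ngram call above, this would return one row per phrase.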

The same result can be achieved even more simply by using the ggram plotting wrapper that supports many options, as in this example:

Ngram chart, with options

ggram(c("monarchy", "democracy"), year_start = 1500, year_end = 2000,
      corpus = "en-GB-2012", ignore_case = TRUE,
      geom = "area", geom_options = list(position = "stack")) +
  labs(y = NULL)

The colours used by Google Ngram are available through the google_theme option, as in this example posted by Ben Zimmer at Language Log:

Ngram chart, with Google theme

ng <- c("((The United States is + The United States has) / The United States)",
      "((The United States are + The United States have) / The United States)")
ggram(ng, year_start = 1800, google_theme = TRUE) +
  theme(legend.direction = "vertical")

Getting help

If you encounter a bug, please file an issue with a reproducible example on GitHub.

Further Reading

For more information, read this Stubborn Mule post and the Google Ngram syntax documentation. Language Log has a good post written just after the launch of the 2012 corpus.

If you would rather work with R and SQL on the raw Google Ngram datasets, see this post.
