
EmilHvitfeldt / Textdata

Licence: other
Download, parse, store, and load text datasets instead of storing them in packages

textdata

The goal of textdata is to provide easy access to text-related datasets without bundling them inside a package. Some text datasets are too large to store within an R package, or are licensed in a way that prevents them from being included in an OSS-licensed package. Instead, this package provides a framework to download, parse, and store the datasets on disk and load them when needed.
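
The download-once, load-from-disk idea can be sketched in base R. This is only an illustration of the general pattern, not textdata's actual implementation: `load_cached()`, `fetch_toy_lexicon()`, and the `.rds` cache layout are all hypothetical names and choices made for this example, and the "download" is a toy stand-in that needs no network access.

```r
# Hypothetical helper (not part of textdata): return a dataset,
# running the fetch step only when no cached copy exists on disk.
load_cached <- function(name, fetch, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (!file.exists(path)) {
    message("Fetching and caching '", name, "'")
    saveRDS(fetch(), path)
  }
  readRDS(path)
}

# Toy stand-in for a remote lexicon; a real fetch step would
# download a file and parse it into a data frame.
downloads <- 0
fetch_toy_lexicon <- function() {
  downloads <<- downloads + 1
  data.frame(word = c("good", "bad"), value = c(3, -3))
}

# Use a fresh directory so the example starts with an empty cache.
cache_dir <- tempfile("textdata-sketch-")
dir.create(cache_dir)

first  <- load_cached("toy_lexicon", fetch_toy_lexicon, dir = cache_dir)  # fetches, then caches
second <- load_cached("toy_lexicon", fetch_toy_lexicon, dir = cache_dir)  # reads the cached file
```

After both calls the fetch step has run only once, which is the behaviour the package description is aiming for with large or restrictively licensed datasets.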

Installation

You can install the released version of textdata from CRAN with:

install.packages("textdata")

And the development version from GitHub with:

# install.packages("remotes")
remotes::install_github("EmilHvitfeldt/textdata")

Example

The first time you use one of the functions for accessing an included text dataset, such as lexicon_afinn() or dataset_sentence_polarity(), the function will prompt you to agree that you understand the dataset’s license or terms of use and then download the dataset to your computer.

After the first use, each time you use a function like lexicon_afinn(), the function will load the dataset from disk.
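
A minimal sketch of that workflow, assuming textdata is installed. The wrapper `load_afinn()` is a hypothetical helper written for this example, and the `interactive()` guard reflects an assumption that the license prompt needs a live session to be answered.

```r
# Hypothetical wrapper, not part of textdata: returns the AFINN lexicon,
# or NULL when the package is missing or the session is not interactive.
load_afinn <- function() {
  if (!interactive() || !requireNamespace("textdata", quietly = TRUE)) {
    return(NULL)
  }
  textdata::lexicon_afinn()
}

afinn <- load_afinn()   # very first use: license prompt, then download to disk
afinn <- load_afinn()   # any later use: loaded from the copy already on disk
if (!is.null(afinn)) print(head(afinn))
```

The same pattern should apply to the other accessor functions, such as dataset_sentence_polarity().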

Included text datasets

The datasets currently available through textdata are:

| Dataset | Function |
| --- | --- |
| v1.0 sentence polarity dataset | dataset_sentence_polarity() |
| AFINN-111 sentiment lexicon | lexicon_afinn() |
| Hu and Liu’s opinion lexicon | lexicon_bing() |
| NRC word-emotion association lexicon | lexicon_nrc() |
| NRC Emotion Intensity Lexicon | lexicon_nrc_eil() |
| The NRC Valence, Arousal, and Dominance Lexicon | lexicon_nrc_vad() |
| Loughran and McDonald’s opinion lexicon for financial documents | lexicon_loughran() |
| AG’s News | dataset_ag_news() |
| DBpedia ontology | dataset_dbpedia() |
| Trec-6 and Trec-50 | dataset_trec() |
| IMDb Large Movie Review Dataset | dataset_imdb() |
| Stanford NLP GloVe pre-trained word vectors | embedding_glove6b(), embedding_glove27b(), embedding_glove42b(), embedding_glove840b() |

Check out each function’s documentation for detailed information (including citations) for the relevant dataset.
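
For instance, the help page for one of the functions can be opened from R as follows. This is a small sketch that assumes textdata is installed; viewing a help page does not download any dataset.

```r
# Look up the documentation for lexicon_afinn() if textdata is available.
if (requireNamespace("textdata", quietly = TRUE)) {
  doc <- utils::help("lexicon_afinn", package = "textdata")
  print(doc)   # same page as typing ?textdata::lexicon_afinn at the console
}
```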

Community Guidelines

Note that this project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms. Feedback, bug reports (and fixes!), and feature requests are welcome; file issues or seek support on the project's GitHub repository (EmilHvitfeldt/textdata). For details on how to add a new dataset to this package, check out the package vignette!
