
quanteda / quanteda.corpora

Licence: other
A collection of corpora for quanteda




Corpora for quanteda

An R package that provides easy access to large corpora for use with quanteda.

How to Install

You can download the files and build the package from source, or you can install it directly from GitHub with the devtools package:

devtools::install_github("quanteda/quanteda.corpora")

Available corpora

Corpora contained in the package are the following:

| Corpus | Name |
|---|---|
| Amicus curiae briefs from Bakke (1978) and Bollinger (2008) | data_corpus_amicus |
| Annual budget speeches from the Irish Dáil, 2008-2012 | data_corpus_irishbudgets |
| UK news articles from 2014 that mention immigration | data_corpus_immigrationnews |
| Movie reviews from Pang, Lee, and Vaithyanathan (2002) | moved to quanteda.textmodels |
| US State of the Union addresses from 1790 to present | data_corpus_sotu |
| UK political party manifestos, 1945-2005 | data_corpus_ukmanifestos |
| UN General Debate speeches, 2017 | data_corpus_ungd2017 |
| Universal Declaration of Human Rights in 464 languages | data_corpus_udhr |

Larger corpora are also available from online locations using download():

| Corpus | Name |
|---|---|
| Guardian newspaper articles in politics, economy, society and international sections from 2012 to 2016 | data_corpus_guardian |
| Transcripts of speeches at Japan's Committee on Foreign Affairs and Defense of the lower house (Shugiin) from 1947 to 2017 | data_corpus_foreignaffairscommittee |
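The following is a minimal sketch of using both kinds of corpora. It assumes that quanteda and quanteda.corpora are already installed and that a network connection is available for the `download()` step. Once the package is loaded, a bundled corpus is available as a data object. A larger corpus is fetched by passing its name to `download()`. The `ndoc()` calls come from quanteda and only count the documents to confirm each object loaded.

```r
# Sketch only: assumes quanteda and quanteda.corpora are installed,
# and that network access is available for download().
library("quanteda")
library("quanteda.corpora")

# Bundled corpora are available as data objects once the package is loaded.
ndoc(data_corpus_ungd2017)

# Larger corpora are fetched from an online location by name.
corp_guardian <- quanteda.corpora::download("data_corpus_guardian")
ndoc(corp_guardian)
```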