hrbrmstr / Orangetext
ππ : An #rstats project to keep track of The π One's speeches
Stars: β 53
Programming Languages
r
7636 projects
Labels
Projects that are alternatives of or similar to Orangetext
Dsci 100
Repository for UBC's Introduction to Data Science course (DSCI 100)
Stars: β 46 (-13.21%)
Mutual labels: rstats
Advanced R
One day course covering functions, functional programming and tidy evaluation
Stars: β 38 (-28.3%)
Mutual labels: rstats
Ndjson
β¨οΈ Wicked-Fast Streaming 'JSON' ('ndjson') Reader in R
Stars: β 44 (-16.98%)
Mutual labels: rstats
Inferregex
Infer the regular expression (regex) of a string π€ π’ π
Stars: β 41 (-22.64%)
Mutual labels: rstats
Resources
R-Ladies Resources : Various resources for R-Ladies Global and to be shared across chapters π π
Stars: β 47 (-11.32%)
Mutual labels: rstats
Hexagon
βοΈβΉβΆοΈ R package for creating hexagon shaped xy data frames.
Stars: β 40 (-24.53%)
Mutual labels: rstats
Dtupdate
The dtupdate package has functions that try to make it easier to keep up with the non-CRAN universe
Stars: β 51 (-3.77%)
Mutual labels: rstats
Iptools
π΄ A toolkit for manipulating, validating and testing IP addresses and ranges, along with datasets relating to IP addresses. While it primarily has support for the IPv4 address space, more extensive IPv6 support is intended.
Stars: β 44 (-16.98%)
Mutual labels: rstats
Euclid
Exact Computation Geometry Framework Based on 'CGAL'
Stars: β 52 (-1.89%)
Mutual labels: rstats
https://factba.se/
THIS REPO IS NO LONGER NECESSARY AND IS NOT BEING MAINTAINED GIVEN STAFFED RESOURCES SUCH ASorangetext
is an #rstats project to keep track of The π One's speeches and include some code snippets for text analysis on them.
Gladly accepting PRs for legit new transcripts and more analysis scripts.
Transcripts
2016-01-19-presidential-candidacy-anouncement-NewYorkCity-NY.txt
2016-08-31-immigration-Phoenix-AZ.txt
2016-10-13-addressing-sexual-assault-WestPalmBeach-FL.txt
2017-01-20-inaugural.txt
2017-01-21-cia.txt
2017-01-28-may.txt
2017-01-29-weekly-address.txt
2017-01-31-gorsuch.txt
2017-02-01-black-history-month.txt
2017-02-032-national-prayer.txt
2017-02-03-weekly-address.txt
-
2017-02-07-major_cities_chiefs_association_conference
#
Sample code
library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)
Read all the speeches in:
rprojroot::find_rstudio_root_file() %>%
file.path("data", "speeches") %>%
list.files("*.txt", full.names=TRUE) %>%
map(read_lines) %>%
flatten_chr() %>%
stri_enc_toascii() %>%
stri_trim_both() %>%
discard(equals, "") %>%
paste0(collapse=" ") %>%
stri_replace_all_regex("[[:space:]]+", " ") %>%
preprocess(case="lower", remove.punct=TRUE,
remove.numbers=TRUE, fix.spacing=TRUE) -> texts
What have we got:
string.summary(texts)
## Chars: 127786
## Letters: 103672
## Whitespace: 23463
## Punctuation: 0
## Digits: 0
## Words: 23464
## Sentences: 0
## Lines: 1
## Wordlens: 728 869 898 1004 1784 1861 2879 3794 4634 5013
## 1 1 1 1 1 1 1 1 1 1
## Senlens: 0
## 10
## Syllens: 0 8 19 192 829 2174 5859 14331
## 3 1 1 1 1 1 1 1
The 1-grams are kinda useless but this makes a big tibble for 1:8-grams.
map_df(1:8, ~ngram(texts, n=.x) %>%
get.phrasetable() %>%
tbl_df() %>%
rename(words=ngrams) %>%
mutate(words=stri_trim_both(words)) %>%
mutate(ngram=sprintf("ngrams: %s", .x))) %>%
mutate(ngram=factor(ngram, levels=unique(ngram))) %>%
select(ngram, freq, prop, words) -> grams
glimpse(grams)
## Observations: 154,149
## Variables: 4
## $ ngram <fctr> ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ...
## $ freq <int> 984, 903, 654, 492, 458, 420, 383, 355, 311, 299, 291, 2...
## $ prop <dbl> 0.041936584, 0.038484487, 0.027872486, 0.020968292, 0.01...
## $ words <chr> "the", "and", "to", "of", "a", "i", "we", "that", "our",...
filter(grams, ngram=="ngrams: 3")
## # A tibble: 20,791 Γ 4
## ngram freq prop words
## <fctr> <int> <dbl> <chr>
## 1 ngrams: 3 30 0.0012786634 the united states
## 2 ngrams: 3 27 0.0011507970 going to be
## 3 ngrams: 3 24 0.0010229307 one of the
## 4 ngrams: 3 21 0.0008950644 were going to
## 5 ngrams: 3 20 0.0008524422 we have to
## 6 ngrams: 3 18 0.0007671980 by the way
## 7 ngrams: 3 16 0.0006819538 not going to
## 8 ngrams: 3 15 0.0006393317 and by the
## 9 ngrams: 3 15 0.0006393317 the american people
## 10 ngrams: 3 15 0.0006393317 of our country
## # ... with 20,781 more rows
filter(grams, ngram=="ngrams: 4")
## # A tibble: 22,630 Γ 4
## ngram freq prop words
## <fctr> <int> <dbl> <chr>
## 1 ngrams: 4 12 0.0005114871 and by the way
## 2 ngrams: 4 10 0.0004262393 of the united states
## 3 ngrams: 4 9 0.0003836154 the new york times
## 4 ngrams: 4 9 0.0003836154 we are going to
## 5 ngrams: 4 9 0.0003836154 all over the place
## 6 ngrams: 4 9 0.0003836154 thank you thank you
## 7 ngrams: 4 8 0.0003409914 we will make america
## 8 ngrams: 4 8 0.0003409914 we have people that
## 9 ngrams: 4 7 0.0002983675 make america great again
## 10 ngrams: 4 6 0.0002557436 is going to be
## # ... with 22,620 more rows
filter(grams, ngram=="ngrams: 5")
## # A tibble: 23,181 Γ 4
## ngram freq prop words
## <fctr> <int> <dbl> <chr>
## 1 ngrams: 5 5 0.0002131287 all you have to do
## 2 ngrams: 5 5 0.0002131287 the new york times and
## 3 ngrams: 5 4 0.0001705030 will make america great again
## 4 ngrams: 5 4 0.0001705030 we will vote for the
## 5 ngrams: 5 4 0.0001705030 that i can tell you
## 6 ngrams: 5 4 0.0001705030 we will bring back our
## 7 ngrams: 5 4 0.0001705030 the united states supreme court
## 8 ngrams: 5 4 0.0001705030 movement the likes of which
## 9 ngrams: 5 4 0.0001705030 we will make america great
## 10 ngrams: 5 4 0.0001705030 we have people that are
## # ... with 23,171 more rows
filter(grams, ngram=="ngrams: 6")
## # A tibble: 23,350 Γ 4
## ngram freq prop words
## <fctr> <int> <dbl> <chr>
## 1 ngrams: 6 4 0.0001705103 all you have to do is
## 2 ngrams: 6 4 0.0001705103 we will make america great again
## 3 ngrams: 6 3 0.0001278827 make america great again thank you
## 4 ngrams: 6 3 0.0001278827 you have to do is look
## 5 ngrams: 6 3 0.0001278827 were going to bring our jobs
## 6 ngrams: 6 3 0.0001278827 bless you and god bless america
## 7 ngrams: 6 3 0.0001278827 going to bring our jobs back
## 8 ngrams: 6 3 0.0001278827 god bless you and god bless
## 9 ngrams: 6 3 0.0001278827 to bring our jobs back home
## 10 ngrams: 6 3 0.0001278827 have to do is look at
## # ... with 23,340 more rows
Note that the project description data, including the texts, logos, images, and/or trademarks,
for each open source project belongs to its rightful owner.
If you wish to add or remove any projects, please contact us at [email protected].