hrbrmstr / Orangetext

πŸŠπŸ“„ : An #rstats project to keep track of The 🍊 One's speeches

THIS REPO IS NO LONGER NECESSARY AND IS NOT BEING MAINTAINED GIVEN STAFFED RESOURCES SUCH AS https://factba.se/


orangetext is an #rstats project to keep track of The 🍊 One's speeches and include some code snippets for text analysis on them.

Gladly accepting PRs for legit new transcripts and more analysis scripts.

Transcripts

  • 2016-01-19-presidential-candidacy-anouncement-NewYorkCity-NY.txt
  • 2016-08-31-immigration-Phoenix-AZ.txt
  • 2016-10-13-addressing-sexual-assault-WestPalmBeach-FL.txt
  • 2017-01-20-inaugural.txt
  • 2017-01-21-cia.txt
  • 2017-01-28-may.txt
  • 2017-01-29-weekly-address.txt
  • 2017-01-31-gorsuch.txt
  • 2017-02-01-black-history-month.txt
  • 2017-02-032-national-prayer.txt
  • 2017-02-03-weekly-address.txt
  • 2017-02-07-major_cities_chiefs_association_conference#

Sample code

library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)

Read all the speeches in:

rprojroot::find_rstudio_root_file() %>%
  file.path("data", "speeches") %>%
  list.files(pattern="\\.txt$", full.names=TRUE) %>%  # pattern is a regex, not a glob
  map(read_lines) %>%
  flatten_chr() %>%
  stri_enc_toascii() %>%  
  stri_trim_both() %>%
  discard(equals, "") %>%
  paste0(collapse=" ") %>%
  stri_replace_all_regex("[[:space:]]+", " ") %>%
  preprocess(case="lower", remove.punct=TRUE,
             remove.numbers=TRUE, fix.spacing=TRUE) -> texts
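
To make the cleaning steps in the pipeline above concrete, here is a minimal base-R sketch applied to a few made-up lines. The `raw_lines` text and the `clean_text()` helper are hypothetical and only approximate what `stri_trim_both()`, `discard()`, and `ngram::preprocess()` do in the real pipeline; the exact rules inside `preprocess()` may differ, and the ASCII conversion step is left out.

```r
# Toy stand-in for lines read from the transcript files (hypothetical text).
raw_lines <- c("  Thank you.  Thank you!  ", "", "We will win in 2017,   believe me. ")

# Hypothetical helper: a base-R approximation of the cleaning pipeline.
clean_text <- function(lines) {
  lines <- trimws(lines)                  # roughly stri_trim_both()
  lines <- lines[lines != ""]             # roughly discard(equals, "")
  txt <- paste0(lines, collapse = " ")    # join everything into one string
  txt <- tolower(txt)                     # case = "lower"
  txt <- gsub("[[:punct:]]+", "", txt)    # remove.punct = TRUE
  txt <- gsub("[[:digit:]]+", "", txt)    # remove.numbers = TRUE
  txt <- gsub("[[:space:]]+", " ", txt)   # collapse runs of whitespace
  trimws(txt)
}

toy <- clean_text(raw_lines)
print(toy)
```

Dropping punctuation without inserting a space is why contractions such as "we're" show up as "were" in the n-gram tables further down.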

What have we got:

string.summary(texts)
## Chars:       127786
## Letters:     103672
## Whitespace:  23463
## Punctuation: 0
## Digits:      0
## Words:       23464
## Sentences:   0
## Lines:       1 
## Wordlens:    728 869 898 1004 1784 1861 2879 3794 4634 5013 
##              1 1 1 1 1 1 1 1 1 1 
## Senlens:     0 
##              10 
## Syllens:     0 8 19 192 829 2174 5859 14331 
##              3 1 1 1 1 1 1 1

The 1-grams are kinda useless but this makes a big tibble for 1:8-grams.

map_df(1:8, ~ngram(texts, n=.x) %>%
         get.phrasetable() %>%
         as_tibble() %>%
         rename(words=ngrams) %>%
         mutate(words=stri_trim_both(words)) %>%
         mutate(ngram=sprintf("ngrams: %s", .x))) %>%
  mutate(ngram=factor(ngram, levels=unique(ngram))) %>% 
  select(ngram, freq, prop, words) -> grams
glimpse(grams)
## Observations: 154,149
## Variables: 4
## $ ngram <fctr> ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ...
## $ freq  <int> 984, 903, 654, 492, 458, 420, 383, 355, 311, 299, 291, 2...
## $ prop  <dbl> 0.041936584, 0.038484487, 0.027872486, 0.020968292, 0.01...
## $ words <chr> "the", "and", "to", "of", "a", "i", "we", "that", "our",...
filter(grams, ngram=="ngrams: 3")
## # A tibble: 20,791 × 4
##        ngram  freq         prop               words
##       <fctr> <int>        <dbl>               <chr>
## 1  ngrams: 3    30 0.0012786634   the united states
## 2  ngrams: 3    27 0.0011507970         going to be
## 3  ngrams: 3    24 0.0010229307          one of the
## 4  ngrams: 3    21 0.0008950644       were going to
## 5  ngrams: 3    20 0.0008524422          we have to
## 6  ngrams: 3    18 0.0007671980          by the way
## 7  ngrams: 3    16 0.0006819538        not going to
## 8  ngrams: 3    15 0.0006393317          and by the
## 9  ngrams: 3    15 0.0006393317 the american people
## 10 ngrams: 3    15 0.0006393317      of our country
## # ... with 20,781 more rows
filter(grams, ngram=="ngrams: 4")
## # A tibble: 22,630 × 4
##        ngram  freq         prop                    words
##       <fctr> <int>        <dbl>                    <chr>
## 1  ngrams: 4    12 0.0005114871           and by the way
## 2  ngrams: 4    10 0.0004262393     of the united states
## 3  ngrams: 4     9 0.0003836154       the new york times
## 4  ngrams: 4     9 0.0003836154          we are going to
## 5  ngrams: 4     9 0.0003836154       all over the place
## 6  ngrams: 4     9 0.0003836154      thank you thank you
## 7  ngrams: 4     8 0.0003409914     we will make america
## 8  ngrams: 4     8 0.0003409914      we have people that
## 9  ngrams: 4     7 0.0002983675 make america great again
## 10 ngrams: 4     6 0.0002557436           is going to be
## # ... with 22,620 more rows
filter(grams, ngram=="ngrams: 5")
## # A tibble: 23,181 × 4
##        ngram  freq         prop                           words
##       <fctr> <int>        <dbl>                           <chr>
## 1  ngrams: 5     5 0.0002131287              all you have to do
## 2  ngrams: 5     5 0.0002131287          the new york times and
## 3  ngrams: 5     4 0.0001705030   will make america great again
## 4  ngrams: 5     4 0.0001705030            we will vote for the
## 5  ngrams: 5     4 0.0001705030             that i can tell you
## 6  ngrams: 5     4 0.0001705030          we will bring back our
## 7  ngrams: 5     4 0.0001705030 the united states supreme court
## 8  ngrams: 5     4 0.0001705030     movement the likes of which
## 9  ngrams: 5     4 0.0001705030      we will make america great
## 10 ngrams: 5     4 0.0001705030         we have people that are
## # ... with 23,171 more rows
filter(grams, ngram=="ngrams: 6")
## # A tibble: 23,350 × 4
##        ngram  freq         prop                              words
##       <fctr> <int>        <dbl>                              <chr>
## 1  ngrams: 6     4 0.0001705103              all you have to do is
## 2  ngrams: 6     4 0.0001705103   we will make america great again
## 3  ngrams: 6     3 0.0001278827 make america great again thank you
## 4  ngrams: 6     3 0.0001278827             you have to do is look
## 5  ngrams: 6     3 0.0001278827       were going to bring our jobs
## 6  ngrams: 6     3 0.0001278827    bless you and god bless america
## 7  ngrams: 6     3 0.0001278827       going to bring our jobs back
## 8  ngrams: 6     3 0.0001278827        god bless you and god bless
## 9  ngrams: 6     3 0.0001278827        to bring our jobs back home
## 10 ngrams: 6     3 0.0001278827              have to do is look at
## # ... with 23,340 more rows