hrbrmstr / Orangetext

🍊📄 : An #rstats project to keep track of The 🍊 One's speeches

THIS REPO IS NO LONGER NECESSARY AND IS NOT BEING MAINTAINED GIVEN STAFFED RESOURCES SUCH AS https://factba.se/

orangetext is an #rstats project to keep track of The 🍊 One's speeches and include some code snippets for text analysis on them.

Gladly accepting PRs for legit new transcripts and more analysis scripts.

Transcripts

2016-01-19-presidential-candidacy-anouncement-NewYorkCity-NY.txt
2016-08-31-immigration-Phoenix-AZ.txt
2016-10-13-addressing-sexual-assault-WestPalmBeach-FL.txt
2017-01-20-inaugural.txt
2017-01-21-cia.txt
2017-01-28-may.txt
2017-01-29-weekly-address.txt
2017-01-31-gorsuch.txt
2017-02-01-black-history-month.txt
2017-02-032-national-prayer.txt
2017-02-03-weekly-address.txt
2017-02-07-major_cities_chiefs_association_conference#

Sample code

library(ngram)
library(tidyverse)
library(magrittr)
library(ggalt)
library(hrbrmisc)
library(stringi)
library(rprojroot)

Read all the speeches in:

rprojroot::find_rstudio_root_file() %>%
  file.path("data", "speeches") %>%
  list.files("\\.txt$", full.names=TRUE) %>%   # pattern is a regex, not a glob
  map(read_lines) %>%
  flatten_chr() %>%
  stri_enc_toascii() %>%
  stri_trim_both() %>%
  discard(equals, "") %>%                      # drop empty lines
  paste0(collapse=" ") %>%
  stri_replace_all_regex("[[:space:]]+", " ") %>%
  preprocess(case="lower", remove.punct=TRUE,
             remove.numbers=TRUE, fix.spacing=TRUE) -> texts
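
To see roughly what the cleaning steps in that pipeline do without needing the speech files, here is a minimal sketch on a made-up character vector. It uses only base R as a stand-in for the stringi and ngram calls, so it approximates them rather than reproducing them, and it skips the ASCII conversion step. The `toy_lines` vector and the `clean_lines()` helper are hypothetical names used for illustration only.

```r
# Hypothetical input standing in for lines read from the transcript files.
toy_lines <- c("  Thank you.  Thank you!  ", "", "We have 2 big   plans.", "   ")

# Base R approximation of the trim / drop-empty / collapse / normalise steps.
# This is NOT the ngram::preprocess() implementation, only a rough equivalent.
clean_lines <- function(lines) {
  lines <- trimws(lines)                  # similar to stri_trim_both()
  lines <- lines[lines != ""]             # similar to discard(equals, "")
  txt <- paste0(lines, collapse = " ")    # one long string
  txt <- tolower(txt)                     # case = "lower"
  txt <- gsub("[[:punct:]]+", "", txt)    # remove.punct = TRUE
  txt <- gsub("[[:digit:]]+", "", txt)    # remove.numbers = TRUE
  txt <- gsub("[[:space:]]+", " ", txt)   # collapse runs of whitespace
  trimws(txt)
}

toy_text <- clean_lines(toy_lines)
print(toy_text)
```

The result is a single lower-case string with no punctuation or digits, which is the shape of input the n-gram steps further down expect.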

What have we got:

string.summary(texts)

## Chars:       127786
## Letters:     103672
## Whitespace:  23463
## Punctuation: 0
## Digits:      0
## Words:       23464
## Sentences:   0
## Lines:       1 
## Wordlens:    728 869 898 1004 1784 1861 2879 3794 4634 5013 
##              1 1 1 1 1 1 1 1 1 1 
## Senlens:     0 
##              10 
## Syllens:     0 8 19 192 829 2174 5859 14331 
##              3 1 1 1 1 1 1 1

The 1-grams are kinda useless but this makes a big tibble for 1:8-grams.

map_df(1:8, ~ngram(texts, n=.x) %>%
         get.phrasetable() %>%
         as_tibble() %>%                  # tbl_df() is deprecated in current dplyr
         rename(words=ngrams) %>%
         mutate(words=stri_trim_both(words)) %>%
         mutate(ngram=sprintf("ngrams: %s", .x))) %>%
  mutate(ngram=factor(ngram, levels=unique(ngram))) %>%
  select(ngram, freq, prop, words) -> grams

glimpse(grams)

## Observations: 154,149
## Variables: 4
## $ ngram <fctr> ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ngrams: 1, ...
## $ freq  <int> 984, 903, 654, 492, 458, 420, 383, 355, 311, 299, 291, 2...
## $ prop  <dbl> 0.041936584, 0.038484487, 0.027872486, 0.020968292, 0.01...
## $ words <chr> "the", "and", "to", "of", "a", "i", "we", "that", "our",...
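
For readers wondering what the freq and prop columns mean, the following base R sketch counts n-grams in a toy string. It assumes prop is the count divided by the number of n-word windows in the text; that reading is inferred from the numbers printed in this README, not taken from the ngram package documentation. The `count_ngrams()` helper and the toy string are hypothetical.

```r
# Hypothetical helper: count word n-grams in a single cleaned string.
count_ngrams <- function(txt, n) {
  words <- strsplit(txt, " ", fixed = TRUE)[[1]]
  n_windows <- length(words) - n + 1      # number of n-word windows
  stopifnot(n_windows >= 1)
  # Build each window by pasting n consecutive words together.
  grams <- vapply(seq_len(n_windows),
                  function(i) paste(words[i:(i + n - 1)], collapse = " "),
                  character(1))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(words = names(counts),
             freq  = as.integer(counts),
             prop  = as.integer(counts) / n_windows,  # assumed definition of prop
             stringsAsFactors = FALSE)
}

toy <- "thank you thank you we have big plans"
bigrams <- count_ngrams(toy, 2)
print(bigrams)

# Checking the assumption against the README's own numbers:
# 23464 words give 23464 - 3 + 1 = 23462 three-word windows,
# and a 3-gram seen 30 times would then have this proportion.
check_prop <- 30 / (23464 - 3 + 1)
print(check_prop)
```

Under that assumption, the proportions within one n-gram size sum to 1, which is why prop values shrink as n grows and repeated phrases become rarer.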

filter(grams, ngram=="ngrams: 3")

## # A tibble: 20,791 × 4
##        ngram  freq         prop               words
##       <fctr> <int>        <dbl>               <chr>
## 1  ngrams: 3    30 0.0012786634   the united states
## 2  ngrams: 3    27 0.0011507970         going to be
## 3  ngrams: 3    24 0.0010229307          one of the
## 4  ngrams: 3    21 0.0008950644       were going to
## 5  ngrams: 3    20 0.0008524422          we have to
## 6  ngrams: 3    18 0.0007671980          by the way
## 7  ngrams: 3    16 0.0006819538        not going to
## 8  ngrams: 3    15 0.0006393317          and by the
## 9  ngrams: 3    15 0.0006393317 the american people
## 10 ngrams: 3    15 0.0006393317      of our country
## # ... with 20,781 more rows

filter(grams, ngram=="ngrams: 4")

## # A tibble: 22,630 × 4
##        ngram  freq         prop                    words
##       <fctr> <int>        <dbl>                    <chr>
## 1  ngrams: 4    12 0.0005114871           and by the way
## 2  ngrams: 4    10 0.0004262393     of the united states
## 3  ngrams: 4     9 0.0003836154       the new york times
## 4  ngrams: 4     9 0.0003836154          we are going to
## 5  ngrams: 4     9 0.0003836154       all over the place
## 6  ngrams: 4     9 0.0003836154      thank you thank you
## 7  ngrams: 4     8 0.0003409914     we will make america
## 8  ngrams: 4     8 0.0003409914      we have people that
## 9  ngrams: 4     7 0.0002983675 make america great again
## 10 ngrams: 4     6 0.0002557436           is going to be
## # ... with 22,620 more rows

filter(grams, ngram=="ngrams: 5")

## # A tibble: 23,181 × 4
##        ngram  freq         prop                           words
##       <fctr> <int>        <dbl>                           <chr>
## 1  ngrams: 5     5 0.0002131287              all you have to do
## 2  ngrams: 5     5 0.0002131287          the new york times and
## 3  ngrams: 5     4 0.0001705030   will make america great again
## 4  ngrams: 5     4 0.0001705030            we will vote for the
## 5  ngrams: 5     4 0.0001705030             that i can tell you
## 6  ngrams: 5     4 0.0001705030          we will bring back our
## 7  ngrams: 5     4 0.0001705030 the united states supreme court
## 8  ngrams: 5     4 0.0001705030     movement the likes of which
## 9  ngrams: 5     4 0.0001705030      we will make america great
## 10 ngrams: 5     4 0.0001705030         we have people that are
## # ... with 23,171 more rows

filter(grams, ngram=="ngrams: 6")

## # A tibble: 23,350 × 4
##        ngram  freq         prop                              words
##       <fctr> <int>        <dbl>                              <chr>
## 1  ngrams: 6     4 0.0001705103              all you have to do is
## 2  ngrams: 6     4 0.0001705103   we will make america great again
## 3  ngrams: 6     3 0.0001278827 make america great again thank you
## 4  ngrams: 6     3 0.0001278827             you have to do is look
## 5  ngrams: 6     3 0.0001278827       were going to bring our jobs
## 6  ngrams: 6     3 0.0001278827    bless you and god bless america
## 7  ngrams: 6     3 0.0001278827       going to bring our jobs back
## 8  ngrams: 6     3 0.0001278827        god bless you and god bless
## 9  ngrams: 6     3 0.0001278827        to bring our jobs back home
## 10 ngrams: 6     3 0.0001278827              have to do is look at
## # ... with 23,340 more rows
