amandavisconti / ham4corpus

Licence: other
Data from "Hamilton: An American Musical", formatted for reuse. See below for some basic but interesting text analysis findings! I am not throwing away my stopword?

ham4corpus

The #ham4corpus repo currently contains files with various types of information about the Original Broadway Cast recording of Hamilton: An American Musical (i.e. all words currently in the show, minus the one scene not on the OBC recording).

Suggested uses: Twitter bots, text visualization.

All lyrics are by Lin-Manuel Miranda and were copied from the LMM-annotated Genius.com hosting of the lyrics. Cast/character information is from Wikipedia. Links to all sources are listed below.

For example:

Using Stéfan Sinclair and Geoffrey Rockwell's Voyant Tools to explore the text of the musical, I discovered the following about the Hamilton lyrics en masse (in order as one text block, with no names of speakers/singers interspersed):

  • 21,351 total words and 2,939 unique word forms
  • Interesting frequent words: da (103 times, thanks George III), time (87), hamilton and wait (79 each; "hamilton" appears exactly as often as "wait" does), room (71), burr (69), sir (56), satisfied (37), story (35), helpless (32).
  • Unsurprisingly, "sir" is the most frequent one-away collocate of "burr" (9 times).
  • Single word occurrence via microsearch (map of where a given word appears throughout the lyrics):

Screenshot of Voyant microsearch for occurrences of the word "wait" throughout the Hamilton lyrics. Red = the word "wait" vs. the rest of the lyrics (the second-to-last red block is Burr's final "Wait!"; the last one is Eliza's "I can't wait to see you again").

Screenshot of Voyant microsearch for occurrences of the word "Hamilton" throughout the Hamilton lyrics. Red = the word "Hamilton" vs. the rest of the lyrics (I was surprised that the word "Hamilton" isn't used again after Burr's last "The world was wide enough for both Hamilton and me").
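
If you would rather poke at the numbers in code than in Voyant, here is a minimal Python sketch of the same three ideas: total/unique word counts, one-away collocates, and a rough "where does this word occur" view. It is not how the figures above were produced (those came from Voyant), and the function names `tokenize`, `adjacent_collocates`, and `dispersion` are made up for this example. The tokenizer is deliberately simple, so expect counts that differ somewhat from Voyant's. The sample string is invented filler, not a quotation from the lyrics.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def adjacent_collocates(tokens, target):
    """Count the words that appear immediately before or after `target`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            counts[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            counts[tokens[i + 1]] += 1
    return counts


def dispersion(tokens, target, segments=10):
    """Split the token list into equal segments and count `target` in each one."""
    hits = [0] * segments
    for i, tok in enumerate(tokens):
        if tok == target:
            hits[i * segments // len(tokens)] += 1
    return hits


# Invented stand-in text; swap in the contents of the no-speakers lyrics file.
sample = "Wait for it, sir. Burr, sir, wait! Time to wait, Burr, sir."
tokens = tokenize(sample)

print(len(tokens), "total words,", len(set(tokens)), "unique word forms")
print(Counter(tokens).most_common(3))
print(adjacent_collocates(tokens, "burr").most_common(1))
print(dispersion(tokens, "wait", segments=4))
```

To run it on the real corpus, read the lyrics file into a string (for example with `pathlib.Path(...).read_text(encoding="utf-8")`) and pass that to `tokenize` instead of `sample`.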

Explore the Voyant dashboards for both versions of the lyrics yourself at:

The files

Lyrics, including character names before their parts

Title: All_Hamilton_Lyrics_Speakers

What: One file containing all lyrics sung in the Original Broadway Cast recording of Hamilton, with the name of the character singing each part appearing on the line above the beginning of their part. No empty lines between anything, just a solid block of Hamilton. Copied and pasted from the lyrics on Genius.com; the placement of simultaneous lyrics is broken up rather than side-by-side.
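
Because each speaker's name sits on its own line above their part, the file can be regrouped by character. The sketch below assumes the speaker lines are wrapped in square brackets (as the cleanup pattern in the "How" section suggests); check the file before relying on that. `group_by_speaker` is a hypothetical helper name, and the sample lines are placeholders, not lyrics.

```python
import re
from collections import defaultdict

# Assumption: a speaker line is a whole line wrapped in square brackets.
SPEAKER_LINE = re.compile(r"^\[(.+)\]\s*$")


def group_by_speaker(lines):
    """Map each speaker label to the list of lyric lines that follow it."""
    parts = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match:
            current = match.group(1)
        elif current is not None:
            parts[current].append(line)
    return dict(parts)


# Placeholder input; replace with the lines of the speakers file.
sample = [
    "[SPEAKER A]", "first line", "second line",
    "[SPEAKER B]", "third line",
    "[SPEAKER A]", "fourth line",
]
parts = group_by_speaker(sample)
for speaker, sung in parts.items():
    print(speaker, len(sung))
```

Labels are kept exactly as written, so a shared part (two characters named on one label line) would be stored under its combined label rather than split between the singers.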

Pulled from: http://genius.com/albums/Lin-manuel-miranda/Hamilton-original-broadway-cast-recording

Lyrics, not including character names before their parts

Title: All_Hamilton_Lyrics_No_Speakers

What: One file containing all lyrics sung in the Original Broadway Cast recording of Hamilton. No empty lines between anything, just a solid block of Hamilton. Copied and pasted from the lyrics on Genius.com; the placement of simultaneous lyrics is broken up rather than side-by-side.

Pulled from: http://genius.com/albums/Lin-manuel-miranda/Hamilton-original-broadway-cast-recording

Original Broadway Cast Actors & Character Names

Title: OBC_Cast_Actors_Character.json

What: Actors and the named characters played by them in the Original Broadway Cast recording of Hamilton. Actors who played multiple characters are listed multiple times.
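
Since actors with several roles appear once per role, a common first step is to collapse the list into one entry per actor. The sketch below guesses at the file's structure: a JSON array of objects with `actor` and `character` keys. Those key names and that nesting are assumptions, so open the file and adjust them to match. `characters_by_actor` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical structure: the real file's keys and nesting may differ.
sample_json = """
[
  {"actor": "Lin-Manuel Miranda", "character": "Alexander Hamilton"},
  {"actor": "Daveed Diggs", "character": "Marquis de Lafayette"},
  {"actor": "Daveed Diggs", "character": "Thomas Jefferson"}
]
"""


def characters_by_actor(records):
    """Collapse repeated actor entries into one list of characters per actor."""
    roles = defaultdict(list)
    for record in records:
        roles[record["actor"]].append(record["character"])
    return dict(roles)


roles = characters_by_actor(json.loads(sample_json))
for actor, characters in roles.items():
    print(actor, "->", ", ".join(characters))
```

For the real file, replace `json.loads(sample_json)` with `json.load(open_file)` on OBC_Cast_Actors_Character.json.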

Pulled from: https://en.wikipedia.org/wiki/Hamilton_(musical)#Principal_roles_and_major_casts

How

I wish I could say my Python is non-rusty enough that I scraped Genius.com and Wikipedia to get this data, but it isn't and I was on hold on the phone, so I just cut and pasted everything into a text document and grepped:

^\s*?\r to remove blank lines

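\[.* to remove the [character names] (the opening bracket has to be escaped, or most regex engines will read it as the start of a character class)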
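
For anyone who does want to do this step in Python, here is a rough equivalent of the two find-and-replace passes above. It is a sketch, not what I actually ran: it removes the bracketed names first and the blank lines second (so no empty lines are left behind), and it only treats a bracket at the start of a line as a speaker name. `strip_speakers_and_blanks` is a made-up function name, and the sample text is a placeholder.

```python
import re


def strip_speakers_and_blanks(text):
    """Remove bracketed speaker lines, then drop any blank lines."""
    # Normalize Windows and old-Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Blank out every line that starts with "[" (assumed to be a speaker name).
    text = re.sub(r"^\[.*$", "", text, flags=re.MULTILINE)
    # Keep only lines that still contain something other than whitespace.
    lines = [line for line in text.split("\n") if line.strip()]
    return "\n".join(lines)


# Placeholder input with Windows line endings and a blank line.
sample = "[SPEAKER A]\r\nfirst line\r\n\r\n[SPEAKER B]\r\nsecond line\r\n"
cleaned = strip_speakers_and_blanks(sample)
print(cleaned)
```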

If you'd like to learn more about doing cool digital things with text, check out The Programming Historian for novice-friendly, peer-reviewed lessons on data cleaning, distant reading, web scraping, and Python.
