amandavisconti / ham4corpus

Licence: other

Data from "Hamilton: An American Musical", formatted for reuse. See below for some basic but interesting text-analysis findings! I am not throwing away my stopword?

Labels

digital-humanities humanities hamilton dh hamilton-lyrics

ham4corpus

The #ham4corpus repo currently contains files with various types of information about the Original Broadway Cast recording of Hamilton: An American Musical (i.e. all words currently in the show, minus the one scene not on the OBC recording).

Suggested uses: Twitter bots, text visualization.

All lyrics are by Lin-Manuel Miranda and were copied from the LMM-annotated lyrics hosted on Genius.com. Cast/character information is from Wikipedia. Links to all sources are listed below.

For example:

Using Stéfan Sinclair and Geoffrey Rockwell's Voyant Tools to explore the text of the musical, I discovered the following about the Hamilton lyrics en masse (in order as one text block, with no names of speakers/singers interspersed):

21,351 total words and 2,939 unique word forms
Interesting frequent words: da (103 times, thanks George III), time (87), hamilton and wait (79 each, so "hamilton" appears exactly as often as "wait" does), room (71), burr (69), sir (56), satisfied (37), story (35), helpless (32).
Unsurprisingly, "sir" is the most frequent one-away collocate of "burr" (9 times).
Single word occurrence via microsearch (map of where a given word appears throughout the lyrics):

Red = the word "wait" vs the rest of the lyrics (the second-to-last red block is Burr's final "Wait!", last one is Eliza's "I can't wait to see you again")

Red = the word "Hamilton" vs the rest of the lyrics (was surprised that the word "Hamilton" isn't used again after Burr's last "The world was wide enough for both Hamilton and me")
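The counts above came from Voyant Tools. As a rough illustration of how similar numbers could be computed by hand, here is a minimal Python sketch. The tokenizer, the function names, and the sample string are all assumptions made for illustration; Voyant's own tokenization rules differ, so running this on the corpus files would not necessarily reproduce the figures above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(tokens):
    """Return (total words, unique word forms, per-word counts)."""
    counts = Counter(tokens)
    return len(tokens), len(counts), counts


def adjacent_collocates(tokens, target):
    """Count the words found immediately before or after `target`."""
    neighbours = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:
            neighbours[tokens[i - 1]] += 1
        if i + 1 < len(tokens):
            neighbours[tokens[i + 1]] += 1
    return neighbours


def word_positions(tokens, target):
    """Return the token indexes where `target` occurs (a crude 'microsearch')."""
    return [i for i, tok in enumerate(tokens) if tok == target]


# Hypothetical sample text, not taken from the corpus files.
sample = "Wait for it, wait for it. Aaron Burr, sir. Burr, sir!"
tokens = tokenize(sample)
total, unique, counts = word_stats(tokens)
print(total, unique, counts.most_common(3))
print(adjacent_collocates(tokens, "burr").most_common(1))
print(word_positions(tokens, "wait"))
```

To try it on the real data, read one of the lyrics files into a string and pass it to `tokenize` in place of `sample`.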

Explore the Voyant dashboards for both versions of the lyrics yourself at:

tinyurl.com/hamilton-lyrics-names to explore the lyrics including the speaker names
tinyurl.com/just-hamilton-lyrics to explore the lyrics without the speaker names

The files

Lyrics, including character names before their parts

Title: All_Hamilton_Lyrics_Speakers

What: One file containing all lyrics sung in the Original Broadway Cast recording of Hamilton, with the name of the character singing each part appearing on the line above the beginning of their part. No empty lines between anything, just a solid block of Hamilton. Copied and pasted from the lyrics on Genius.com; the placement of simultaneous lyrics is broken up rather than side-by-side.

Pulled from: http://genius.com/albums/Lin-manuel-miranda/Hamilton-original-broadway-cast-recording
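Because each character name sits on its own line above that character's part, the file can be regrouped by speaker. The following Python sketch shows one way that might be done. It assumes speaker lines look like `[NAME]` (square brackets around the whole line), which is a guess based on the cleanup pattern described under "How"; the function name and the placeholder sample are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a speaker label is a line consisting only of "[NAME]".
SPEAKER_LINE = re.compile(r"^\[(.+)\]$")


def lines_by_speaker(text):
    """Map each speaker label to the list of lines that follow it."""
    parts = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # the file should have no blank lines, but skip any anyway
        match = SPEAKER_LINE.match(line)
        if match:
            current = match.group(1)
        elif current is not None:
            parts[current].append(line)
    return dict(parts)


# Placeholder text standing in for the real file contents.
sample = "[BURR]\nline one\n[HAMILTON]\nline two\n[BURR]\nline three"
grouped = lines_by_speaker(sample)
print(grouped)
```

Labels that combine several singers would be treated as separate keys under this scheme; splitting them would need rules that depend on how the labels are actually written in the file.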

Lyrics, not including character names before their parts

Title: All_Hamilton_Lyrics_No_Speakers

What: One file containing all lyrics sung in the Original Broadway Cast recording of Hamilton. No empty lines between anything, just a solid block of Hamilton. Copied and pasted from the lyrics on Genius.com; the placement of simultaneous lyrics is broken up rather than side-by-side.

Pulled from: http://genius.com/albums/Lin-manuel-miranda/Hamilton-original-broadway-cast-recording

Original Broadway Cast Actors & Character Names

Title: OBC_Cast_Actors_Character.json

What: Actors and the named characters played by them in the Original Broadway Cast recording of Hamilton. Actors who played multiple characters are listed multiple times.

Pulled from: https://en.wikipedia.org/wiki/Hamilton_(musical)#Principal_roles_and_major_casts

How

I wish I could say my Python is non-rusty enough that I scraped Genius.com and Wikipedia to get this data, but it isn't and I was on hold on the phone, so I just cut and pasted everything into a text document and grepped:

^\s*?\r to remove blank lines

\[.* to remove the [character names]
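For anyone who would rather script the same cleanup, here is a minimal Python sketch. It assumes the second pattern is meant to match a literal opening bracket (written `\[.*` in Python's `re` syntax), and it swaps the blank-line regex for a simple line filter, which has the same effect but avoids depending on a particular line-ending style. The function name and sample text are hypothetical.

```python
import re


def strip_speakers_and_blank_lines(text):
    """Remove "[...]" speaker labels, then drop any lines left empty."""
    # "." does not match "\n", so this deletes from "[" to the end of that line.
    without_names = re.sub(r"\[.*", "", text)
    # splitlines() handles "\n", "\r\n" and "\r" endings alike.
    kept = [line for line in without_names.splitlines() if line.strip()]
    return "\n".join(kept)


# Placeholder text with Windows-style line endings and a stray blank line.
sample = "[BURR]\r\nline one\r\n\r\n[HAMILTON]\r\nline two\r\n"
cleaned = strip_speakers_and_blank_lines(sample)
print(cleaned)
```

Note that `\[.*` also deletes anything that follows a bracket on the same line, so a lyric line containing an inline bracketed note would be truncated at the bracket.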

If you'd like to learn more about doing cool digital things with text, check out The Programming Historian for novice-friendly, peer-reviewed lessons on data cleaning, distant reading, web scraping, and Python.
