Predict Gender from Names Using Historical Data


gender


Guidelines and warnings

This package attempts to infer gender (or more precisely, sex assigned at birth) based on first names using historical data, typically data that was gathered by the state. This method has many limitations, and before you use this package be sure to take into account the following guidelines.

  1. Your analysis and the way you report it should take into account the limitations of this method, which include its reliance on data created by the state and its inability to see beyond the state-imposed gender binary. At a minimum, be sure to read our article explaining the limitations of this method, as well as the review article that is critical of this sort of methodology, both cited below.

  2. Do not use this package to study individuals: it is at most useful for studying populations in the aggregate.

  3. Resort to this method only when the alternative is not studying gender at all, not when a more nuanced and justifiable approach to studying gender is available. For instance, for many historical sources this approach might be the only way to get a sense of the sex ratios in a population. But ask whether you really need to use this method, whether you are using it responsibly, and whether you could use a better approach instead.

Blevins, Cameron, and Lincoln A. Mullen. “Jane, John … Leslie? A Historical Method for Algorithmic Gender Prediction.” Digital Humanities Quarterly 9, no. 3 (2015). http://www.digitalhumanities.org/dhq/vol/9/3/000223/000223.html.

Mihaljević, Helena, Marco Tullney, Lucía Santamaría, and Christian Steinfeldt. “Reflections on Gender Analyses of Bibliographic Corpora.” Frontiers in Big Data 2 (August 28, 2019): 29. https://doi.org/10.3389/fdata.2019.00029.

Description

Data sets, historical or otherwise, often contain a list of first names but seldom identify those names by gender. Most techniques for finding gender programmatically rely on lists of male and female names. However, the gender associated with names can vary over time. Any data set that covers the normal span of a human life will require a historical method to find gender from names. This R package uses historical datasets from the U.S. Social Security Administration, the U.S. Census Bureau (via IPUMS USA), and the North Atlantic Population Project to provide predictions of gender for first names for particular countries and time periods.

Installation

You can install this package from CRAN:

install.packages("gender")

The first time you use the package you will be prompted to install the accompanying genderdata package. Alternatively, you can install genderdata yourself from the rOpenSci package repository:

install.packages("genderdata", type = "source",
                 repos = "http://packages.ropensci.org")

If you prefer, you can install the development versions of both packages from the rOpenSci package repository:

install.packages(c("gender", "genderdata"),
                 repos = "http://packages.ropensci.org",
                 type = "source")
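
Because gender() depends on the separate genderdata package, a script may want to check for it up front rather than rely on the install prompt. The following is a minimal sketch using base R's requireNamespace(); the variable name has_genderdata is only illustrative and is not part of either package.

```r
# Check whether the companion data package is installed, without attaching it.
has_genderdata <- requireNamespace("genderdata", quietly = TRUE)

if (!has_genderdata) {
  # Only report the problem; installation is left to the user.
  message("genderdata is not installed; see the install.packages() calls above.")
}
```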

Using the package

The gender() function takes a character vector of names and a year or range of years and uses various datasets to predict the gender of names. Here we predict the gender of the names Madison and Hillary in 1930 and again in the 2000s using Social Security data.

library(gender)
gender(c("Madison", "Hillary"), years = 1930, method = "ssa")
#> # A tibble: 2 x 6
#>   name    proportion_male proportion_female gender year_min year_max
#>   <chr>             <dbl>             <dbl> <chr>     <dbl>    <dbl>
#> 1 Hillary               1                 0 male       1930     1930
#> 2 Madison               1                 0 male       1930     1930
gender(c("Madison", "Hillary"), years = c(2000, 2010), method = "ssa")
#> # A tibble: 2 x 6
#>   name    proportion_male proportion_female gender year_min year_max
#>   <chr>             <dbl>             <dbl> <chr>     <dbl>    <dbl>
#> 1 Hillary          0.0055             0.994 female     2000     2010
#> 2 Madison          0.0046             0.995 female     2000     2010
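
In keeping with the guideline to study populations rather than individuals, results like these are best summarized in the aggregate. The sketch below uses only base R and rebuilds the 2000 to 2010 predictions by hand from the output above, so it runs without the gender or genderdata packages. The roster data frame, the name "Zzyzx" (standing in for a name with no prediction), and the variable names are hypothetical.

```r
# Predictions copied from the 2000-2010 output above, rebuilt by hand.
predictions <- data.frame(
  name = c("Hillary", "Madison"),
  proportion_male = c(0.0055, 0.0046),
  proportion_female = c(0.994, 0.995),
  gender = c("female", "female"),
  stringsAsFactors = FALSE
)

# Hypothetical roster of first names; "Zzyzx" has no matching prediction.
roster <- data.frame(
  name = c("Madison", "Hillary", "Madison", "Zzyzx"),
  stringsAsFactors = FALSE
)

# Left join so names without a prediction are kept as NA, not dropped.
matched <- merge(roster, predictions, by = "name", all.x = TRUE)

# Report aggregate figures only, plus how many names went unmatched.
n_unmatched <- sum(is.na(matched$gender))
share_female <- mean(matched$proportion_female, na.rm = TRUE)
```

Reporting n_unmatched alongside the aggregate share makes it visible how much of the population the estimate does not cover.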

See the package vignette for a fuller introduction and suggestions on how to use the gender() function efficiently with large datasets.

vignette(topic = "predicting-gender", package = "gender")
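
One general way to reduce work on large datasets is to predict once per distinct name and join the results back onto the full table. This is a sketch of that idea under stated assumptions, not necessarily the approach the vignette recommends: the people data frame and its column names are hypothetical, and gender() is called only if both packages are installed.

```r
# Hypothetical input: many rows, few distinct names.
people <- data.frame(
  id = 1:6,
  first_name = c("Madison", "Hillary", "Madison",
                 "Madison", "Hillary", "Madison"),
  stringsAsFactors = FALSE
)

# Predict once per distinct name instead of once per row.
distinct_names <- unique(people$first_name)

# Call gender() only when both packages are available.
if (requireNamespace("gender", quietly = TRUE) &&
    requireNamespace("genderdata", quietly = TRUE)) {
  predictions <- gender::gender(distinct_names,
                                years = c(2000, 2010),
                                method = "ssa")
  # Join the per-name predictions back onto every row of the input.
  people <- merge(people, predictions,
                  by.x = "first_name", by.y = "name", all.x = TRUE)
}
```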

To read the documentation for the datasets, install the genderdata package, then list the datasets it includes.

library(genderdata)
data(package = "genderdata")

Citation

If you use this package, I would appreciate a citation.

citation("gender")
#> 
#> To cite the 'gender' package, you may either cite the package
#> directly or cite the journal article which explains its method:
#> 
#>   Lincoln Mullen (2018). gender: Predict Gender from Names Using
#>   Historical Data. R package version 0.5.2.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {gender: Predict Gender from Names Using Historical Data},
#>     author = {Lincoln Mullen},
#>     year = {2018},
#>     note = {R package version 0.5.2},
#>     url = {https://github.com/ropensci/gender},
#>   }
#> 
#> For the journal article, please cite:
#> 
#> Cameron Blevins and Lincoln Mullen, "Jane, John ... Leslie? A
#> Historical Method for Algorithmic Gender Prediction," _Digital
#> Humanities Quarterly_ 9, no. 3 (2015):
#> <http://www.digitalhumanities.org/dhq/vol/9/3/000223/000223.html>.
