"Prepare your vernacular": Eminem’s Diversity of Lyrics Visualized Through Lexical Richness [OC]

30.07.2025

[OC] This chart plots the lexical diversity of Eminem’s lyrics, calculated as the ratio of unique words to total words, against the total word count of each song. Each point represents a track from his catalog (excluding skits), and the bubble size reflects Genius pageviews.
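As a rough illustration of the metric, here is a minimal Python sketch of a unique-to-total word ratio. It is not the R code behind the chart: the regex tokenizer, the function names, and the sample line are all assumptions made for the example, and the actual tokenization may treat contractions and punctuation differently.

```python
import re


def tokenize(lyrics: str) -> list[str]:
    """Lowercase the text and pull out runs of letters and straight apostrophes.

    This is a simplification (assumption): a real tokenizer may handle
    contractions, curly apostrophes, and numbers differently.
    """
    return re.findall(r"[a-z']+", lyrics.lower())


def lexical_richness(lyrics: str) -> float:
    """Ratio of unique tokens to total tokens; 0.0 for empty input."""
    tokens = tokenize(lyrics)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Made-up sample line, not taken from any song in the dataset.
sample = "I said what I said and I meant what I said"
print(len(tokenize(sample)), round(lexical_richness(sample), 3))
```

The sample has 11 tokens and 5 distinct words, so its ratio is 5/11. Note that under this definition a one-word lyric scores a perfect 1.0, which hints at how strongly the ratio depends on length.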

The shaded horizontal and vertical bands mark the middle 50% of values along each axis:

Lexical richness from 0.395 to 0.462
Word count from 696 to 952
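The "middle 50%" bands correspond to the range between the first and third quartiles. A minimal Python sketch of that calculation follows; the word counts are hypothetical placeholders, and the quantile method (`"inclusive"`) is an assumption, since the post does not say which interpolation rule was used.

```python
from statistics import quantiles


def middle_half(values: list[float]) -> tuple[float, float]:
    """Return (Q1, Q3), the bounds of the middle 50% of the values."""
    # n=4 yields the three quartile cut points; the middle one is the median.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


# Hypothetical word counts, NOT the real data behind the chart.
word_counts = [500, 700, 800, 950, 1200]
low, high = middle_half(word_counts)
print(f"Word count from {low:.0f} to {high:.0f}")
```

Different quantile conventions can shift the band edges slightly, so this sketch would not necessarily reproduce the exact figures above even with the real data.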

Only a subset of songs are directly labeled on the chart. For the rest, the interactive version includes tooltips with full metadata, which has been fun to explore.

The four labeled quadrants were added to provide some structure, grouping songs by whether they tend to be longer, more repetitive, or more varied in vocabulary.

Lyrics were retrieved from Genius and tokenized in R. The plot was created in Datawrapper. The chart shows 341 non-skit songs; 23 skits were excluded from the analysis.

Link to the interactive plot is here.

Posted by TreeFruitSpecialist

12 comments

TreeFruitSpecialist says:

30.07.2025 at 20:23

[OC]

Source: Lyrics were scraped from Genius using the `{geniusr}` package and cleaned/tokenized in R with `{tidytext}`. Lexical diversity was calculated as the ratio of unique tokens to total words.

Tool: Visualization was made in Datawrapper.
likwitsnake says:

30.07.2025 at 20:26

Would love to see this with Aesop Rock
shred-i-knight says:

30.07.2025 at 20:31

should make a version where you can compare different artists overlaid with each other, would make it more useful to compare because it’s difficult to determine what the baseline is for an average rapper
Rattarang says:

30.07.2025 at 20:32

My first reaction is a method error. Rap god should(*) be very diverse, but it necessarily says “i, you” a lot.

Would a one-word song be perfectly diverse and longer songs necessarily repeat simple words?

Unique words per song (absolute)? Commenting for updates
rainbowWar says:

30.07.2025 at 20:33

Type token ratio as a measure of lexical diversity is sensitive to sample size. The general trend you see of a negative correlation is not a property of Eminem’s music but a property of the bias in the lexical measure you are using. If you took the longer songs and took a subsample of them they would have higher lexical diversity.
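A toy Python sketch of that length effect, using made-up tokens instead of real lyrics. The moving-average type-token ratio shown here is one length-robust alternative; it was not used in the original chart, and the function names and window size are assumptions chosen for illustration.

```python
def ttr(tokens: list[str]) -> float:
    """Plain type-token ratio: unique tokens / total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def moving_average_ttr(tokens: list[str], window: int) -> float:
    """Mean TTR over every contiguous window of `window` tokens.

    Falls back to the plain TTR when the text is no longer than the window.
    """
    if len(tokens) <= window:
        return ttr(tokens)
    scores = [ttr(tokens[i:i + window]) for i in range(len(tokens) - window + 1)]
    return sum(scores) / len(scores)


# Toy data (assumption): the same four-word "verse" repeated, so the
# vocabulary is identical and only the length differs.
verse = ["a", "b", "c", "d"]
short_song = verse * 2    # 8 tokens
long_song = verse * 10    # 40 tokens

print(ttr(short_song), ttr(long_song))
print(moving_average_ttr(short_song, 4), moving_average_ttr(long_song, 4))
```

The plain ratio drops from 0.5 to 0.1 purely because the second "song" is longer, while the fixed-window average stays the same for both.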
mr_ji says:

30.07.2025 at 20:36

Someone did exactly this about 15 years ago with various artists. The conclusion was “WuTang Clan ain’t nothin’ to fuck wit’.” They were like 6 of the top 10 in breadth of vocab.
overfiend1976 says:

30.07.2025 at 20:42

You really want diversity in rap lyrics, Aesop Rock.
JesusSwag says:

30.07.2025 at 21:13

This only really makes sense when accounting for song length and tempo, otherwise faster and longer songs would naturally have more ‘diversity’, so maybe the real metric should be the number of unique words per X bars
CrazedProphet says:

30.07.2025 at 22:21

I love seeing visualizations like this but between artists and just seeing Aesop Rock sitting out by himself. It’s actually how I first discovered him.
cybercuzco says:

30.07.2025 at 22:27

Lexical richness? I’m losing me perspicacity!
DynamicHunter says:

30.07.2025 at 23:10

Would love to see this grouped or colored by album. I swear MMLP2 has some insane triple and even quadruple entendres that fly over 95% of listeners’ heads.
intronert says:

30.07.2025 at 23:27

Really nice plot, and approach.

Now do Donna Summers’ “I Feel Love”. 🙂
