















Reddit Comment Analysis
Disclaimer: I haven't done any data analysis in years, so this is a shy attempt to come back to it. I hope some of it is interesting and hopefully I haven't made many mistakes.
Note: A maximum of the latest 2,000 comments were fetched per user due to API limits.
Note 2: Added NSFW tag because there may be some subreddits/users that share that kind of content
Overall Statistics
- Total comments collected: 21,877,058
- Total comments analysed: 21,426,090
- Bot comments removed: 452,002
- Unique users: 29,574
- Unique subreddits: 92,100
- Moderator comments: 4,285,897
- Non-moderator comments: 17,140,193
- Average sentiment: -0.0180
- Median user comment karma: 3,093.5
- Proportion of comments by moderators: 20.00%
Medians are used for karma to avoid skew from bots or historic power users.
“Moderators” refers to users who moderate any subreddit, regardless of where the comment was made.
Fun Facts & Highlights
- Happiest user: u/wenalee (0.955 avg sentiment)
- Saddest user: u/ScienceOne1800 (-0.801 avg sentiment)
- Most upvoted user (avg): u/Determined-Man (59 avg karma)
- Most downvoted user (avg): u/TechnicianOrnery2265 (-21.00 avg karma)
- Most diverse commenter: u/Decent_Ad7583, with comments in 865 subreddits
- Busiest subreddit: r/AskReddit (242,512 comments)
- Most negative subreddit: r/World_Now (-0.605 median sentiment)
- Deepest-discussion subreddit (highest avg karma): r/greentext (64.35)
- Peak commenting time: Monday at 13:00 US Eastern (EDT) / 17:00 UTC
- Longest comment: 10,000 characters by u/basedfinger
- Most zero-karma comments: u/Basic_John_Doe_ (380 comments)
Visualisations
All charts shown include only users with ≥30 comments and subreddits with ≥500 comments.
- Comment count over weekday & hour (last 5 months): Displays clusters of comments by weekday and hour, revealing temporal patterns in community activity. Results are displayed in both UTC and US Eastern time for easier interpretation.
- Mean sentiment over weekday & hour (last 5 months): Shows the distribution of comment sentiment by weekday and hour, revealing temporal patterns in community mood. Results are displayed in both UTC and US Eastern time for easier interpretation.
- Top 20 subreddits by comment count: Displays the subreddits with the largest total comment volume.
- Top 20 subreddits by median comment karma: Highlights subreddits where comments tend to receive the highest median karma, suggesting positive or highly valued discussions.
- Top 20 subreddits by median sentiment: Ranks subreddits by the most positive median sentiment, identifying communities with the most upbeat or supportive conversations.
- Top 20 users by median comment karma: Profiles users whose comments consistently receive the highest median karma, indicating valued contributors.
- Bottom 20 subreddits by median comment karma: Shows the subreddits where comments receive the lowest median karma, highlighting communities with the most downvoted or controversial discussions.
- Bottom 20 subreddits by median sentiment: Shows subreddits where comments have the lowest median sentiment, surfacing communities with the most negative or emotionally charged conversations.
- Bottom 20 users by median comment karma: Shows users with the lowest median comment karma, often reflecting controversial or less appreciated contributions.
- Bottom 20 users by median sentiment: Highlights users whose comments have the lowest median sentiment, surfacing the most negative or critical users.
- Median sentiment by account age bucket: Highlights differences in comment sentiment across accounts of varying ages.
- User count by account age bucket: Displays the number of users within each account age bracket.
- User age vs sentiment (mods vs non-mods): Mean user sentiment by account age, with moderator status shown by colour.
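For anyone wondering how the weekday and hour charts could be built, here is a minimal sketch of the binning step. It is not the code behind the post: the function name, the input format (Unix timestamps in seconds) and the fixed UTC offset are all assumptions for illustration. The four-hour offset matches the 13:00 / 17:00 UTC peak reported above; a real conversion would more likely use a time zone database so that daylight saving time is handled.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def weekday_hour_counts(timestamps, utc_offset_hours=0):
    """Count comments per (weekday, hour) cell.

    `timestamps` are Unix timestamps in seconds (an assumed input format).
    `utc_offset_hours` is a fixed offset used only for display.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    counts = Counter()
    for ts in timestamps:
        local = datetime.fromtimestamp(ts, tz=tz)
        counts[(DAYS[local.weekday()], local.hour)] += 1
    return counts

# Three made-up comments: two on a Monday afternoon, one early Tuesday (UTC).
sample = [
    datetime(2025, 6, 2, 17, 5, tzinfo=timezone.utc).timestamp(),
    datetime(2025, 6, 2, 17, 40, tzinfo=timezone.utc).timestamp(),
    datetime(2025, 6, 3, 3, 10, tzinfo=timezone.utc).timestamp(),
]

utc_counts = weekday_hour_counts(sample)
eastern_counts = weekday_hour_counts(sample, utc_offset_hours=-4)
peak_cell, peak_count = utc_counts.most_common(1)[0]
```

The resulting counter can be reshaped into a 7 x 24 grid and passed to any heatmap plotting function. Keeping the data in UTC and shifting only at this last step follows the approach described in the methodology.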
Methodology
Data Collection & Filtering
- Usernames and comments were gathered from Reddit slowly and continuously over roughly two weeks (15 days), to ensure a good representation of every hour and weekday. Comments were deduplicated by comment_id and filtered to include only the last 5 years (or as many as available).
- All timestamps are handled in UTC for consistency; local time conversions are only for visualisation.
- Bot accounts are detected and excluded using a combination of repeated/similar comment detection and cached results.
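The deduplication and date filter described above can be sketched roughly as follows. This is an illustration under assumptions, not the original pipeline: only `comment_id` comes from the post, while the `created_utc` field (Unix seconds), the function name and the 5 x 365 day cutoff are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone

def dedupe_and_filter(comments, now, max_age_days=5 * 365):
    """Keep the first comment seen per comment_id, dropping ones older than the cutoff.

    Each comment is a dict with `comment_id` and `created_utc` (Unix seconds);
    the latter field name is an assumption for this sketch.
    """
    cutoff = now - timedelta(days=max_age_days)
    seen = set()
    kept = []
    for comment in comments:
        created = datetime.fromtimestamp(comment["created_utc"], tz=timezone.utc)
        if comment["comment_id"] in seen or created < cutoff:
            continue
        seen.add(comment["comment_id"])
        kept.append(comment)
    return kept

def _ts(year, month, day):
    # Helper for building made-up UTC timestamps in the example below.
    return datetime(year, month, day, tzinfo=timezone.utc).timestamp()

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = [
    {"comment_id": "a1", "created_utc": _ts(2024, 6, 1)},
    {"comment_id": "a1", "created_utc": _ts(2024, 6, 1)},  # duplicate id
    {"comment_id": "b2", "created_utc": _ts(2015, 3, 9)},  # older than 5 years
    {"comment_id": "c3", "created_utc": _ts(2023, 11, 20)},
]
kept_ids = [c["comment_id"] for c in dedupe_and_filter(sample, now)]
```

Comparing timezone-aware UTC datetimes throughout avoids mixing local and UTC times by accident.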
Metrics & Aggregation
- Only users with ≥30 comments and subreddits with ≥500 comments are included in most aggregate charts to ensure statistical reliability.
- Medians are used for karma to reduce the influence of outliers and bots.
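To show why the minimum-comment threshold and the median matter, here is a small sketch with made-up data. The `author` and `karma` field names and the function name are assumptions for illustration; only the threshold of 30 comments comes from the post.

```python
from collections import defaultdict
from statistics import mean, median

def median_karma_by_user(comments, min_comments=30):
    """Median karma per user, skipping users with fewer than `min_comments` comments."""
    by_user = defaultdict(list)
    for comment in comments:
        by_user[comment["author"]].append(comment["karma"])
    return {
        user: median(karma)
        for user, karma in by_user.items()
        if len(karma) >= min_comments
    }

# Made-up data: one user with 30 comments, one of which went viral,
# and one user with only 5 comments (below the threshold).
comments = (
    [{"author": "steady", "karma": 1}] * 29
    + [{"author": "steady", "karma": 5000}]
    + [{"author": "rare", "karma": 900}] * 5
)

medians = median_karma_by_user(comments)
steady_mean = mean(c["karma"] for c in comments if c["author"] == "steady")
```

In this example the single viral comment pulls the mean for `steady` up to roughly 167.6, while the median stays at 1, which is closer to what a typical comment from that user receives.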
Sentiment Analysis
- Each comment is run through the cardiffnlp/twitter-roberta-base-sentiment-latest model to obtain negative, neutral and positive probabilities, which are combined into a single score normalised to the range [-1, 1].
- Subreddit-level and user-level sentiment are then reported as the median of those per-comment scores.
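The post does not spell out how the three probabilities are combined into one score. One common choice, assumed here purely for illustration, is positive minus negative, which always lands in [-1, 1]. The function name is hypothetical and the probabilities below are made up rather than real model output.

```python
from statistics import median

def sentiment_score(p_negative, p_neutral, p_positive):
    """Collapse three class probabilities into one score in [-1, 1].

    Assumed formula: (positive - negative) / total. The neutral probability
    only enters through the total, so a fully neutral comment scores 0.
    """
    total = p_negative + p_neutral + p_positive
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return (p_positive - p_negative) / total

# Made-up (negative, neutral, positive) probabilities for three comments.
comment_probs = [
    (0.80, 0.15, 0.05),  # clearly negative
    (0.10, 0.80, 0.10),  # neutral
    (0.05, 0.15, 0.80),  # clearly positive
]
scores = [sentiment_score(*probs) for probs in comment_probs]

# Subreddit-level or user-level sentiment as the median of per-comment scores.
group_sentiment = median(scores)
```

With this formula a score near -1 means the model is confident the comment is negative, a score near 1 means confidently positive, and values near 0 mean neutral or mixed.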
Bot Detection
- Users are flagged as bots if they post many repeated or highly similar comments.
- All bot-flagged users are excluded from analysis, metrics, and plots.
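As a rough idea of what "repeated or highly similar comments" could look like in code, here is a sketch using Python's standard `difflib`. The actual detection method and its thresholds are not given in the post, so the function name and every parameter value below are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations

def looks_like_bot(comments, similarity=0.9, min_share=0.5, min_comments=5):
    """Flag a user when a large share of their comment pairs are near-duplicates.

    All thresholds are illustrative assumptions, not values from the post.
    """
    # Normalise case and whitespace so trivial differences are ignored.
    texts = [" ".join(text.lower().split()) for text in comments]
    if len(texts) < max(min_comments, 2):
        return False
    pairs = list(combinations(texts, 2))
    similar = sum(
        1 for a, b in pairs
        if SequenceMatcher(None, a, b).ratio() >= similarity
    )
    return similar / len(pairs) >= min_share

templated = [f"Thank you for sharing, post {i}!" for i in range(1, 6)]
varied = [
    "I disagree, the chart hides the y-axis.",
    "Does anyone know a good pasta recipe?",
    "My cat knocked the router off the shelf again.",
    "The second season was much better than the first.",
    "Try restarting it in safe mode first.",
]

templated_flag = looks_like_bot(templated)
varied_flag = looks_like_bot(varied)
```

Comparing every pair of comments is quadratic, so for a user with 2,000 comments it would make sense to sample a subset or hash normalised text first. A very polite human who mostly writes "thank you" could also be flagged by this kind of rule.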
Posted by ehtio
17 comments
Why is this NSFW? Also u/wenalee seems to be a bot based on their messages
seems like u/wenalee is the most thankful person in the world
great job btw
>Most zero-karma comments: u/Basic_John_Doe_ (380 comments)
Username checks out…
Just think of all the pay those MODs get on the bigger subs /s
BTW, this is all the info I’ve got from this sub
Stats for r/dataisbeautiful:
– Median karma: 1.00
– Mean sentiment: -0.276
– Unique users: 1045
– Total comments: 3,937
Plenty of unique users, but not many comments.
Fascinating! My favorite stat: the way sentiment drops with account age. Is this a reflection of “get off my lawn” energy, or is it just a Reddit thing?
How many from these users can define what woman is?
u/TechnicianOrnery2265 wouldn’t go to his close friends’ weddings because he wasn’t invited, well deserved
That’s almost interesting.
This is great stuff OP, thank you for sharing!
It makes me want to pull my own comment history and run a sentiment analysis to see if I need to perk up some lol
What you’ve done is awesome, how can I create such thing if I don’t have any data-related background?
This is some fun data. Most people use reddit while working, most people also have a noticeably worse mood while working. I guess that’s one of those things we mostly already knew, but it’s funny to see it laid out so plainly.
Interesting data, good post
english is not my first language, so can someone explain me abit clearer what sentiment is and how it is analysed?
Whelp I guess we’re all a bunch of assholes lol, if we gotta ask am I the asshole is the #2 sub lol yall being some asses
Don’t let any bosses see this. Busiest time for Reddit is 9-5 Monday-Friday. I still want to work from home.
Hey, this is really amazing!! Can you share your code base in details, perhaps with a GitHub repo?
This would help a lot!!