We parsed the full Harry Potter book series (plus some character metadata and a little web crawling) to build a dynamic graph of character interactions. You can follow the story not just by chapters, but by relationships that grow and shift over time.

Explore the full interactive graph [here](https://truemichato.github.io/Harry-Potter-DS-Project/dynamic_relationship_graph_1_10_sample.html).





Posted by michato

8 comments
  1. Hi everyone! OP here 👋

    This is OC created by myself and two friends as part of a Data Science course project. We’re all big Potterheads, so we decided to explore the Harry Potter series through the lens of network science and NLP.

    Project Overview
    We built a time-evolving character interaction graph from the Harry Potter books — where edges represent character co-occurrence and sentiment in each chapter. The idea was to visualize how relationships shift over time, and who really drives the story (spoiler: the golden trio still wins).
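    A minimal sketch of that per-chapter edge structure, assuming each sentence has already been reduced to a list of canonical character names plus one sentiment polarity score in [-1, 1] (TextBlob's polarity uses that range). The input format and the names `chapter_graph` and `dynamic_graph` are hypothetical, not taken from the project code.

```python
from itertools import combinations
from collections import defaultdict

# Hypothetical input: one (names, polarity) tuple per sentence, one list of those per chapter.
# `names` are canonical character names found in the sentence; `polarity` is in [-1, 1].
Sentence = tuple[list[str], float]
Edges = dict[tuple[str, str], tuple[int, float]]


def chapter_graph(sentences: list[Sentence]) -> Edges:
    """Map each co-occurring character pair to (shared sentence count, mean polarity)."""
    counts: dict[tuple[str, str], int] = defaultdict(int)
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for names, polarity in sentences:
        # set() counts a character once per sentence; sorted() makes the pair order-independent
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
            totals[pair] += polarity
    return {pair: (n, totals[pair] / n) for pair, n in counts.items()}


def dynamic_graph(chapters: list[list[Sentence]]) -> list[Edges]:
    """One edge snapshot per chapter, in reading order."""
    return [chapter_graph(chapter) for chapter in chapters]
```

    Keeping one snapshot per chapter, rather than a single graph for the whole series, is what lets edges appear, thicken, and change tone as the story advances.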

    [Live interactive demo](https://truemichato.github.io/Harry-Potter-DS-Project/dynamic_relationship_graph_1_10_sample.html)

    [GitHub repo (code + data + writeup)](https://github.com/TrueMichato/Harry-Potter-DS-Project)

    How We Did It

    Data Sources:

    Book text: Harry Potter series compiled into CSV format by Gaston Sanchez

    Character metadata: Kaggle dataset by Josè Roberto Canuto

    Additional info: Manual verification + crawling from the Harry Potter Lexicon

    Methods & Tools:

    Coreference resolution with spaCy + Coreferee (to untangle “he,” “the boy who lived,” etc.)

    Regex alias resolution for name variants (Weasley chaos)

    Network construction: character co-occurrence in sentences → dynamic graphs

    Graph analysis: PageRank, Louvain & Leiden community detection

    Sentiment analysis: TextBlob + CardiffNLP’s Twitter RoBERTa-base

    Visualization: Plotly + matplotlib
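    The regex alias step can be illustrated with a small matcher. This is a sketch under assumptions: the `ALIASES` table, `build_matcher`, and `characters_in` are hypothetical, and the table covers only a few characters. A bare “Weasley” is deliberately left unmapped because it is ambiguous on its own.

```python
import re

# Hypothetical alias table (toy subset): canonical name -> regex patterns for its variants.
ALIASES = {
    "Harry Potter": [r"Harry Potter", r"Harry"],
    "Ron Weasley": [r"Ron(?:ald)? Weasley", r"Ron(?:ald)?"],
    "Ginny Weasley": [r"Ginny Weasley", r"Ginevra", r"Ginny"],
    "Hermione Granger": [r"Hermione Granger", r"Hermione"],
}


def build_matcher(aliases: dict[str, list[str]]) -> tuple[re.Pattern, dict[str, str]]:
    """Compile every alias into one alternation of named groups."""
    # Longest pattern string first (a rough proxy for specificity): at a given position the
    # regex engine takes the first alternative that matches, so full names beat first names.
    pairs = sorted(
        ((pat, name) for name, pats in aliases.items() for pat in pats),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    combined = re.compile("|".join(rf"(?P<g{i}>\b{pat}\b)" for i, (pat, _) in enumerate(pairs)))
    canonical = {f"g{i}": name for i, (_, name) in enumerate(pairs)}
    return combined, canonical


def characters_in(sentence: str, matcher: tuple[re.Pattern, dict[str, str]]) -> list[str]:
    """Return canonical character names in the order they appear in the sentence."""
    combined, canonical = matcher
    # lastgroup is the name of the alternative that matched, which maps back to a character
    return [canonical[m.lastgroup] for m in combined.finditer(sentence)]
```

    Running `characters_in` over each sentence yields exactly the kind of name list that sentence-level co-occurrence counting needs as input.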

    What We Found
    PageRank MVPs: Harry, Ron, and Hermione unsurprisingly dominate

    Communities: Louvain clustering grouped characters into accurate story arcs (Marauders, Weasleys, etc.)

    Sentiment trends: Relationships shift in tone across the books (we’ve got the data to prove it!)

    Model insight: TextBlob performed better than RoBERTa due to domain mismatch (the RoBERTa model was trained on tweets, not fiction)
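    PageRank on a co-occurrence graph can be sketched with plain power iteration. This assumes an undirected graph with positive co-occurrence counts as edge weights and the common damping factor of 0.85; the `pagerank` function below is a hypothetical stand-in, and a library routine such as NetworkX's `pagerank` is the usual choice in practice.

```python
from collections import defaultdict


def pagerank(edges: dict[tuple[str, str], float], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Weighted PageRank by power iteration on an undirected graph (positive weights assumed)."""
    # Co-occurrence has no direction, so each edge is stored in both directions
    adj: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (a, b), w in edges.items():
        adj[a][b] += w
        adj[b][a] += w
    nodes = list(adj)
    n = len(nodes)
    strength = {u: sum(adj[u].values()) for u in nodes}  # total edge weight per node
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Teleport term, then each node splits its damped rank in proportion to edge weight
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            share = damping * rank[u] / strength[u]
            for v, w in adj[u].items():
                new[v] += share * w
        converged = sum(abs(new[u] - rank[u]) for u in nodes) < tol
        rank = new
        if converged:
            break
    return rank
```

    On a toy star graph centred on Harry, every neighbour passes all of its rank back to him, so he ends up with the highest score; heavier edges (more shared sentences) also pull more rank toward a neighbour.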

    We had a blast doing this and learned a lot about NLP, network theory, and how messy natural language really is. If you want to dig into the full process or remix the code for another series — everything’s open source!

    Feel free to ask any questions about methods, data, or Hogwarts house drama.

  2. As a Harry Potter fan, this is really cool. I’d adjust the color scale, though.

  3. This would be greatly improved by adding spring physics to help with the sudden transitions.

  4. Can’t read half of the names due to their small size and lack of contrast.

  5. Now do the same thing for A Song of Ice and Fire.

  6. I found this very interesting. Over time, I’ve seen the movies more often than I’ve reread the books, and it really shows how much the screen time changes as the story progresses, more so than the books, by all appearances.

    As someone who teaches data science, this is such a nice outcome from your coursework. Something intriguing and not just the Titanic dataset over and over again. Nice work!

  7. I don’t think you can call it beautiful when the graph has no axis labels and you can barely read anything by the end of it.

  8. What do the X and Y axis represent? I have a small screen (and brain) and can’t figure it out.
