I created an animation of hierarchical clustering of the US into friendship networks from 2 to 50 clusters. The clusters show areas which are more tightly linked in terms of friendships (high probability of friendship). The white regions in the animation are the two regions that were created by the most recent split.
The data are at the county level, so counties are never split across clusters.
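For anyone wondering what k means in the animation, here is a minimal sketch of the general idea in Python (the original was made in R). Everything in it is an assumption for illustration: the six "counties" and their connectedness scores are made up, the average-linkage rule is just one common choice, and the helper names (`merge_history`, `partition_at`, `newest_split`) are hypothetical. It is not the author's actual method. The sketch merges the most connected clusters step by step, then reads the hierarchy back at a given k and reports which two clusters were created by the most recent split (the regions drawn in white).

```python
from itertools import combinations

# Toy symmetric connectedness scores between six made-up counties
# (higher = more friendships). Invented numbers, NOT real SCI data.
COUNTIES = ["A", "B", "C", "D", "E", "F"]
SCI = {
    ("A", "B"): 90, ("A", "C"): 40, ("A", "D"): 5, ("A", "E"): 4, ("A", "F"): 3,
    ("B", "C"): 45, ("B", "D"): 6, ("B", "E"): 5, ("B", "F"): 2,
    ("C", "D"): 8, ("C", "E"): 7, ("C", "F"): 3,
    ("D", "E"): 80, ("D", "F"): 30,
    ("E", "F"): 35,
}


def score(a, b):
    """Look up the connectedness of a pair regardless of order."""
    return SCI[(a, b)] if (a, b) in SCI else SCI[(b, a)]


def avg_link(c1, c2):
    """Average pairwise connectedness between two clusters (average linkage)."""
    return sum(score(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def merge_history(items):
    """Repeatedly merge the two most connected clusters.

    Returns every partition along the way, from len(items) clusters down to 1.
    Whole counties are the smallest unit, so a county is never split.
    """
    clusters = [frozenset([x]) for x in items]
    history = [list(clusters)]
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: avg_link(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append(list(clusters))
    return history


def partition_at(history, k):
    """The partition that has exactly k clusters."""
    return history[len(history[0]) - k]


def newest_split(history, k):
    """The two clusters at level k that were a single cluster at level k - 1."""
    before = set(partition_at(history, k - 1))
    return [c for c in partition_at(history, k) if c not in before]


history = merge_history(COUNTIES)
for k in range(2, 5):
    regions = [sorted(c) for c in partition_at(history, k)]
    newest = [sorted(c) for c in newest_split(history, k)]
    print(f"k={k}: regions={regions}, newest split={newest}")
```

In this toy setup, k=2 separates the two groups of counties that have the least friendship between them, and each later k splits one existing region in two. That is one way to read the first frame: the first region carved out is the one most weakly connected to everything else under the chosen linkage rule, not a starting point.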
Now THIS is interesting data. What a cool way to look at Facebook friend info.
Really interesting to look at what areas share friendships, and which ones don’t (or share less).
Speaking as a Minnesotan, it’s absolutely wild to me that we (and the Dakotas, apparently) are SO distinct that the very *first* geographical carve-out is MN + the Dakotas vs. Everyone Else, instead of, like, East vs. West or something.
It’s fascinating that many of the clusters are very much based on states, but some are not. New England being so well defined is exciting to me.
I’m a bit lost (not a data science expert).
Are friendship networks supposed to mean who people are friends with according to state? As in you go through the friends list and categorize by location? Or is it more so the posts and where they come from?
I guess what I’m asking is please explain like I’m 5.
What I think is super interesting: if you look at the northern border of North Carolina, there’s a little carve-out that appears to be Patrick and Henry Counties in Virginia. I’m FROM that carve-out and now live in the middle of NC, and it’s wild to imagine that “born on the NC border in two counties that were hit hard in the 90s, went to college, then moved south to find work just as Facebook was dragging us in (and our families)” was pronounced enough to show up here.
Then you go back and look at other similar little carve-outs on state borders: one in MO/AR, another in ND/NB. It makes me wonder about those, given what I know about my own.
I think the last frame looks like the first cut of a US map with more sensible state boundaries, based more on human geography.
There’s something to be said about just how badly people avoid being friends with Texans if they’re not already in Texas.
Would be really interesting to see this with county/state lines superimposed.
All of this and we STILL have two Dakotas
I’m honestly surprised that NJ is all in one region instead of being split into NY/Philly Metro areas.
My guess is that Long Island is too tightly knit and pulls the rest of the city + lower NY with it?
What are you using to define the borders? County boundaries?
So interesting how state lines become visible!
So what are we looking at here? Is each of these slides a map of the regions with the highest instances of friendship occurrences?
What does the K value signify? For example, when K = 2, only the region around North & South Dakota & Minnesota is highlighted. Does that mean that area was used as a starting area, or that it’s significantly different from the rest of the states (most unique, or most isolated from friendships back to the rest of the states)?
I guess this proves that the UP does in fact belong to Wisconsin.
I am surprised that no part of CT got lumped in with NYC/Long Island
Really fascinating how many states are clearly visible, how many get combined, and how many get divided up. Great work!
Apparently, that’s how state lines should be drawn.
How granular is the location data?
The clusters look to be county level at the finest. Is that because the data is county level, or are the clusters naturally county level? Or am I wrong about this observation altogether?
The reason I ask is that county-level granularity isn’t uniform across the country. It’s much more fine-grained in the east than the west.
Extremely cool data! Never thought about geographical hierarchical clustering like this before but it’s really cool
Looks like a new way to establish representational districts.
Love this. To be clear, what analyses did you run to find optimum k, and what was the result?
Edit: and which do you think gave the most intuitively interpretable results?
I’m glad to see the distinct split of the Pittsburgh vs Philly rivalry.
Honestly this looks like a more equitable state map than the current state lines. Small and large states are mostly minimized
Did anyone else catch that the first division on the East Coast is between North and South? The initial regional divisions are interesting too.
The disconnect in Illinois being north of 80 and south of 80 is very funny.
Data: [https://dataforgood.facebook.com/dfg/tools/social-connectedness-index#accessdata](https://dataforgood.facebook.com/dfg/tools/social-connectedness-index#accessdata)
Tools: R. Packages: dplyr, ggplot2, sf, usmap, tools, ggfx, gifski, scales
Edit:
k=75 and k=100: [https://www.reddit.com/user/haydendking/comments/1j8v5jr/hierarchical_clustering_of_the_us_based_on/](https://www.reddit.com/user/haydendking/comments/1j8v5jr/hierarchical_clustering_of_the_us_based_on/)
State lines superimposed (suggested by u/sdb00913 and u/TrynnaFindaBalance):
[https://www.reddit.com/user/haydendking/comments/1j8v6ht/hierarchical_clustering_of_the_us_based_on/](https://www.reddit.com/user/haydendking/comments/1j8v6ht/hierarchical_clustering_of_the_us_based_on/)