[OC] Inspired by another post from three years ago, I collected 5,000 random locations from each of 4 Wikipedia editions in regional languages of Russia to see how they would be distributed

Posted by Kstantas

2 comments
  1. A couple of weeks ago I was browsing this sub looking for interesting things to replicate, and came across [this post](https://www.reddit.com/r/dataisbeautiful/comments/qysx37/oc_inspired_by_the_other_post_last_week_i_pulled/) from three years ago, in which the author collected locations for the three regional languages of Spain.

    It caught my interest, so I decided to do a small study of my own for four languages of peoples of Russia that have their own autonomous republics. I chose Tatar and Chechen (the languages with the largest number of articles among the peoples of Russia that do not have their own state), as well as Bashkir and Chuvash (close to each other both in the number of articles and in the geographical location of their republics).

    In total, I downloaded data on 20,000 random articles with geographical coordinates from the four Wikipedia language editions (5,000 from each): streets, cities, monuments, natural features, all that.
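
    For anyone who wants to try something similar, here is a minimal sketch of how such a sample could be pulled. It is not the code used for these maps. It assumes the public MediaWiki API with `generator=random` and `prop=coordinates`; the function names, the batch size and the User-Agent string are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Wikipedia language codes for the four editions mentioned in the post.
EDITIONS = {"Tatar": "tt", "Chechen": "ce", "Bashkir": "ba", "Chuvash": "cv"}


def build_url(lang, batch=50):
    """Build a request for `batch` random articles and their coordinates."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "random",
        "grnnamespace": 0,      # articles only, no talk or user pages
        "grnlimit": batch,
        "prop": "coordinates",
        "colimit": "max",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?" + urllib.parse.urlencode(params)


def extract_points(response):
    """Return (title, lat, lon) for every page that carries coordinates."""
    points = []
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        coords = page.get("coordinates", [])
        if coords:
            # Keep only the first coordinate listed for the page.
            points.append((page["title"], coords[0]["lat"], coords[0]["lon"]))
    return points


def sample_points(lang, target=5000, max_requests=2000):
    """Keep requesting random batches until `target` geotagged articles are found."""
    seen = {}
    for _ in range(max_requests):
        if len(seen) >= target:
            break
        request = urllib.request.Request(
            build_url(lang),
            headers={"User-Agent": "random-geo-sketch/0.1 (illustrative example)"},
        )
        with urllib.request.urlopen(request, timeout=30) as resp:
            data = json.load(resp)
        for title, lat, lon in extract_points(data):
            seen[title] = (lat, lon)  # the dict drops repeated articles
    return seen
```

    Calling `sample_points("ba")` would return a dictionary of article titles mapped to `(lat, lon)` pairs. Many random articles have no coordinates at all, so a run like this may need far more requests than `target / batch`.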

    I ended up with these four maps, from which I made the following observations and hypotheses:

    Because the Chechen and Tatar editions have a relatively large number of articles, their locations are spread evenly across Europe and western Russia. The Bashkir and Chuvash editions are 10 times smaller, and the borders of both republics are easy to spot on their maps.

    At the same time, because the Bashkir and Chuvash editions are smaller, anomalous clusters stand out in them, primarily in Germany and Estonia. These can presumably be explained either by diaspora or by some cultural interaction (for example, the interaction between the authorities of Saxony and Bashkortostan can be traced as far back as 2003).
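
    One way to put a number on how strongly a sample concentrates around its home republic is to measure the share of points within some radius of the capital. This is only a sketch under assumptions: the capital coordinates are rounded approximations, and the 300 km radius is an arbitrary illustrative value, not something derived from the maps.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Approximate capital coordinates (lat, lon); rounded, for illustration only.
UFA = (54.74, 55.97)          # Bashkortostan
CHEBOKSARY = (56.13, 47.25)   # Chuvashia


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_within(points, center, radius_km=300.0):
    """Fraction of (lat, lon) points no farther than radius_km from center."""
    points = list(points)
    if not points:
        return 0.0
    inside = sum(
        1 for lat, lon in points
        if haversine_km(lat, lon, center[0], center[1]) <= radius_km
    )
    return inside / len(points)
```

    Running `share_within(bashkir_points, UFA)` on each sample (where `bashkir_points` is a hypothetical list of coordinates) and comparing the results across the four editions would show whether the smaller ones really are more concentrated. The same function with a centre in Germany or Estonia could be used to size up the anomalous clusters.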

    Another observation, one that is not visible on these maps, is the surprisingly low number of articles about places east of the Kemerovo region. My guess is that people are more likely to write about places they have visited themselves, and since the largest concentrations of speakers of these languages are west of the Urals, most of their travel is within the European part of Russia. This could be checked by looking at the editions in the languages of peoples living in Siberia.
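
    A rough way to check the "east of Kemerovo" observation is to count the share of points beyond a cutoff longitude. This is a sketch only: the cutoff of 89°E is my assumption for the approximate eastern edge of the Kemerovo region, and the function counts every point east of it, including places outside Russia.

```python
def share_east_of(points, lon_cutoff=89.0):
    """Fraction of (lat, lon) points whose longitude exceeds lon_cutoff.

    The default cutoff is an assumed approximation, not an official border.
    Points with negative longitude (for example the Americas) never count as east.
    """
    points = list(points)
    if not points:
        return 0.0
    east = sum(1 for _lat, lon in points if lon > lon_cutoff)
    return east / len(points)


def compare_editions(samples, lon_cutoff=89.0):
    """Map each edition name to its share of points east of the cutoff."""
    return {name: share_east_of(pts, lon_cutoff) for name, pts in samples.items()}
```

    Here `samples` would be a dictionary such as `{"Tatar": [...], "Chechen": [...]}` holding the coordinates for each edition. Adding samples from editions in Siberian languages to the same dictionary would give a direct comparison for the hypothesis above.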

    To summarise, I really enjoyed doing this mini-study. I’m grateful to the author of the original post for the idea and the original code, and also to the Anthropic team, because without Claude Sonnet’s help I wouldn’t have managed to write the code in Python and R.
