Most similar language to each European language, based purely on letter distribution

16 comments
  1. Thought it was mildly interesting that most European languages can be grouped sensibly merely by looking at their letter distributions.

    The similarity measure used here is just the sum of the absolute differences in character prevalences (so a lower score means more similar): e.g. if language A has distribution `{A: 0.5, B: 0.3, C: 0.2}` and language B has distribution `{A: 0.8, B: 0.2}` then their similarity is `|0.5-0.8|+|0.3-0.2|+|0.2-0.0|=0.6`.

    **Notice to all Bosnian-Croatian-Montenegrin-Serbian speakers: this is about *written* similarity, not *spoken* similarity!**

  2. I was wondering why there aren’t connection between Portuguese and Romanian, but then I saw how it is based and calmed myself down.

  3. As someone from Serbia I can’t understand any of the languages from the group that is close to Serbian in this graph.

    When I go to Croatia, I speak Serbian and they speak Croatian and we all understand each other completely.

    TLDR – this graph is not quite meaningful. I guess the only criteria was alphabet, but for languages it’s not true.

  4. One of these high-effort OG posts that will get downvoted to hell by people not understanding it…

  5. The Welsh to English is wrong , Welsh has more letters (29) and more vowels (7) it’s more related to Irish than English on the language tree English is Germanic in origin

  6. This is terrible. How can you connect languages via alphabet? I would say, among those, Hungarian would be most similar to Turkish regarding vocabulary, grammar, etc. And then Russian and Bulgarian maybe due to having Turkic influence in their daily words.

Leave a Reply