The most distinctive letter combinations in different European languages

2022-06-07

Tags:
Europe

8 comments

Udzu says:

2022-06-07 at 10:29

Methodology: I extracted 100MB of article text from each of the different Wikipedias and analysed the letter frequencies using Python. The map shows the letters and letter combinations whose frequencies most exceed the average for all the languages that use the same script.
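
For readers who want to try something similar, here is a minimal sketch of that kind of analysis. It is not the code behind the map. It assumes that "exceeds the average" means the plain difference between a language's relative frequency and the mean over all languages in the comparison, and the function names and toy corpora are made up for illustration.

```python
from collections import Counter


def ngram_freqs(text, n):
    """Relative frequency of every n-letter sequence found inside words."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation and digits never form n-grams.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def most_distinctive(corpora, n):
    """For each language, pick the n-gram whose frequency most exceeds
    the mean frequency across all languages (assumed scoring rule)."""
    freqs = {lang: ngram_freqs(text, n) for lang, text in corpora.items()}
    all_grams = set().union(*freqs.values())
    mean = {
        gram: sum(f.get(gram, 0.0) for f in freqs.values()) / len(freqs)
        for gram in all_grams
    }
    return {
        lang: max(f, key=lambda gram: f[gram] - mean[gram])
        for lang, f in freqs.items()
    }


# Tiny hypothetical corpora; the real analysis used 100MB per language.
toy = {
    "en": "the thing that then",
    "pl": "szczecin szkoła szum",
}
result = most_distinctive(toy, 2)
print(result)
```

Because n-grams are taken from every position inside a word, the winners can come from the middle or end of words, not only the start.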

**Clarification: the letters can appear in the middle of words, not just at the start or by themselves.**

PS I agree that the algorithm is currently a bit too biased towards common combinations; I’ll have a think about how to pick up more distinct but less common combinations without having the maps dominated by really obscure garbage. Here’s [a first stab](https://i.imgur.com/xAoKUJy.png): I quite like the single letters, but the pairs and triples are very fragile and change completely if I so much as sneeze at the code.
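
One possible way to reduce the bias towards common combinations is to score by ratio rather than by difference, with a small floor so that extremely rare n-grams cannot dominate. This is only a sketch of that idea and not the approach used for the linked image. The `floor` value, the function names and the frequencies below are all invented for illustration.

```python
def ratio_score(freq, mean_freq, floor=1e-4):
    """How many times more frequent an n-gram is than the average.
    The floor is added to both sides to damp very rare n-grams."""
    return (freq + floor) / (mean_freq + floor)


def rank_by_ratio(lang_freqs, mean_freqs, floor=1e-4, top=3):
    """Return the top n-grams of one language, ordered by ratio score."""
    scored = {
        gram: ratio_score(f, mean_freqs.get(gram, 0.0), floor)
        for gram, f in lang_freqs.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:top]


# Made-up frequencies, chosen only to show how the two scores differ.
lang_freqs = {"ie": 0.02, "ść": 0.002, "qx": 0.000001}
mean_freqs = {"ie": 0.01, "ść": 0.0002, "qx": 0.0000005}

by_difference = max(lang_freqs, key=lambda g: lang_freqs[g] - mean_freqs[g])
by_ratio = rank_by_ratio(lang_freqs, mean_freqs)
print(by_difference, by_ratio)
```

With these invented numbers the difference score picks the common "ie", while the ratio score puts the rarer "ść" first and keeps the near-nonexistent "qx" at the bottom. The ranking depends on the chosen floor, which is one reason results for pairs and triples can shift with small changes to the code.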
BkkGrl says:

2022-06-07 at 10:39

pretty cool!
Grimson47 says:

2022-06-07 at 10:42

If “distinctive” means how many times each letter is repeated, shouldn’t the title be something like “The most repeated letter”? I’d say in terms of distinctiveness, ours would probably be something like Ж, Щ or Ю, which not all Cyrillic-using countries use.
Urjr382jfi3 says:

2022-06-07 at 10:45

the colors mason, what do they mean
ManatuBear says:

2022-06-07 at 10:46

“ent” for PT? I assume it is from the “mente” ending of adverbs, otherwise I have no idea since only a few words start with “ent”.
TheSecondTraitor says:

2022-06-07 at 10:58

Seems about right for Slovakia, because ___ých is how we end the majority of plural adjectives in the genitive, accusative and locative cases for masculine words, and in the genitive and accusative for feminine and neuter words.
General_Ad_1483 says:

2022-06-07 at 11:03

I was sure that for Poland it would be “sz”, “cz”, “trz” or other stuff that is a nightmare for foreigners to look at.
Inductee says:

2022-06-07 at 11:21

ă or ț should be the most distinctive for Romanian, ő or ű for Hungarian, while ść should be the most distinctive 2-letter combination for Polish.
