The data comes from a test I built that measures receptive vocabulary — the number of words a person recognizes (but may not necessarily use). It places everyone — from a student who has just started learning English to an educated native speaker — on the same scale. The units are word families (so limit, limited, and limitless count as a single unit). Users self-reported their CEFR levels.

It’s striking to see how much one has to learn to progress from level to level and potentially reach the native range.
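
To make the word-family unit concrete, here is a minimal sketch of counting families rather than individual word forms. It is not how the test itself works: the suffix list and the `family_head` and `count_word_families` helpers are hypothetical, and real word-family definitions rely on curated affix rules, not naive suffix stripping.

```python
# Hypothetical, hand-picked suffix list for illustration only.
# Longer suffixes come first so "limitless" loses "less", not just "s".
SUFFIXES = ("less", "ing", "ed", "s")


def family_head(word: str) -> str:
    """Approximate a word's family by stripping one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def count_word_families(words) -> int:
    """Count distinct families among the recognized words."""
    return len({family_head(w) for w in words})


known = ["limit", "limited", "limitless", "develop", "developing"]
print(count_word_families(known))  # 2 families from 5 word forms
```

Under this rough rule, limit, limited, and limitless collapse into one unit, which is why a score in word families is much smaller than a raw count of word forms.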

Posted by RevolutionaryLove134

42 comments
  1. I feel like part of the spread has to do with the original language of the user.

    Someone who natively speaks a Germanic or Latin language is probably going to know quite a lot of Germanic or Latin words, respectively, although their overall grasp of the language might not be great. Conversely, someone whose native language is unrelated might need to have studied for a long time to match that vocab depth, but would have a much better grasp of other areas.

  2. Took the test. It was really interesting. A few times it made me question my sanity because of the fake words.

    It correctly identified me as a native speaker.

  3. Phew, I have native level English!

    Nice test – will it be available in other languages?

  4. Cool test and data! One observation: the output word count from the test is unreadable in dark mode (Android, Firefox). The dark blue text is almost the same as the dark grey background.

  5. Re: Your test. Yes, I do know the meaning of the word enceinte. It just doesn’t happen to be English. :p

  6. Excellent little app you have there. Good job!

  7. Can you fix the German test? It always freezes on the last word and I desperately need to know how bad I am at German.

    Also thank you, lots of fun!

  8. What if they have a native vocabulary but a heavy accent, or make grammar mistakes?

  9. I’m glad I scored above the median (?) native speaker, because I’m pretty sure I’d do a lot worse in my native language

  10. The test is really well made. I’m C1 it seems. There are so many words that I’ve read and heard countless times, but don’t know the exact meaning of. For example, I will typically understand a sentence with words like “embellish” or “egregious” in it without really knowing the word, and so I don’t bother looking it up. Maybe I should bother.

  11. Interesting. I am honestly surprised that the distribution curve isn’t larger for native speakers. Perhaps that means it isn’t so hard to raise someone’s reading level. I am at 90th percentile despite only knowing 23.5% more words than the average person.

  12. God damn these stupid violin plots!

    What exactly are the Y-axis units between B1 & B2? What’s the difference between green points above B1 and below that line?

    A histogram if modality is important, a box and whiskers if it’s not.

    Yeah yeah, those won’t look ‘as detailed’… But that’s just it: you’re not adding detail to data, you’re adding noise to art.

    /Rant

  13. Avoided the fake words and got the definitions correct… A few of those fake words, as others have said, had me questioning myself and other words… I may start using them to see if I can get one or two going in a friend group.

  14. Cool stuff!! What did you use for the dataviz if I may ask?

  15. Great test, I do feel like some of the options when it asks you to define a word are a bit weird, but it might be just due to alternative meanings or me being dumb.

  16. Very interesting! It aligns reasonably well with what I’ve read before on the vocabulary size per CEFR level, although a bit smoother of a curve (also, A1 seems quite a bit higher than expected). If you’re curious, you can find a non-paywall link to the paper that their definition of a word family is based on here: [https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf](https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf).

    An interesting thought is that the productive vocabulary growth in real terms is probably a good deal larger than this suggests; as you progress in a language, you not only recognize more word families, but you’re able to use more members of the word families you already know. For instance, the Paul Nation article there gives 16 different words within the single word family “develop”. Eyeballing it, an A1 speaker might only be able to productively use maybe 3-4 of them, whereas a native speaker would be able to use all or nearly all. So while the above may show that a native speaker knows “about 10 times as many words” as an A1 speaker, I wouldn’t be surprised if the active vocabulary of a native speaker were 20 or 30 times larger.

  17. Best A2 is stronger than worst C2? hehe. Great system that is.

  18. Thanks for the fun test! One note, in dark mode the final result is almost unreadable because it’s dark blue against a black background. And that’s what I’ll blame for my score being lower than I’d like!

  19. I took the test in German and English.

    The German one is a little wacky because it didn’t use the capitalization rules

  20. I’m classed in the C2 category. I’m a native English speaker, but I don’t know the meaning of many words (just know they exist), so I’m not entirely surprised.

  21. Really well done

    Thought I was hot stuff but nope, 48% vs Native speakers (classified C2, 15300)

    That said, I was very honest (and found all 10 fake words) so I suspect some people are being a bit generous. I suspect the median person isn’t taking this test either 🙂

  22. Nice data and fun test. One remark regarding the test – at least for Polish it gave weird options as answers, like for “intruz” / intruder, I’m guessing the answer was “gość” / guest probably because intruder is an unwanted guest, but that’s a really bad way to put it if it’s missing the adjective.

  23. As a native English speaker I am a C2.

    Glad there are some fake words here because I was confused 😂😂😂😂😂

  24. That was interesting, especially the validity checks.

    Here come the dick swingers …

  25. Wait, you built that? I took that a few weeks ago and thought it was super cool. Great job.

  26. This kinda suggests, as I have often half-seriously said before, that there exists a D1 level of language.

  27. So I just took two tests, with very different results:

    [vocabularytester.com](http://vocabularytester.com) – C2, ‘size’ 37,895 (not sure what size means exactly)

    [myvocab.info](http://myvocab.info) – C2, 21,500 word families

    The first site was substantially easier: far more test words, but very few challenging ones; I was only really unsure of 2. The myvocab test was the opposite: relatively few test words, but all were challenging.

    Fun 🙂

  28. It’d be cool to get a list of your mistakes – I got a pretty decent result, but I also avoided all wrong words and didn’t get 25k, which means I have no idea which words I correctly identified as tricks and which words I should look up.

    Also, ascetic doesn’t really mean “strict”? Not according to any dictionary anyway. I almost clicked “fast” because I figured you meant it like the verb lol

  29. How does the test take into account domain-specific vocabulary knowledge? Like medical, engineering, and legal terms.

  30. I have a list of 26k words built just from my own chat logs. I feel like the average for a native speaker shown here is quite low.
