[OC] Vocabulary size at each English proficiency level

The data comes from a test I built that measures receptive vocabulary — the number of words a person recognizes (but may not necessarily use). It places everyone — from a student who has just started learning English to an educated native speaker — on the same scale. The units are word families (so limit, limited, and limitless count as a single unit). Users self-reported their CEFR levels.

It’s striking to see how much one has to learn to progress from level to level and potentially reach the native range.

Posted by RevolutionaryLove134

42 comments

thegodzilla25 says:

2025-12-09 at 21:21

Cool test! Took 2 mins and I learnt some things!
Zigxy says:

2025-12-09 at 21:23

I feel like part of the spread has to do with the original language of the user.

Someone who natively speaks a Germanic or Latin language is going to probably know quite a lot of Germanic and Latin words, respectively. Although their overall grasp of the language might not be great. Conversely someone from an unrelated language might need to have studied for a long time to match the vocab depth, but would have a much better grasp of other areas.
BiBoFieTo says:

2025-12-09 at 21:32

Took the test. It was really interesting. A few times it made me question my sanity because of the fake words.

It correctly identified me as a native speaker.
DameKumquat says:

2025-12-09 at 21:35

Phew, I have native level English!

Nice test – will it be available in other languages?
QuantumIce8 says:

2025-12-09 at 21:35

Cool test and data! One observation: the output word count from the test is unreadable when on dark mode (Android, Firefox). The dark blue text is almost the same as the dark grey background
Few-Interview-1996 says:

2025-12-09 at 21:39

Re: Your test. Yes, I do know the meaning of the word enceinte. It just doesn’t happen to be English. :p
highlyeducated_idiot says:

2025-12-09 at 21:43

Excellent little app you have there. Good job!
PristineAnt9 says:

2025-12-09 at 21:44

Can you fix the German test? It always freezes on the last word and I desperately need to know how bad I am at German.

Also thank you, lots of fun!
samuelazers says:

2025-12-09 at 21:48

what if they have a native vocabulary but heavy accent or makes grammar mistakes?
makkerker says:

2025-12-09 at 21:50

It is not size that matters but how do you use it
zombiecalypse says:

2025-12-09 at 21:52

I’m glad I scored above the median (?) native speaker, because I’m pretty sure I’d do a lot worse in my native language
warnerbolanos says:

2025-12-09 at 21:54

The German test gets stuck on the last word.
Enuntiatrix says:

2025-12-09 at 21:56

https://preview.redd.it/5jehow00396g1.png?width=720&format=png&auto=webp&s=4c7ea0529b69d3186bdd745a271da090579ff4fc

Very nice. I’m a non-native speaker, but I started with English in school 20 years ago. Perhaps the only subject I ever needed IRL, to be honest.
PHealthy says:

2025-12-09 at 21:58

You should do this test but for risk literacy
akurgo says:

2025-12-09 at 22:03

The test is really well made. I’m C1 it seems. There are so many words that I’ve read and heard countless times, but don’t know the exact meaning of. For example, I will typically understand a sentence with words like “embellish” or “egregious” in it without really knowing the word, and so I don’t bother looking it up. Maybe I should bother.
tka4nik says:

2025-12-09 at 22:10

Nice work, and very cool test!

~~Someone already mentioned that for some languages, the last word (if the result is non-trivial, as in if you didn’t press all “don’t know”) freezes up and doesn’t show the results. Can confirm the bug for Russian as well~~

https://preview.redd.it/uacjw1ce596g1.png?width=715&format=png&auto=webp&s=572a9d9ad4423f45f30df3fdbf4cb0a7ce7817e0

Seems like you’ve already fixed the bug, good job!!
turb0_encapsulator says:

2025-12-09 at 22:14

Interesting. I am honestly surprised that the distribution curve isn’t larger for native speakers. Perhaps that means it isn’t so hard to raise someone’s reading level. I am at 90th percentile despite only knowing 23.5% more words than the average person.
n4s0 says:

2025-12-09 at 22:16

This is pretty cool. Thanks!
ChengliChengbao says:

2025-12-09 at 22:20

im a native speaker and i got C1

amazing
GreyGoblin says:

2025-12-09 at 22:28

God damn these stupid violin plots!

What exactly is the Y axis units between B1 & B2? What’s difference between green points above B1 and below that line.

A histogram if modality is important, a box and whiskers if it’s not.

Yeah yeah, those won’t look ‘as detailed’… But that’s just it you’re not adding detail to data, you’re adding noise to art.

/Rant
hansrotec says:

2025-12-09 at 22:29

Avoided the fake words and got the definitions correct…. A few of those fake words as others have said had me questioning myself and other words …. I may start using them see if I can get one or two going in a friend group
thespermthatsurvived says:

2025-12-09 at 22:35

Cool stuff!! What did you use for the dataviz if I may ask?
thebowlman says:

2025-12-09 at 22:35

What is the difference between C2 and Native?
Rafa_50 says:

2025-12-09 at 22:35

Great test, I do feel like some of the options when it asks you to define a word are a bit weird, but it might be just due to alternative meanings or me being dumb.
Devilnaht says:

2025-12-09 at 22:41

Very interesting! It aligns reasonably well with what I’ve read before on the vocabulary size per CEFR level, although a bit smoother of a curve (also, A1 seems quite a bit higher than expected). If you’re curious, you can find a non-paywall link to the paper that their definition of a word family is based on here: [https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf](https://www.lextutor.ca/morpho/fam_affix/bauer_nation_1993.pdf) .

An interesting thought is that the productive vocabulary growth in real terms is probably a good deal larger than this suggests; as you progress in a language, you not only recognize more word families, but you’re able to use more members of the word families you already know. For instance, the Paul Nation article there gives 16 different words within the single word family “develop”. Eyeballing it, an A1 speaker might only be able to productively use maybe 3-4 of them, whereas a native speaker would be able to use all or nearly all. So while the above may show that a native speaker knows “about 10 times as many words” as an A1 speaker, I wouldn’t be surprised if the active vocabulary of a native speaker were 20 or 30 times larger.
Oneforallandbeyondd says:

2025-12-09 at 22:44

Best A2 is stronger than worse C2? hehe. Great system that is.
JJBrazman says:

2025-12-09 at 22:47

Thanks for the fun test! One note, in dark mode the final result is almost unreadable because it’s dark blue against a black background. And that’s what I’ll blame for my score being lower than I’d like!
Schuesselpflanze says:

2025-12-09 at 22:52

I took the test in German and English.

The German one is a little wacky because it didn’t use the capitalization rules
Vorschrift says:

2025-12-09 at 23:12

I…. C2. Believe you not?
Sensitive-Reaction32 says:

2025-12-09 at 23:12

I’m classed in C2 category. I’m a native English speaker, but I don’t know the meaning of many words (just know they exist), so I’m not entirely surprised
TheBigBo-Peep says:

2025-12-09 at 23:16

Really well done

Thought I was hot stuff but nope, 48% vs Native speakers (classified C2, 15300)

That said, I was very honest (and found all 10 fake words) so I suspect some people are being a bit generous. I suspect the median person isn’t taking this test either 🙂
Ariel90x says:

2025-12-09 at 23:25

https://preview.redd.it/bis74smqi96g1.jpeg?width=583&format=pjpg&auto=webp&s=29f3c1aca40ceae5f737044fc86b3ee0c0de099a

I’m Italian, I studied Latin and German and IMO this test is broken from someone like me since most of the hard words are either Germanic or from Latin\French.
polypolip says:

2025-12-09 at 23:32

Nice data and fun test. One remark regarding the test – at least for Polish it gave weird options as answers, like for “intruz” / intruder, I’m guessing the answer was “gość” / guest probably because intruder is an unwanted guest, but that’s a really bad way to put it if it’s missing the adjective.
Malorn44 says:

2025-12-09 at 23:40

Would be interested in seeing this for Japanese
heyitsmemaya says:

2025-12-09 at 23:54

As a native English speaker I am a C2.

Glad there are some fake words here because I was confused 😂😂😂😂😂
CletusDSpuckler says:

2025-12-10 at 00:10

That was interesting, especially the validity checks.

Here come the dick swingers …
Darth_Bane_1032 says:

2025-12-10 at 00:52

Wait, you built that? I took that a few weeks ago and thought it was super cool. Great job.
Nuclear_rabbit says:

2025-12-10 at 01:41

This kinda suggests, as I have often half-seriously said before, that there exists a D1 level of language.
highsilesian says:

2025-12-10 at 02:29

So I just took two tests, with very different results:

[vocabularytester.com](http://vocabularytester.com) – C2, ‘size’ 37,895 (not sure what size means exactly)

[myvocab.info](http://myvocab.info) – C2, 21,500 word families

The first site was substantially easier: far more test words, but very few challenging ones; i was only really unsure of 2, whereas the myvocab test was the opposite: relatively few test words but all were challenging.

Fun 🙂
EarthMantle00 says:

2025-12-10 at 02:55

Itd be cool to get a list of your mistakes – I got a pretty decent result but I also avoided all wrong words and didn’t get 25k which means I have no idea which words I correctly identified as tricks and which words I should look up.

Also ascetic doesnt really mean “strict”? Not according to any dictionary anyway. I almost clicked “fast” because I figured you meant it like the verb lol
StupidWiseGuy says:

2025-12-10 at 03:37

How does the test take into account domain-specific vocabulary knowledge? Like medical, engineering, and legal terms.
DeProgrammer99 says:

2025-12-10 at 03:39

I have a list of 26k words built just from my own chat logs. I feel like the average for a native speaker shown here is quite low.

Comments are closed.

[OC] Vocabulary size at each English proficiency level

Tags:

42 comments