That title sounds like a gimmick.
Not to worry.
I dislike gimmicks as much as you do, but if you want some genuine advice on how to master at least 20% of every conversation in any target language, keep reading.
I’m open, as always, to your input so feel free to comment below.
How corpus linguistics can help accelerate your language learning
Have a look at these words:
the | they | when | come |
be | we | make | its |
to | say | can | over |
of | her | like | think |
and | she | time | also |
a | or | no | back |
in | an | just | after |
that | will | him | use |
have | my | know | two |
I | one | take | how |
it | all | people | our |
for | would | into | work |
not | there | year | first |
on | their | your | well |
with | what | good | way |
he | so | some | even |
as | up | could | new |
you | out | them | want |
do | if | see | because |
at | about | other | any |
this | who | than | these |
but | get | then | give |
his | which | now | day |
by | go | look | most |
from | me | only | us |
According to the Oxford English Corpus, these are the 100 most common words (derived from their database of texts) in the English language.
A quick glance at this list is enough to know these words can’t really be used to have any real meaningful conversation on their own, but you use all or most of them every day in nearly all of your interactions with other people.
They “fill” the gaps of your dialogue and act as a kind of framework for all the other terms you use.
It doesn’t matter how easy or advanced the conversation level is either – these words are absolutely crucial to our conversations. If a couple of thousand words is all that’s needed to communicate on most general topics, then learning the most foundational ~100 words is a huge step toward that goal.
And a vocabulary that small doesn’t take long to learn either.
The article mentioned above quotes Peter Howarth (of Leeds University) as saying:
“It’s a ridiculously small number, you could learn 100 words in a couple of days, particularly when you’re in the country surrounded by the language.”
Every language has a list of common words like this (although languages vary and some terms have no equivalent, e.g. Mandarin has no definite article).
Here’s a similar list for French so you can see what I mean:
le | son | encore | an |
de | mettre | nouveau | monde |
un | autre | aller | jour |
à | on | cela | monsieur |
être | mais | entre | demander |
et | nous | premier | alors |
en | comme | vouloir | après |
avoir | ou | déjà | trouver |
que | si | grand | personne |
pour | leur | mon | rendre |
dans | y | me | part |
ce | dire | moins | dernier |
il | elle | aucun | venir |
qui | devoir | lui | pendant |
ne | avant | temps | passer |
sur | deux | très | peu |
se | même | savoir | lequel |
pas | prendre | falloir | suite |
plus | aussi | voir | bon |
pouvoir | celui | quelque | comprendre |
par | donner | sans | depuis |
je | bien | raison | point |
avec | où | notre | ainsi |
tout | fois | dont | heure |
faire | vous | non | rester |
Thanks to corpus linguistics, word frequency data like this is available for a lot of languages.
If there aren’t any lists like this available for your target language, use the English list and find your target language equivalents (if they exist).
Sometimes you’ll find that one of these English words has an affix equivalent in another language, e.g. the = al- in Arabic, but the frequency is still generally the same across the board (I’d like to hear your own thoughts on this in the comments section below).
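If no published list exists and you're comfortable with a little Python, a rough frequency list can be built from any text you have in your target language. This is only a minimal sketch and not how a real corpus project works: the function name `frequency_list`, the sample sentence, and the naive tokenizer (lowercase, letters only, no grouping of inflected forms) are all assumptions made for illustration.

```python
import re
from collections import Counter


def frequency_list(text, top_n=100):
    """Return the top_n most common word forms in text, most frequent first."""
    # Naive tokenizer (an assumption): lowercase, keep runs of letters only.
    # It splits on apostrophes and hyphens and does not merge inflected forms.
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word, _ in Counter(words).most_common(top_n)]


# Invented sample text; in practice you would load a much larger file.
sample = "Le chat et le chien. Le chat dort et le chien mange."
print(frequency_list(sample, top_n=3))
```

The more text you feed it, the closer the result should get to a published list, but a small or one-topic text will skew it heavily.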
Now, whether your goal is conversation or just to be able to read foreign text, common terms like this should be your priority.
I usually start immediately with this vocabulary and worry about vocabulary for household items, travel, family, etc. later on as the need arises.
If you can hit up this core vocabulary immediately, you’ll find yourself able to tackle written and spoken dialogue a lot faster, for the simple reason that you’ll encounter these words in almost every instance.
A predictably obvious option here is to create a custom set of flashcards in AnkiSRS and learn the 100 most common words in your target language through spaced repetition.
I’ll give you one of my own methods as an alternative.
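For the flashcard option, typing cards in one by one gets tedious. As far as I know, Anki can import plain text files with one card per line and tab-separated fields, so a short script can prepare the text for you. This is a sketch under that assumption; `cards_to_tsv` and the three sample pairs are made up for illustration.

```python
import csv
import io

# Sample (target word, English gloss) pairs, chosen for illustration.
cards = [("pour", "for"), ("mais", "but"), ("avec", "with")]


def cards_to_tsv(pairs):
    """Return the pairs as tab-separated text, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


print(cards_to_tsv(cards))
```

Save the printed text to a file and check the field separator in Anki's import dialog before importing.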
How to learn a language through gradual gap fill
I like to use real, native-speaker dialogue from the very beginning when I learn a language.
Music and slowed, turn-based conversation snippets (the kind that come with most self-learning books) can be helpful but I’m a firm believer in sticking with natural conversation from the start because that’s what you’re aiming for.
I’m not interested in unnaturally simplified or non-native conversation.
So take a video from YouTube or an audio clip from a language course (Rocket’s an ideal resource for this) and start out with a single word on your list (either obtained from a corpus list or self-compiled).
Let’s say you take the word pour (for).
Play the clip through multiple times listening only for pour and ignoring everything else. Then take the next word like mais (but) for example and listen carefully to the clip for occurrences of both mais and pour, blocking the rest out again.
Continue doing this for your entire list of core words.
Here’s something of a visualization of what you’d be listening for with the first two words:
blah blah blah blah pour blah blah blah mais blah blah blah blah mais blah blah blah pour blah blah blah blah blah pour blah blah blah blah blah
Not the best visualization but hopefully you see what I mean!
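If you have a transcript of the clip, the same picture can be produced automatically. This is a rough sketch in Python, assuming a plain-text transcript; the function name `gap_fill` and the sample sentence are invented for illustration, and it matches exact word forms only.

```python
import re


def gap_fill(transcript, targets, filler="blah"):
    """Replace every word that is not in targets with filler."""
    wanted = {word.lower() for word in targets}

    def mask(match):
        word = match.group(0)
        # Keep the words we are listening for, blank out everything else.
        return word if word.lower() in wanted else filler

    # Runs of letters count as words; punctuation is left untouched.
    return re.sub(r"[^\W\d_]+", mask, transcript)


# Invented sample line standing in for a real transcript.
line = "Je travaille pour lui mais il part demain"
print(gap_fill(line, ["pour"]))          # blah blah pour blah blah blah blah blah
print(gap_fill(line, ["pour", "mais"]))  # blah blah pour blah mais blah blah blah
```

Adding one word to `targets` at a time mirrors the way the listening exercise builds up.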
With plenty of listening you build it up so you’ve trained your ears to quickly recognize these core words.
If you can catch and understand these core words, you’ve already mastered at least 20% of the dialogue or conversation.
You start out with one and after a while you’ve filled a lot of gaps with those common words.
Recognizing them in a conversation becomes a lot easier (I’m still struggling with Irish but I catch a lot of these core words immediately even though I don’t yet understand the rest). The other advantage to listening like this is that you’re working on your comprehension skills at the same time.
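To see what share of the words in a given clip your core list actually covers, you can count them in a transcript. This is a minimal sketch, assuming you have the transcript as text; `coverage`, the short word list and the sample sentence are illustrative choices only. Note that it measures the share of words recognized, not how much of the meaning you'd follow.

```python
import re


def coverage(transcript, core_words):
    """Return the share of word tokens in transcript found in core_words."""
    core = {word.lower() for word in core_words}
    tokens = re.findall(r"[^\W\d_]+", transcript.lower())
    if not tokens:
        return 0.0
    return sum(token in core for token in tokens) / len(tokens)


# A handful of words from the English list above.
core = ["the", "be", "to", "of", "and", "in", "that", "have", "I", "it"]
text = "I have to agree that the title of the post is misleading"
print(f"{coverage(text, core):.0%}")  # 58%
```

Only exact forms are matched here, so "is" does not count toward "be"; a real check would need to group word forms, which this sketch does not attempt.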
If you’re at a higher level you can use this exercise with a more advanced vocabulary list too.
Just make sure that the clip you’re listening to is related to the vocab list you’ve got so that you’re guaranteed to get good exposure to the words on your list (e.g. listen to a sports commentary if the list contains related verbs like hit, kick, run, goal, etc.).
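One rough way to check whether a clip suits your list, again assuming a transcript is available, is to count how often each list word turns up in it. The name `list_hits` and the commentary line below are invented for the example.

```python
import re
from collections import Counter


def list_hits(transcript, vocab):
    """Return how many times each vocab word occurs in transcript."""
    counts = Counter(re.findall(r"[^\W\d_]+", transcript.lower()))
    # Counter returns 0 for words that never occur.
    return {word: counts[word.lower()] for word in vocab}


# Invented sample commentary.
commentary = "He runs, takes the kick and it is a goal! What a goal, what a kick!"
print(list_hits(commentary, ["hit", "kick", "run", "goal"]))
# {'hit': 0, 'kick': 2, 'run': 0, 'goal': 2}
```

Words that score zero (here "hit", and "run" because only the form "runs" appears) suggest either a different clip or a look at other forms of the word.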
Don’t get bogged down by grammar and different word forms for this exercise.
Just concentrate on root forms so you can recognize them when you hear them.
The thing about core words is that no matter what you listen to or read, you’re guaranteed to encounter them frequently so make sure to learn them early.
22 COMMENTS
Kamal Abdulla
How can I find the Arabic list for corpus linguistics?
David
I’m no expert on languages, but as someone who has recently landed in Greece and is daunted by the task of learning its language, I was encouraged by reading this blog post. It is clearly written and offers a sensible approach, which I intend to implement. Thanks a lot (:
Adam
I think it’s great because when you get these words, you start to understand the structure of a sentence. And when you understand the structure, it’s much easier to understand what type of words you are looking at. For example, nouns, adjectives, verbs etc... This is key in gleaning by context. I think people are underestimating the usefulness of these words. :) btw, I looked it up for German and found that I already know most of them. Will have to work on a few of them though...
Alex
Just wanted to say this is great. This came at the perfect time for me: I’ve been learning Hmong and I understand many of the meaningful words, but I don’t understand the context that they are in.
Kali
I want to say two things to this:
Firstly, I think it’s quite useful.
But I’ve actually gone through a phase recently where I could understand those words but little more in far too many languages (I’ve moved recently and been exposed to all my new acquaintances’ native languages) and it can be really frustrating. Most times I actually find it easier to listen to people speak Hausa - where I will only understand random English words they throw in - than Urdu (and far too many other South Asian languages) where I will understand most of the “little words” but far too few of the ones that carry meaning to get what people are talking about while constantly having this feeling of recognition. I realise it’s a good sign, and it is gradually getting better, but when I’m tired it can be incredibly frustrating and hurt my motivation.
@ielanguages
I agree that the most frequent words should be learned first since linguistic research supports that idea, but I just wanted to point out a few problems that arise with corpus linguistics. Most notable is that most frequency lists are generated with orthographic words instead of lexical chunks, which are arguably more important to understanding overall meaning in language. Plus there is usually no information (beyond part of speech) on which definition or use of the word is most frequent. With regard to French, the fact that it is a syllable-timed language with liaison also makes it harder for listeners to hear individual (orthographic) words. It’s a better system for some languages than for others.
Casey
Ah, that link is working again :) I’ve been trying to access UCLA’s Quechua resource site for a couple of weeks, but it has been down.
As for recording native speakers, that’s what I’ve been doing. Recording my tutor as well as some Quechua speaking friends I’ve made.
My plan is to continue to study here for another month, until I’ve got a firm beginner/intermediate conversational grasp on the language (basically to the point that I can use Quechua to learn more Quechua), then go far out into the campo -- where spanish isn’t spoken -- and immerse myself in a community for several more months. We’ll see how it goes.
Casey
Donovan, when you put it in terms of understanding lexical content rather than semantic content (which is what I thought you meant at first), I totally agree. In the early stages of learning a language, a huge challenge for me (and others I assume), is simply recognizing distinct morphemes/lexical tokens in spoken speech. Your technique here will definitely help one improve in that area.
Thanks for the compliment. I actually blog about non-technical stuff over at Elusive Truth as the audiences tend to be different.
I’ve downloaded pretty much every video on YouTube with Quechua dialog/lyrics; the ones of usable quality number fewer than 20.
This week I bought a digital voice recorder, and have been going wild recording audio while out on the streets. So far I’ve been editing it and putting it into Anki, but I plan to eventually put recordings up on my Quechua resource site.
mezzoguild
Sounds great.
Have you got Quechua-speaking friends? Maybe they could help you record and compile a small database of words or phrases that you could make available on your site.
It’s tough with these kinds of languages. I’m planning on starting a challenge later in the year to learn an Australian Aboriginal language (I’m currently trying to organize an immersion trip to a remote community in Central/Western Australia). There are scant resources available so I’ll either have to buy some from a government-funded program if it’s available or go there and gather it myself from the locals.
Casey
I think this is in general a good method to use for practicing listening comprehension and picking up important words; however, the claim that you’ll understand 20% of the conversation, while technically it might be true, is misleading because you likely still won’t understand the topic of the conversation.
Focusing on frequency lists can be a useful strategy, though when at a basic introductory level, I think you’re better off learning “real” words first.
Also, I totally understand your experience with MSA’s connectors, I went through the same thing. Though, when you finally got around to studying those connectors, you were way beyond a beginner’s level, and probably understood 80% of the meaning already. Filling in that other 20% was undoubtedly useful, but learning that 20% first would’ve been useless and frustrating.
Just my thoughts. Great hints for practicing listening. I wish my current target language had native speaker dialogue available on YouTube. I’m having a tough time getting exposure.
mezzoguild
Thanks, Casey.
Let me clarify - when I say you’ll understand 20% of the conversation, I’m referring to content, i.e. individual words, not understanding the overall topic or meaning.
I checked out your site too. Nice to see a *nix programmer who loves languages blogging about them both in one place. I’ll have a proper read through later.
Did you have much success finding Quechua resources?
Polyglottally
I have to agree with some of the others here. The title of your post is somewhat misleading. While the top 100 words may cover 20% of the volume of what’s said, your comprehension of the conversation based on just those words is probably still 0%.
For example, if I take that last paragraph and remove the words not on your list, it would read: “I have to ... with some of the others ..., which makes the ... of your ... ... ... ... the ... ... ... ... ... .... of the ... of what ... ..., you ... ... of the ... ... on just ... ... is ... ... ...” That’s 24/51 words understood (nearly 50%), yet comprehension is still zero.
Having said that, though, I think your exercise of targeted listening for these words for recognition is brilliant. In Russian, the не negative is often so subtle that it blends with the next word and if you miss it, you’re lost. Ear training is absolutely mandatory.
mezzoguild
Thanks!
As I said in my reply to Casey above, I did actually mean 20% of the content, as opposed to 20% of overall meaning.
You’re right - the paragraph makes no sense to anybody, but in terms of lexical content you could potentially understand half of it by knowing those common words.
I bookmarked your site too. You’ve got some great content on there.
Robert
ok, what’s this language that doesn’t have any native speaker dialogue on youtube?
mezzoguild
Robert, I think he’s referring to Quechua.
j4p4n
Just want to make sure you see my honest feedback: This is useless. The most common words are usually the least important in a sentence. The core idea is good: train yourself to listen for important key phrases/words. But thinking you can do this from a short 100-word frequency list and suddenly understand anything meaningful from a conversation is silly. And one thing this technique completely skips is implied meaning in languages such as Japanese. In some languages it is more important what is not said than what is said. So listen all you want for 100 common frequent words, it won’t help you pick up on anything not said. So sorry web author, try again, bad advice, weak idea. Your heart is in the right place though :) Good luck!
@niblica
I think the point of this exercise is simply to make your ears more accustomed to the foreign language. I know I get excited when I recognise any word in my target language, even if it’s just “those” or “however”. That’s how I understood this, at least.
mezzoguild
I like honest feedback. Thanks.
I did say that the 100 most common words on their own can’t really be used in a meaningful conversation.
My point is that the frequency of these words means that if you understand them, and have trained your ears to comprehend them by lots of listening, you already understand a lot.
To give you an example from my own experience: I ignored a lot of connectors and high-frequency terms in Modern Standard Arabic studies for years. I’d pick up a newspaper and start reading an article on Middle East politics, understand the big words but be reaching for a dictionary every 30 seconds for the high-frequency, insignificant little words even though they made up a big part of the article.
That’s why it’s so important to get these down first - in my opinion of course. Thanks for the constructive criticism though.
Dop
“An” in French, could you use it in a sentence please?
mezzoguild
Joyeux Nouvel An.
Robert
Well, if they were high frequency words you wouldn’t be looking them up for long as you’d soon encounter them all and learn them. So basically, if they’re such high frequency you don’t need to make a special list as by definition you’ll soon encounter them.