Wednesday, 27 May 2009

A Graph

In my last post I talked about the number of headwords you'd need to know to start reading the NT, if you learned the words in decreasing order of frequency. (As it turns out, you'd need 33 to read Matt 16:15.)

We can continue the same process for every verse in the NT, giving each verse a score based on where its rarest word falls in the frequency list. That score is the number of words you must learn before you can read the verse. If we order all the verses by this score, we end up with this graph.
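Here is a minimal sketch of that scoring, assuming each verse has already been reduced to a list of headwords. The function names and the toy "verses" are hypothetical illustrations, not the data or code behind the graph, and breaking frequency ties alphabetically is my own assumption.

```python
from collections import Counter

def frequency_ranks(verses):
    """Map each headword to its 1-based rank in a most-frequent-first word list."""
    counts = Counter(word for verse in verses for word in verse)
    # Most frequent first; ties broken alphabetically (an assumption) so the ranking is repeatable.
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}

def verse_scores(verses):
    """Score each verse by the rank of its rarest word, i.e. the vocabulary size needed to read it."""
    ranks = frequency_ranks(verses)
    return [max((ranks[word] for word in verse), default=0) for verse in verses]

def readable_curve(verses):
    """curve[k - 1] = number of verses fully readable once the top k words are known."""
    scores = Counter(verse_scores(verses))
    curve, readable = [], 0
    for vocab in range(1, len(frequency_ranks(verses)) + 1):
        readable += scores[vocab]  # verses whose rarest word has exactly this rank
        curve.append(readable)
    return curve

# Hypothetical toy "verses", each already reduced to a list of headwords.
verses = [
    ["ho", "logos", "en"],
    ["ho", "theos", "en"],
    ["ho", "logos"],
    ["kai", "ho", "theos"],
]
print(verse_scores(verses))    # [3, 4, 3, 5]
print(readable_curve(verses))  # [0, 0, 2, 3, 4]
```

Plotting `readable_curve` against vocabulary size gives a graph of the kind described here; on real data the list would have one entry per headword.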

Initially, in the bottom left, you need to learn quite a few words before you can read each new verse of the NT: it takes 33 words to reach the first verse, another 2 for the next, then 15 more before you can read the third. Over time, though, the payoff begins to show. In the middle of the graph, each new word you learn lets you read between two and three more verses. Towards the end, we reach the territory of words that appear only once, so learning each of them helps us read only one extra verse.
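The two ways of reading the graph above (gaps between successive verses at the start, verses gained per word in the middle) can be sketched as follows, assuming each verse's score has already been computed. The function names and the windowed average are my own illustration; only the scores 33, 35 and 50 come from the figures quoted above.

```python
from bisect import bisect_right

def words_to_next_verse(scores):
    """Extra words needed to unlock each successive verse, taking the easiest verses first.

    A score is the vocabulary size (in frequency order) at which a verse becomes readable.
    """
    gaps, previous = [], 0
    for score in sorted(scores):
        gaps.append(score - previous)  # 0 when several verses unlock at the same word
        previous = score
    return gaps

def verses_per_new_word(scores, start, end):
    """Average verses unlocked per word learned while growing the vocabulary from start to end words."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    ordered = sorted(scores)
    # Verses with start < score <= end become readable somewhere in this window.
    unlocked = bisect_right(ordered, end) - bisect_right(ordered, start)
    return unlocked / (end - start)

first_three = [33, 35, 50]  # scores of the first three verses mentioned in the post
print(words_to_next_verse(first_three))          # [33, 2, 15]
print(verses_per_new_word(first_three, 30, 50))  # 0.15
```

Applied to the full list of verse scores, a window in the middle of the graph would give the "two to three verses per word" figure, and a window near the end would approach one.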

You can think of this as three zones of learning:

1. Early undergraduate study. Everything is tough; everything seems like a special case, and nothing joins up.

2. Late undergraduate study. Suddenly you find yourself reading bits of the NT with minimal help from a lexicon. It is a great feeling.

3. The point beyond where I am. You can read about 2/3 of the NT without reference, but the words you don't know really are a special case here.

I expected the graph to have roughly this shape (flat initially, then steepening before flattening again), but I thought the steep section would be even steeper. It turns out that even at best (with a vocabulary of around 1000 words) you get fewer than three new verses for every word you learn. On the positive side, however, the initial flat stretch where progress is slow is far smaller than I expected; at this scale it is really only the smallest of ticks. After a couple of hundred words, you are into the most productive zone.

This is deeply encouraging for students of the language. Just a couple of hundred words and you can do a lot.

Of course, as I've said several times, this applies to vocab only. At some point I'll do a similar analysis of grammatical features.

PS: In case you're wondering, the little kinks at 2655 and 3495 words are the points where the words you are learning start to appear only twice in the NT and only once in the NT, respectively.
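Why the kinks sit there: once every remaining word occurs at most twice, each newly learned word can unlock at most two verses, and past the first word that occurs only once, at most one. A minimal sketch for locating those ranks from a table of word counts follows; the function name and toy counts are hypothetical, not real NT figures.

```python
from collections import Counter

def first_rank_with_at_most(counts, max_occurrences):
    """1-based rank (most frequent first) of the first word occurring at most max_occurrences times.

    Returns None if every word is more frequent than that.
    """
    ordered = sorted(counts.values(), reverse=True)
    for rank, count in enumerate(ordered, start=1):
        if count <= max_occurrences:
            return rank
    return None

# Hypothetical toy frequency counts (occurrences per headword), not real NT figures.
counts = Counter({"ho": 4, "logos": 2, "en": 2, "theos": 2, "kai": 1})
print(first_rank_with_at_most(counts, 2))  # 2: from here on, each new word occurs at most twice
print(first_rank_with_at_most(counts, 1))  # 5: from here on, each new word occurs only once
```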

