Tuesday, 26 May 2009

Following on from my previous post, I got to thinking about a "mathematically optimal Greek course": one that casts aside any concept of pedagogy and optimizes for a hypothetical learning machine.

We'd want to teach this machine its vocab in roughly decreasing order of frequency, so that the words that appear most often in the NT are taught first. We know from the last post that after about 200 words, it would know every word that appears 100 or more times. But what does this mean in practice?

In particular, if we learned words this way, when would we be able to read our first NT verse? And after learning 200 words, how many verses would we be able to read without help?

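We can calculate this very easily. Using our frequency-ordered word list, we can assign each verse in the NT a score: the position in the list of the verse's rarest word, that is, the word furthest down the list. So a verse with no rare words will have a low score, and a verse containing a very rare word will score highly. The score is also the number of words you'd need to learn, in frequency order, before you could read that verse.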
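Here is a minimal sketch of that scoring in Python. It is not the script I actually ran. It assumes each verse is already a list of dictionary forms (lemmas), and the function names and the toy data are made up for illustration, with placeholder letters standing in for Greek words.

```python
from collections import Counter

def frequency_ranks(verses):
    """Map each word to its 1-based position in the frequency-ordered word list."""
    counts = Counter(word for words in verses.values() for word in words)
    # most_common() sorts by descending count, so rank 1 is the most frequent word.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def verse_scores(verses, ranks):
    """Score each verse by the rank of its rarest word (assumes no empty verses)."""
    return {ref: max(ranks[word] for word in words)
            for ref, words in verses.items()}

def readable_after(scores, n_words):
    """List the verses whose every word is among the first n_words learned."""
    return sorted(ref for ref, score in scores.items() if score <= n_words)

# Toy corpus (hypothetical): verse reference -> list of lemmas.
toy_verses = {
    "v1": ["a", "b", "a"],
    "v2": ["a", "b", "c", "b"],
    "v3": ["a", "c", "d"],
}

ranks = frequency_ranks(toy_verses)
scores = verse_scores(toy_verses, ranks)
easiest = min(scores, key=scores.get)
```

With real data, `min(scores, key=scores.get)` gives the easiest verse, and `readable_after(scores, 200)` answers the question of how many verses are readable after 200 words.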

It turns out the easiest verse to read, on this scoring system, is Matt 16:15, '"But what about you?" he asked. "Who do you say I am?"' (λέγει αὐτοῖς ὑμεῖς δὲ τίνα με λέγετε εἶναι). It scores 33, so its most uncommon word (τίνα, from τίς "who") is number 33 on the frequency word list. In fact, all the other words in the verse are in the top 10; even in English you can see that most of them are very common (the verb λέγω "to say" is at number 9, but is the most common verb in the NT).

After Matt 16:15 comes 1 Cor 3:23, 'as we are to Christ, Christ is to God' (ὑμεῖς δὲ Χριστοῦ Χριστὸς δὲ θεοῦ), with a score of 35. Then there's a small gap before we get several verses with scores in the 50s and 60s. Some are just short, like John 10:30; others are surprisingly complex, like 1 Cor 8:6, which has 27 Greek words but a score of just 50, putting it joint 3rd overall.

Now, as in the previous post, I've ignored morphology, which skews these results. You couldn't just teach someone 33 words and have them read Matt 16:15. To get a proper curriculum for the hypothetical Greek-learning machine, we'd need to include grammar, and that is the subject for another post.


