One of the things I’ve been looking at recently is a particular grammatical pattern in various languages including Middle English (i.e. English as spoken in the period 1066 to 1470-ish). Simplifying matters a bit, in older varieties of English some verbs employed have in the “perfect” construction, whereas other verbs took be:

(1) I am come, thou art gone, he is fallen …

(2) I have worked, thou hast made, she hath said …

In present-day English we basically only use have, so we use the following forms in place of those in (1):

(3) I have come, you have gone, he has fallen …

But when exactly did Middle English use have and when did it use be? The best way to answer this (the best I’ve been able to come up with at any rate) is to trawl through a great deal of text and see what patterns emerge. A body of texts put together for the purpose of trawling through to look for answers to particular questions in this way is known as a corpus. The corpus I’ve been using is the Helsinki corpus, a collection of texts up to the year 1710 – specifically the 609,000 words of texts from the period 1150-1500.

Obviously 609,000 words is a lot of words (The Lord of the Rings is about 480,000, for comparison, and my copy is 6.3cm thick in very small font). And the frequency of what I’m looking for is pretty low: as a rough estimate, there are about 6 instances of the perfect construction in every thousand words, and only about 5% of these use be rather than have.
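To put those rough estimates into absolute numbers, here is a small back-of-the-envelope sketch. The three figures are the ones quoted above; the function and variable names are just illustrative, and the results are only as good as the estimates that go in.

```python
# Rough figures quoted in the post (estimates, not exact counts).
CORPUS_WORDS = 609_000        # Middle English portion of the corpus
PERFECTS_PER_THOUSAND = 6     # perfect constructions per 1,000 words
BE_SHARE = 0.05               # share of perfects formed with "be"


def expected_counts(words, per_thousand, be_share):
    """Return (expected perfects, expected be-perfects) for a corpus."""
    perfects = words * per_thousand / 1000
    return perfects, perfects * be_share


perfects, be_perfects = expected_counts(
    CORPUS_WORDS, PERFECTS_PER_THOUSAND, BE_SHARE
)
print(f"about {perfects:.0f} perfects, about {be_perfects:.0f} of them with be")
```

In other words, something in the region of 3,650 perfects in total, of which fewer than 200 would be expected to use be.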

Thankfully advances in modern technology (specifically, in my case, the Microsoft Word search function) mean I don’t have to read through the entire length of the corpus hoping to spot the relevant constructions on the rare occasions when they do turn up. But even with the aid of the search facility, the process is still a rather drawn out one. There are two reasons for this: firstly, the irregularity of the verb to be, and secondly, the irregularity of English spelling in the period in question.

Regarding the first, observe that be in English has multiple different forms: be, am, are, is, were, was etc. For one thing, there are simply more forms than we find for any other verb: compare the following:

(4) I am, you are; I was, you were (different forms for different persons)

(5) I love, you love; I loved, you loved (same forms in each tense regardless of person)

For another, many of the forms of be are completely different from each other, with no shared material. Thus, whilst all the forms of love begin with the letters lov- (love, loves, loved, loving), there is no sequence of letters which is common to all the forms of be.

To make matters worse, in Middle English there were even more forms of be. art, as in thou art, was very common, and there were also forms like they weren (= they were), sindan (= they are), he/she bið (= he/she is). To get the full picture, these need searching for as well.

This is compounded still further by the second problem: spelling. Spelling in Middle English wasn’t standardised and there was a great deal of variation in how words were spelled. Even for a little word like is spellings found include isissesse, ysyssehishys, hes, yes and so on and so forth. am is spelled ameomeamæm, ham … All these various spellings need to be taken into account for a comprehensive survey.
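For anyone who would rather script this than run one search per spelling, here is a minimal sketch of the same idea using a regular expression. It is not what I actually did (I used the Word search function), the list of forms is deliberately short and incomplete, and the sample sentence is invented purely for illustration. The names `BE_FORMS` and `find_be_forms` are hypothetical.

```python
import re

# A small, incomplete sample of forms of "be", including a few of the
# Middle English forms and variant spellings mentioned above.
BE_FORMS = [
    "be", "am", "ham", "art", "are", "is", "hes", "yes",
    "was", "were", "weren", "sindan", "bið",
]

# One pattern matching any listed form as a whole word. Longer forms come
# first so that "weren" is tried before "were".
BE_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(sorted(map(re.escape, BE_FORMS), key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_be_forms(text, window=2):
    """Return (form, next few words) for every listed form found in text."""
    hits = []
    for match in BE_PATTERN.finditer(text):
        following = text[match.end():].split()[:window]
        hits.append((match.group(0).lower(), following))
    return hits


# Invented sample line, not taken from the corpus.
sample = "Thou art gone, and he hes fallen; they weren comen."
for form, context in find_be_forms(sample):
    print(form, "->", " ".join(context))
```

Every hit still has to be checked by eye: a spelling like yes or ham will also match words that have nothing to do with be, and a form of be is not necessarily followed by a participle.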

Some corpora may allow you to get around this sort of problem through tagging. In a tagged corpus, each word is associated with a tag which tells you what sort of word it is. The tags used vary, but some corpora specifically mark forms of be and have with their own particular codes, which makes them a lot easier to track down. Obviously, though, the corpus has to be tagged in the first place, which is a lot of work. This can be mitigated to some extent by getting a computer to do it for you, although computers aren’t 100% accurate at this sort of thing so it still needs to be checked by a real person.
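To give a flavour of why tagging helps, here is a small sketch of searching a tagged text. The tag names (`BE`, `HAVE`, `PPL` for a participle) are made up for the example and do not correspond to any particular corpus's tagset; the tokens are invented too. The search looks only for an auxiliary immediately followed by a participle, which is an assumption that real texts will often break.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, tag)


def find_perfect_candidates(
    tokens: List[Token],
    aux_tags: Tuple[str, ...] = ("BE", "HAVE"),  # hypothetical tag names
    participle_tag: str = "PPL",                 # hypothetical tag name
) -> List[Tuple[str, str, str]]:
    """Return (aux tag, aux word, participle) for adjacent aux + participle."""
    hits = []
    for (word, tag), (next_word, next_tag) in zip(tokens, tokens[1:]):
        if tag in aux_tags and next_tag == participle_tag:
            hits.append((tag, word, next_word))
    return hits


# Invented example; a real tagged corpus would supply these pairs.
tagged = [
    ("he", "PRO"), ("is", "BE"), ("fallen", "PPL"),
    ("and", "CONJ"),
    ("she", "PRO"), ("hath", "HAVE"), ("said", "PPL"),
]
for aux_tag, aux, participle in find_perfect_candidates(tagged):
    print(aux_tag, aux, participle)
```

Because the search goes by tag rather than by spelling, it does not matter how is or hath happens to be written in a given text. The results are still only candidates: be plus a participle can also be a passive, so each one needs checking.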

After all this, what have I discovered? I’m approaching my word limit, so I’ll have to be quick, but basically verbs in English which took be in the perfect seem to have been either “change of location” verbs like go, come, fall or “change of state” verbs like become. This is interesting because – whilst languages which have this construction vary in how many verbs take be rather than have – there’s been a prediction that if any verbs take be they will include the change of location verbs, and if the class of be verbs is any larger than that it will include the change of state verbs. So Middle English supports that prediction.

In fact, the class of verbs which took be in Middle English is much the same as in modern French (where you say je suis allé(e) “I am gone” and not *j’ai allé “I have gone”). Might this be due to contact between English and French? Probably not, because the French spoken at the time of Middle English allowed be with a much larger set of verbs. This suggests we need to seek out a deeper explanation for the similarities, rooted in the psychology of linguistic processing.

Ultimately, then, I’ve found something out, and so all this corpus-trawling has been worth it.