British National Corpus
The British National Corpus (BNC) has long been the gold standard for British English. It provides data about grammar and vocabulary, both in their widest senses, drawn from a representative cross-section of many fields (domains) and modes (mediums). Work on building the corpus began in 1991 and was completed in 1994. It is a snapshot of British English in the late 1980s and early 1990s and is therefore a synchronic corpus.
On the BNC homepage, you can read about its history, its composition and other background information. There is also a Wikipedia page.
No new texts have been added to the BNC since its completion in 1994. On the one hand, this means that its data is stable and the same search will always yield the same results. On the other hand, it contains no coinages from after that time. Recent linguistic accretions such as friend as a verb, as in ‘a social networker friends another social networker’, are not represented. And email gets a mere 34 hits, the same as Luddite, coincidentally. Even the word website is not in the BNC, although web site is, just once, in its c. 100 million words.
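To put those raw hit counts in proportion, corpus linguists usually normalise them to a rate per million words. The short Python sketch below does this for the figures quoted above. It assumes a corpus size of exactly 100 million words, which is only the approximate figure given here, and the name per_million is invented for this illustration.

```python
# Assumption: the BNC holds roughly 100 million words (the "c.100 million"
# quoted above), so the rates below are approximate.
BNC_SIZE = 100_000_000


def per_million(hits, corpus_size=BNC_SIZE):
    """Return a raw hit count normalised to occurrences per million words."""
    return hits * 1_000_000 / corpus_size


# Hit counts quoted in the text above.
counts = {"email": 34, "Luddite": 34, "web site": 1, "website": 0}

for word, hits in counts.items():
    print(f"{word}: {hits} hits = {per_million(hits):.2f} per million words")
```

On that assumption, the 34 hits for email work out at about 0.34 occurrences per million words.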
Tools that use the British National Corpus
Just the Word
English Corpora (BYU)
StringNet 4.0 takes an English word (or words) as a query and responds with a ranked list of the multiword and lexico-grammatical patterns in which that word is conventionally used (or in which those words conventionally co-occur), along with concordances for each pattern.
As a ‘net’, StringNet also links each pattern to its related patterns, to its more abstract counterparts (its parents) and more specific counterparts (its children). So, for example, it links ‘consider yourself lucky’ to its parents:
Also, clicking on any word or slot in any pattern displays its paradigm: a list of the words that can be substituted in that position, representing the attested variation for that slot in that exact context.
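The idea of a paradigm can be illustrated with a few lines of Python. This is only a toy sketch of the concept, not StringNet's own method or data: the sentences are invented rather than taken from the BNC, the function name slot_paradigm is hypothetical, and the pattern uses a simple * wildcard in place of StringNet's lexico-grammatical slots.

```python
from collections import Counter


def slot_paradigm(pattern, sentences):
    """Count the words that fill the single '*' slot of a pattern."""
    parts = pattern.lower().split()
    slot = parts.index("*")  # position of the open slot in the pattern
    fillers = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window the length of the pattern across the sentence.
        for start in range(len(tokens) - len(parts) + 1):
            window = tokens[start:start + len(parts)]
            if all(p == "*" or p == w for p, w in zip(parts, window)):
                fillers[window[slot]] += 1
    return fillers


# Invented example sentences (assumption: NOT real BNC concordance lines).
toy_sentences = [
    "you should consider yourself lucky",
    "consider yourself lucky to be here",
    "consider yourself warned",
    "they consider themselves lucky",
]

paradigm = slot_paradigm("consider yourself *", toy_sentences)
print(paradigm.most_common())
```

Each word counted is one attested filler for the slot in that exact context. The last toy sentence contributes nothing because themselves does not match the fixed word yourself.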
Register Explorer is an online search engine that searches the British National Corpus. When you search for a word, it shows ten words associated with your search word in each of its five registers: non-fiction, spoken, fiction, academic and news. Off to the side of the webpage is another set of ten words under General English. All of the words are the same part of speech as your search word, so they are not collocates.