Vocabulary Profiler and key words
The Vocabulary Profiler is an online tool, one of many in the suite of programs at the Compleat Lexical Tutor by Tom Cobb.
Within a text, key words are relatively frequent, but in the language at large they are generally low frequency. This can be demonstrated using the Vocabulary Profiler: paste any text into the Main Text field and click the Submit button. The results tell us what percentage of the words in a specific text fall into each of four categories.
The first category is full of very high frequency words, including most function words (prepositions, pronouns, conjunctions) as well as words such as time, year, period, person, give, should, still, become. The second category has a thousand words, among them ask, doubt, speak, completely, animal, considerable, safety, keep, difficulty, support. The third category, the Academic Word List (AWL), contains words that are common in academic language. It excludes the words of the first two categories because they are common everywhere, and it excludes technical words that belong to specific fields. Among its 570 word families are indicate, transfer, instruct, abandon, lecture, bias, assignment. Thus, the AWL is not field specific. There are hundreds of websites that use the AWL – here is its homepage.
The fourth category is referred to as the “off list” because its words are not on the other three lists. It contains many of the words around which the messages of the text revolve. Examples of the relative percentages of these four categories appear on the individual topic trail webpages, e.g. Amour, Mad Cows Disease, where the categories are colour-coded. When you see a number of these frequency tables, you will observe the consistently high frequency of Category 1 and the consistently low frequency of Category 2. The percentage of Category 3 depends on the nature of the text. But in terms of topic trails and learning language from language, Category 4 is of the most interest.
The Vocabulary Profiler, like many online language tools, does not have "linguistic intelligence": it does not identify the parts of speech of words, it does not group word forms into lemmas, and it does not recognise compound words. Furthermore, the key words in a text will be repeated and they will be referenced in various ways.
These drawbacks mean that the tool is a useful starting point, but human analysis is required for more robust results.
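The idea behind such a profile can be sketched in a few lines of Python. This is only an illustration of the four-category percentage count, not the code behind the Vocabulary Profiler. The three sets are hypothetical stand-ins built from some of the example words mentioned above plus a few function words, and the names CATEGORY_1, CATEGORY_2, CATEGORY_3 and profile are assumptions made for this sketch.

```python
import re
from collections import Counter

# Hypothetical stand-in lists: some of the example words quoted above plus
# a few function words. The real lists are far larger (the third one has
# 570 word families).
CATEGORY_1 = {"the", "of", "in", "should", "still", "become", "give",
              "time", "year", "period", "person"}
CATEGORY_2 = {"ask", "doubt", "speak", "completely", "animal",
              "safety", "keep", "difficulty", "support"}
CATEGORY_3 = {"indicate", "transfer", "instruct", "abandon",
              "lecture", "bias", "assignment"}

LABELS = ("category 1", "category 2", "category 3", "off list")


def profile(text):
    """Return the percentage of word tokens that fall in each category."""
    # Lower-case the text and keep runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in CATEGORY_1:
            counts["category 1"] += 1
        elif token in CATEGORY_2:
            counts["category 2"] += 1
        elif token in CATEGORY_3:
            counts["category 3"] += 1
        else:
            # Anything not on the three lists is "off list".
            counts["off list"] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


print(profile("The lecture should indicate the difficulty of prions in animals"))
# {'category 1': 50.0, 'category 2': 10.0, 'category 3': 20.0, 'off list': 20.0}
```

Note that "animals" is counted as off list even though "animal" is in the second set. The sketch matches exact word forms only, so it shows the same lack of lemma grouping described above, and a plural or inflected form can inflate the off-list percentage.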