The Lancaster-Oslo/Bergen (LOB) Corpus is a one-million-word collection of British English texts compiled in the 1970s through a collaboration between the University of Lancaster, the University of Oslo, and the Norwegian Computing Centre for the Humanities, Bergen. It was created as a British counterpart to the Brown Corpus, which Henry Kučera and W. Nelson Francis compiled for American English in the 1960s.[1]

Its composition was designed to match the original Brown Corpus as closely as possible in size and genres, using documents published in the UK in 1961 by British authors.[2] Both corpora consist of 500 samples, each comprising about 2000 words, in the following genres:

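| Label | Text category | Brown Corpus | LOB Corpus |
|-------|---------------|--------------|------------|
| A | Press: reportage | 44 | 44 |
| B | Press: editorial | 27 | 27 |
| C | Press: reviews | 17 | 17 |
| D | Religion | 17 | 17 |
| E | Skills, trades and hobbies | 36 | 38 |
| F | Popular lore | 48 | 44 |
| G | Belles lettres, biography, essays | 75 | 77 |
| H | Miscellaneous (documents, reports, etc.) | 30 | 30 |
| J | Learned and scientific writings | 80 | 80 |
| K | General fiction | 29 | 29 |
| L | Mystery and detective fiction | 24 | 24 |
| M | Science fiction | 6 | 6 |
| N | Adventure and western fiction | 29 | 29 |
| P | Romance and love story | 29 | 29 |
| R | Humour | 9 | 9 |
| Total | | 500 | 500 |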
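
As a quick consistency check on the figures in the table above, the per-category sample counts can be summed and compared. The following is a minimal sketch, not part of either corpus distribution: the names `SAMPLES`, `WORDS_PER_SAMPLE`, `totals`, and `differing_categories` are made up for illustration, and the 2000-word sample size is only the approximate figure quoted above.

```python
# Sample counts per text category, transcribed from the table above.
# Each entry maps a category label to (Brown Corpus samples, LOB Corpus samples).
SAMPLES = {
    "A": (44, 44),  # Press: reportage
    "B": (27, 27),  # Press: editorial
    "C": (17, 17),  # Press: reviews
    "D": (17, 17),  # Religion
    "E": (36, 38),  # Skills, trades and hobbies
    "F": (48, 44),  # Popular lore
    "G": (75, 77),  # Belles lettres, biography, essays
    "H": (30, 30),  # Miscellaneous (documents, reports, etc.)
    "J": (80, 80),  # Learned and scientific writings
    "K": (29, 29),  # General fiction
    "L": (24, 24),  # Mystery and detective fiction
    "M": (6, 6),    # Science fiction
    "N": (29, 29),  # Adventure and western fiction
    "P": (29, 29),  # Romance and love story
    "R": (9, 9),    # Humour
}

# Assumption: every sample is treated as exactly 2000 words,
# although the text only says "about 2000 words".
WORDS_PER_SAMPLE = 2000


def totals(samples):
    """Return (Brown total, LOB total) summed over all categories."""
    brown = sum(b for b, _ in samples.values())
    lob = sum(l for _, l in samples.values())
    return brown, lob


def differing_categories(samples):
    """Return the categories whose sample counts differ between the corpora."""
    return {label: (b, l) for label, (b, l) in samples.items() if b != l}


if __name__ == "__main__":
    brown_total, lob_total = totals(SAMPLES)
    print("Samples:", brown_total, lob_total)
    print("Approximate words in LOB:", lob_total * WORDS_PER_SAMPLE)
    print("Categories that differ:", differing_categories(SAMPLES))
```

Under these assumptions both columns sum to 500 samples, which gives roughly one million words per corpus. Only categories E, F, and G differ between the two corpora.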

The chief compilers of the LOB Corpus were Geoffrey Leech (Lancaster University) and Stig Johansson (University of Oslo); see Leech & Johansson (2009).[3]

The corpus has also been tagged, i.e. a part-of-speech category has been assigned to every word.[1]

References
