Next: Lecture 13 - Corpora Up: CS3421: Natural Language Engineering Previous: Lecture 11 - Efficient

Lecture 12 - Word Sense Disambiguation

Reading: SLP Ch. 16, Lexical Semantics, sections 1 & 2;
Ch. 17, Word Sense Disambiguation & Information Retrieval, sections 1 & 2
http://www.ilc.pi.cnr.it/EAGLES96/rep2/node39.html

Word Senses

A word stem is a base, or root, shared by related word forms.

A word sense is an association of a particular word form with a particular meaning.

Unfortunately, many English words do not show a one-to-one mapping between form and meaning. (We've already seen this with POS tags.) Words with the same form but different meanings are called homonyms. Words with different forms but the same meaning are synonyms. There are other important systematic semantic relations between word senses, which I'll look at when I talk about lexicons.
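One way to make the form/meaning distinction concrete is to treat senses, not words, as the primary objects: each sense lists the word forms that can express it, roughly as WordNet groups synonyms. A form that appears under several senses is ambiguous, and forms that share a sense are synonyms. This is a minimal sketch using the three WordNet entries for coat quoted below; the sense identifiers (coat.garment etc.) and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sense identifiers; glosses and word forms follow the WordNet
# entries for "coat" quoted in these notes.
SENSES = {
    "coat.garment": ("an outer garment that has sleeves and covers the body "
                     "from shoulder down; worn outdoors", {"coat"}),
    "coat.layer": ("a thin layer covering something", {"coating", "coat"}),
    "coat.animal_hair": ("growth of hair or wool or fur covering the body of an animal",
                         {"coat", "pelage"}),
}

def senses_of(form):
    """Every sense a word form can express; more than one means the form is ambiguous."""
    return sorted(sense for sense, (_, forms) in SENSES.items() if form in forms)

def synonyms(form, sense):
    """Other word forms that express the same sense as this form."""
    _, forms = SENSES[sense]
    return sorted(forms - {form})

print(senses_of("coat"))                     # ['coat.animal_hair', 'coat.garment', 'coat.layer']
print(synonyms("coat", "coat.animal_hair"))  # ['pelage']
```

The same form, coat, maps to three senses, while within one of those senses it has a synonym, pelage.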

Word sense ambiguity is very common, and is a major problem for many applications of language engineering. To get a feel for it, here are the (highly edited) results of looking up in WordNet just the first few words of a random sentence from my ``pet'' text (I gave up at this point!). (I'll say more about WordNet, again, when I talk about lexicons. It's linked from the course web page: you might want to try the same thing with your own text.)

``The coat falls smoothly, and is almost maintenance-free: a weekly combing is all that is usually required to keep it in top condition.''

coat - #3 of 3

fall - #25 of 32

smoothly - #1 of 3

maintenance - #1 of 4

free - #1 of 8

The noun ``coat'' has 3 senses in WordNet.
1. coat - (an outer garment that has sleeves and covers the body from shoulder down; worn outdoors)
2. coating, coat - (a thin layer covering something)
3. coat, pelage - (growth of hair or wool or fur covering the body of an animal)

The verb ``fall'' has 32 senses in WordNet.
1. fall - (descend in free fall under the influence of gravity)
2. descend, fall, go down, come down - (move downward and lower, but not necessarily all the way)
3. fall - (pass suddenly and passively into a state of body or mind)
4. fall, come - (come under, be classified or included)
5. precipitate, come down, fall - (fall from clouds)
6. fall - (suffer defeat, failure, or ruin)
...
25. hang, fall, flow - (fall or flow in a certain way)
...
32. fall, descend, settle - (come as if by falling)

The adverb ``smoothly'' has 3 senses in WordNet.
1. smoothly - (with no problems or difficulties)
2. swimmingly, smoothly - (with great ease and success)
3. smoothly - (in a smooth and diplomatic manner)

Overview for ``maintenance-free''
Sorry, no matches found.

The noun ``maintenance'' has 4 senses in WordNet.
1. care, maintenance, upkeep - (activity involved in maintaining something in good working order)
2. maintenance - (means of maintenance of a family or group)
3. alimony, maintenance - (court-ordered support paid by one spouse to another after they are separated)
4. sustenance, sustentation, sustainment, maintenance, upkeep - (the act of sustaining)

The adjective ``free'' has 8 senses in WordNet.
1. free (vs. unfree) - (able to act at will; not hampered; not under compulsion or restraint)
2. free (vs. bound) - ((chemistry and physics) unconstrained or not chemically bound in a molecule or not fixed and capable of relatively unrestricted motion)
3. complimentary, costless, free, gratis (predicate), gratuitous - (costing nothing)
4. free - (not occupied or in use)
5. detached, free - (not fixed in position)
6. free (vs. slave) - (not held in servitude)
7. spare, free - (not taken up by scheduled activities)
8. free, loose, liberal - (not literal)

Two things are clear from even this small example:

1) Lots of words have lots of senses.

2) What counts as a separate sense depends, at least in part, on the intuition of the person writing the definitions.

Disambiguation

Hand analysis has shown that typically, even in a small domain-specific corpus, at least 40% of semantically significant words are ambiguous. This has serious implications for applications such as Information Retrieval and Machine Translation.
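A figure like 40% is simply the share of word tokens whose lexicon entry lists more than one sense. A minimal sketch, assuming a lookup table of sense counts per lemma; the counts below are the WordNet numbers from the lookup above, and the function name ambiguity_stats is hypothetical. Tokens with no entry (here combing, which I didn't look up) are skipped rather than counted as unambiguous.

```python
# Sense counts WordNet reported for the words looked up in these notes.
SENSE_COUNTS = {"coat": 3, "fall": 32, "smoothly": 3, "maintenance": 4, "free": 8}

def ambiguity_stats(tokens, sense_counts):
    """Return (fraction of known tokens with more than one sense,
    mean number of senses per known token); unknown tokens are skipped."""
    known = [t for t in tokens if t in sense_counts]
    if not known:
        return 0.0, 0.0
    ambiguous = sum(1 for t in known if sense_counts[t] > 1)
    return ambiguous / len(known), sum(sense_counts[t] for t in known) / len(known)

# Lemmatised words from the example sentence.
print(ambiguity_stats(["coat", "fall", "smoothly", "maintenance", "free", "combing"],
                      SENSE_COUNTS))  # (1.0, 10.0)
```

For this one sentence every word looked up is ambiguous, with ten senses per word on average.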

Not surprisingly, word sense disambiguation is a major research topic in language engineering. SENSEVAL (http://www.itri.bton.ac.uk/events/senseval/) is a series of workshops and competitions for discussing and evaluating WSD systems; the most recent was held at ACL in July 2002. (More about this when I talk about standards.)

Techniques for WSD:

Knowledge-based

Corpus-based

Hybrid

Knowledge-based WSD: uses an explicit lexicon, such as a machine-readable dictionary (MRD) or thesaurus, or a knowledge base. Examples include WordNet, LDOCE (the Longman Dictionary of Contemporary English) and UMLS (the Unified Medical Language System).

Fairly accurate, especially in specialised domains

Expensive and does not generalise
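The best-known knowledge-based method is the Lesk algorithm, which picks the sense whose dictionary definition shares the most words with the surrounding context. Below is a minimal sketch of a simplified Lesk variant over the three WordNet glosses for coat given earlier. The small stopword list, the tokeniser, the function names and the tie-breaking rule (fall back to the lowest-numbered sense) are my assumptions, not part of any particular published system.

```python
import re

STOPWORDS = {"a", "all", "an", "and", "from", "has", "in", "is", "it",
             "of", "or", "that", "the", "to"}

# The three noun senses of "coat", with glosses as given by WordNet in these notes.
COAT_SENSES = {
    1: "an outer garment that has sleeves and covers the body from shoulder down; worn outdoors",
    2: "a thin layer covering something",
    3: "growth of hair or wool or fur covering the body of an animal",
}

def content_words(text):
    """Lower-cased alphabetic words, minus a small stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, senses):
    """Pick the sense whose gloss shares the most content words with the context.

    Ties, including the case where nothing overlaps at all, go to the
    lowest-numbered sense (in WordNet, usually the most frequent one).
    """
    ctx = content_words(context)
    best, best_overlap = None, -1
    for sense in sorted(senses):
        overlap = len(ctx & content_words(senses[sense]))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best

print(simplified_lesk("the dog has a thick coat of fur and hair", COAT_SENSES))  # 3
```

On a made-up context that mentions fur and hair, the method picks sense 3. On the lecture's own sentence, though, no gloss shares a single content word with the context, so it falls back to sense 1 (the garment) instead of the correct sense 3. Dictionary glosses are short, which is one reason this approach is accurate only where the lexicon is rich.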

Corpus-based WSD:

Supervised learning from a sense-tagged (disambiguated) corpus: the system gains some ability to generalise beyond the examples it has seen, but again the resources are expensive and hard to get hold of.

Supervised learning from an artificial corpus: notably a parallel bilingual corpus, in which the sense of an ambiguous word in one language can be identified from its translation equivalent in the other.

Unsupervised learning from a raw corpus: completely general, but not very effective. What can be learned is word sense discrimination: you can sort the instances of a word into groups that correspond to different senses, without knowing what those senses are. For some purposes, that's enough.
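The corpus-based settings can be contrasted on a toy example. Everything in this sketch is hypothetical: the six English bank sentences, the French equivalents banque (financial sense) and rive (river sense) used to label them, the stopword list and all function names are invented for illustration, not drawn from a real parallel corpus. The supervised part is a multinomial Naive Bayes classifier with add-one smoothing, one common choice among many. The unsupervised part groups the same sentences, without labels, by average-link agglomerative clustering on word-overlap (Jaccard) similarity.

```python
import math
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "at", "by", "for", "he", "her", "him", "in",
             "is", "it", "its", "my", "of", "on", "over", "she", "the", "to", "we"}

def context_words(sentence, target):
    """Content words of a sentence, minus stopwords and the ambiguous word itself."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower())
            if w != target and w not in STOPWORDS]

# Hypothetical English sentences paired with the French word 'bank' was translated as.
PAIRS = [
    ("she put her money in the bank", "banque"),
    ("the bank lent money for a loan", "banque"),
    ("the bank keeps money in an account", "banque"),
    ("we fished in the river by the bank", "rive"),
    ("the river water rose over the bank", "rive"),
    ("grass grew on the bank by the water", "rive"),
]
TRANSLATION_TO_SENSE = {"banque": "bank/money", "rive": "bank/river"}

def label_from_translation(pairs):
    """Artificial corpus: translation equivalents stand in for hand-assigned sense tags."""
    return [(s, TRANSLATION_TO_SENSE[t]) for s, t in pairs if t in TRANSLATION_TO_SENSE]

class NaiveBayesWSD:
    """Supervised: multinomial Naive Bayes over bag-of-words contexts."""

    def fit(self, tagged, target):
        self.target = target
        self.sense_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for sentence, sense in tagged:
            self.sense_counts[sense] += 1
            for w in context_words(sentence, target):
                self.word_counts[sense][w] += 1
                self.vocab.add(w)
        return self

    def predict(self, sentence):
        total = sum(self.sense_counts.values())
        best, best_score = None, -math.inf
        for sense, n in self.sense_counts.items():
            # log P(sense) + sum of log P(word | sense), add-one smoothed;
            # words never seen in training are ignored.
            score = math.log(n / total)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in context_words(sentence, self.target):
                if w in self.vocab:
                    score += math.log((self.word_counts[sense][w] + 1) / denom)
            if score > best_score:
                best, best_score = sense, score
        return best

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def discriminate(sentences, target, k):
    """Unsupervised: average-link clustering of contexts into k unnamed groups."""
    bags = [set(context_words(s, target)) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]

    def similarity(c1, c2):
        return sum(jaccard(bags[i], bags[j]) for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (b > a, so deleting b is safe).
        _, a, b = max((similarity(clusters[x], clusters[y]), x, y)
                      for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

Trained on the translation-labelled sentences, the classifier assigns "the bank charged interest on the loan" to the money sense and "they walked by the river bank" to the river sense. Run on the same six sentences without labels, discriminate recovers the same two groups, but it can only report them as groups of sentence indices: it discriminates senses without naming them.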

Hybrid WSD: as so often, hybrid approaches that combine the best of both tend to perform best. For example, Yarowsky (1995) used a small set of definitions as ``seeds'' to classify the simple cases in a corpus, then ``grew'' outward from there to increasingly large lists which were re-applied to the corpus, reaching 96% accuracy.
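The seed-and-grow loop can be sketched as follows. This is a much-simplified illustration, not Yarowsky's actual algorithm. The seeds here are single hypothetical seed words rather than definitions. A word becomes a new indicator for a sense once it occurs in at least min_count sentences labelled with that sense and in none labelled with another, which is a crude stand-in for the ranked lists of the original. The plant sentences, seed words and function names are all invented for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "this", "to", "with"}

def bag(sentence, target):
    """Set of content words in the sentence, excluding the ambiguous word itself."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower())
            if w != target and w not in STOPWORDS}

def bootstrap(sentences, target, seeds, min_count=2, max_rounds=10):
    """Simplified seed-and-grow bootstrapping; seeds maps each sense to seed words."""
    indicators = {sense: set(words) for sense, words in seeds.items()}
    bags = [bag(s, target) for s in sentences]
    labels = [None] * len(sentences)
    for _ in range(max_rounds):
        # 1. (Re)label: a sentence gets a sense only if exactly one sense's indicators match.
        new_labels = []
        for b in bags:
            hits = [sense for sense, words in indicators.items() if b & words]
            new_labels.append(hits[0] if len(hits) == 1 else None)
        # 2. Count context words per sense among the labelled sentences.
        counts = {sense: Counter() for sense in indicators}
        for b, label in zip(bags, new_labels):
            if label is not None:
                counts[label].update(b)
        # 3. Grow each sense's indicator list with frequent words unique to that sense.
        grew = False
        for sense, c in counts.items():
            others = set()
            for other, oc in counts.items():
                if other != sense:
                    others |= set(oc)
            for w, n in c.items():
                if n >= min_count and w not in others and w not in indicators[sense]:
                    indicators[sense].add(w)
                    grew = True
        if new_labels == labels and not grew:
            break  # nothing changed, so the process has converged
        labels = new_labels
    return labels, indicators

SENTENCES = [
    "plant life needs water and sunlight",
    "plant life grows in water",
    "the manufacturing plant employs workers",
    "a manufacturing plant with many workers",
    "this plant needs water to grow",
    "workers at the plant went on strike",
]
labels, indicators = bootstrap(SENTENCES, "plant", {"living": {"life"}, "factory": {"manufacturing"}})
print(labels)  # ['living', 'living', 'factory', 'factory', 'living', 'factory']
```

Starting from life and manufacturing alone, the loop picks up water and needs for one sense and workers for the other. That lets it label the last two sentences, which contain neither seed word.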


Mary McGee Wood 2002-12-10