Reading: SLP Ch. 16, Lexical Semantics, sections 1 & 2;
Ch. 17, Word Sense Disambiguation & Information Retrieval, sections 1 & 2
http://www.ilc.pi.cnr.it/EAGLES96/rep2/node39.html
Word Senses
A word stem is a base, or root, shared by related word forms.
A word sense is an association of a particular word form with
a particular meaning.
Unfortunately, many English words do not show a
one-to-one mapping between form and meaning. (We've already seen
this with POS tags.) Words with the same form but different
meanings are called homonyms. Words with different forms but
the same meaning are synonyms. There are other important
systematic semantic relations between word senses, which I'll look
at when I talk about lexicons.
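A minimal sketch of these definitions, treating a word sense as a (form, meaning) pair. The forms are taken from the WordNet entries for "coat" quoted later in these notes; the meaning labels are hypothetical identifiers invented for the illustration, not WordNet's own.

```python
from collections import defaultdict

# Each word sense pairs a word form with a meaning.
# The meaning labels are made up for this sketch.
SENSES = [
    ("coat", "outer_garment"),
    ("coat", "thin_layer"),
    ("coating", "thin_layer"),
    ("coat", "animal_hair"),
    ("pelage", "animal_hair"),
]

def group(pairs, key_index):
    """Group (form, meaning) pairs by form (key_index=0) or by meaning (key_index=1)."""
    groups = defaultdict(set)
    for pair in pairs:
        groups[pair[key_index]].add(pair[1 - key_index])
    return groups

# One form, several meanings: homonyms in the sense defined above.
homonyms = {form: ms for form, ms in group(SENSES, 0).items() if len(ms) > 1}
# One meaning, several forms: synonyms.
synonyms = {m: forms for m, forms in group(SENSES, 1).items() if len(forms) > 1}

print(homonyms)
print(synonyms)
```

Here "coat" comes out as the only form with more than one meaning, while "thin_layer" and "animal_hair" each come out as meanings shared by two forms.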
Word sense ambiguity is very common, and is a major problem for many applications of language engineering. To get a feel for it, here are the (highly edited) results of looking up in WordNet just the first few words of a random sentence from my "pet" text (I gave up at this point!). (I'll say more about WordNet, again, when I talk about lexicons. It's linked from the course web page: you might want to try the same thing with your own text.)
"The coat falls smoothly, and is almost
maintenance-free: a weekly
combing is all that is usually required to keep it in top condition."
Sense used in this sentence, out of the number of senses WordNet lists for the word:
coat - #3 of 3
fall - #25 of 32
smoothly - #1 of 3
maintenance - #1 of 4
free - #1 of 8
The noun "coat" has 3 senses in WordNet.
1. coat - (an outer garment that has sleeves and covers the body from shoulder down; worn outdoors)
2. coating, coat - (a thin layer covering something)
3. coat, pelage - (growth of hair or wool or fur covering the body of an animal)
The verb "fall" has 32 senses in WordNet.
1. fall - (descend in free fall under the influence of gravity)
2. descend, fall, go down, come down - (move downward and lower, but not necessarily all the way)
3. fall - (pass suddenly and passively into a state of body or mind)
4. fall, come - (come under, be classified or included)
5. precipitate, come down, fall - (fall from clouds)
6. fall - (suffer defeat, failure, or ruin)
...
25. hang, fall, flow - (fall or flow in a certain way)
...
32. fall, descend, settle - (come as if by falling)
The adverb "smoothly" has 3 senses in WordNet.
1. smoothly - (with no problems or difficulties)
2. swimmingly, smoothly - (with great ease and success)
3. smoothly - (in a smooth and diplomatic manner)
Overview for "maintenance-free"
Sorry, no matches found.
The noun "maintenance" has 4 senses in WordNet.
1. care, maintenance, upkeep - (activity involved in maintaining something in good working order)
2. maintenance - (means of maintenance of a family or group)
3. alimony, maintenance - (court-ordered support paid by one spouse to another after they are separated)
4. sustenance, sustentation, sustainment, maintenance, upkeep - (the act of sustaining)
The adjective "free" has 8 senses in WordNet.
1. free (vs. unfree) - (able to act at will; not hampered; not under compulsion or restraint)
2. free (vs. bound) - ((chemistry and physics) unconstrained or not chemically bound in a molecule or not fixed and capable of relatively unrestricted motion)
3. complimentary, costless, free, gratis (predicate), gratuitous - (costing nothing)
4. free - (not occupied or in use)
5. detached, free - (not fixed in position)
6. free (vs. slave) - (not held in servitude)
7. spare, free - (not taken up by scheduled activities)
8. free, loose, liberal - (not literal)
Two things are clear from even this small example:
1) Lots of words have lots of senses.
2) What counts as a separate sense depends, at least in part, on
the intuition of the person writing the definitions.
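To put a rough number on point 1: if each of the five words looked up above could take any of its listed senses independently of the others, the number of candidate readings for that fragment alone is the product of the sense counts. The sketch below makes exactly that simplifying assumption, and ignores any senses the words have under other parts of speech.

```python
from math import prod

# Number of WordNet senses reported above for each word looked up.
SENSE_COUNTS = {
    "coat": 3,
    "fall": 32,
    "smoothly": 3,
    "maintenance": 4,
    "free": 8,
}

# Assumes the sense of each word can be chosen independently of the others.
readings = prod(SENSE_COUNTS.values())
print(readings)  # 3 * 32 * 3 * 4 * 8 = 9216
```

Only one of those 9216 combinations (senses 3, 25, 1, 1 and 1) is the intended reading.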
Disambiguation
Hand analysis has shown that typically, even in a small domain-specific
corpus, at least 40% of semantically significant words are ambiguous.
This has serious implications for applications such as Information
Retrieval and Machine Translation.
Not surprisingly, word sense disambiguation is a major
research topic in language engineering. SENSEVAL (http://www.itri.bton.ac.uk/events/senseval/) is a series of
workshops and competitions, most recently at ACL in July 2002,
to discuss and evaluate WSD systems. (More about this when I talk
about standards.)
Techniques for WSD:
Knowledge-based WSD: uses an explicit lexicon (a machine-readable dictionary (MRD) or thesaurus) or a knowledge base such as WordNet, LDOCE (Longman Dictionary of Contemporary English) or UMLS (Unified Medical Language System).
Fairly accurate, especially in specialised domains.
Expensive, and does not generalise.
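These notes do not commit to a particular knowledge-based method. One simple, well-known instance is a simplified Lesk-style gloss overlap: pick the sense whose dictionary gloss shares the most content words with the context. The sketch below applies it to the three WordNet glosses for "coat" quoted earlier. The stopword list and the first context sentence are assumptions made up for the illustration.

```python
import re

# A tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "has",
             "from", "in", "is", "it", "its", "to"}

# Glosses for the noun "coat", as quoted from WordNet above.
GLOSSES = {
    1: "an outer garment that has sleeves and covers the body "
       "from shoulder down; worn outdoors",
    2: "a thin layer covering something",
    3: "growth of hair or wool or fur covering the body of an animal",
}

def content_words(text):
    """Lower-case the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, glosses):
    """Return (best sense or None, overlap count per sense)."""
    context_words = content_words(context)
    overlaps = {sense: len(context_words & content_words(gloss))
                for sense, gloss in glosses.items()}
    best = max(overlaps, key=overlaps.get)
    # With no overlap at all, there is no evidence for any sense.
    return (best if overlaps[best] > 0 else None), overlaps

# Hypothetical context, invented so that it shares words with one gloss.
made_up = "Comb the animal every week to keep the hair of its coat in top condition."
best, overlaps = simplified_lesk(made_up, GLOSSES)
print(best, overlaps)

# The sentence actually used in these notes.
original = ("The coat falls smoothly, and is almost maintenance-free: a weekly "
            "combing is all that is usually required to keep it in top condition.")
print(simplified_lesk(original, GLOSSES))
```

On the made-up context the sketch picks sense 3, through the shared words "hair" and "animal". On the original sentence no gloss shares a content word with the context, so it returns no sense at all: a small reminder that accuracy depends heavily on how well the lexicon's wording matches the text.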
Corpus-based WSD:
Supervised learning, from a tagged / disambiguated corpus: some learning ability, but again the resources are expensive and hard to get hold of.
Supervised learning, from an artificial corpus: notably parallel bilingual corpora, where the sense of an ambiguous word in one language can be identified from its translation equivalent in the other.
Unsupervised learning, from a raw corpus: completely general, but
not very effective. What can be learned is word sense
discrimination: you can group the instances of a word into
clusters that correspond to different senses, without knowing what
those senses are. For some purposes, that's enough.
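Word sense discrimination can be sketched as clustering the contexts in which a word occurs. The example below is only one simple way to do it, chosen for brevity: a greedy single pass that puts each context into the most similar existing cluster, using Jaccard overlap of context words, or starts a new cluster if nothing is similar enough. The contexts (for the word "coat") and the threshold are invented for the illustration.

```python
def jaccard(a, b):
    """Overlap between two non-empty word sets: 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b)

def discriminate(contexts, threshold=0.2):
    """Greedily group contexts; returns lists of context indices, one per cluster."""
    clusters = []  # each cluster keeps its pooled words and its member indices
    for index, words in enumerate(contexts):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(words, cluster["words"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # Nothing similar enough: this context starts a new cluster.
            clusters.append({"words": set(words), "members": [index]})
        else:
            best["words"] |= words
            best["members"].append(index)
    return [cluster["members"] for cluster in clusters]

# Hypothetical contexts of "coat", already reduced to content words.
CONTEXTS = [
    {"dog", "fur", "combing", "shiny"},
    {"paint", "wall", "second", "dry"},
    {"cat", "fur", "shiny", "brushing"},
    {"paint", "dry", "primer", "wall"},
]

groups = discriminate(CONTEXTS)
print(groups)  # [[0, 2], [1, 3]]
```

The two groups correspond to two different uses of the word, but nothing in the output says which sense is which: the clusters are unlabelled.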
Hybrid WSD: as so often, approaches that combine the strengths of
both tend to perform best. For example, Yarowsky (1995) used a small
set of definitions as "seeds" to classify the simple cases in a
corpus, then "grew" outward from there to progressively larger lists,
which were re-applied to the corpus, giving 96% success.
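The seed-and-grow idea can be sketched in a few lines. This is a heavily simplified illustration, not Yarowsky's published algorithm, which ranks its evidence in a decision list. Here a context is labelled when it contains cue words for exactly one sense, and a word becomes a new cue for a sense when it has only been seen in contexts labelled with that sense. The sense names, seeds and contexts (for the word "plant") are invented for the example.

```python
def bootstrap(contexts, seeds, max_rounds=10):
    """contexts: list of word sets; seeds: dict mapping sense -> set of cue words."""
    cues = {sense: set(words) for sense, words in seeds.items()}
    labels = {}
    for _ in range(max_rounds):
        # Step 1: label every context matched by the cues of exactly one sense.
        new_labels = {}
        for index, words in enumerate(contexts):
            matched = [sense for sense, cue in cues.items() if words & cue]
            if len(matched) == 1:
                new_labels[index] = matched[0]
        if new_labels == labels:
            break  # nothing changed, so the lists have stopped growing
        labels = new_labels
        # Step 2: grow each cue list with words seen only under that sense.
        seen = {sense: set() for sense in cues}
        for index, sense in labels.items():
            seen[sense] |= contexts[index]
        for sense in cues:
            others = set().union(*(seen[s] for s in cues if s != sense))
            cues[sense] |= seen[sense] - others
    return labels, cues

# Hypothetical contexts of "plant" (the target word itself is left out).
CONTEXTS = [
    {"life", "species", "animal"},
    {"manufacturing", "equipment", "workers"},
    {"species", "growth", "soil"},
    {"workers", "assembly", "shift"},
    {"soil", "water"},
    {"closed", "yesterday"},
]
SEEDS = {"living": {"life"}, "factory": {"manufacturing"}}

labels, cues = bootstrap(CONTEXTS, SEEDS)
print(labels)
```

The seeds label only contexts 0 and 1 directly. Contexts 2 and 3 are reached in the next round through "species" and "workers", and context 4 in the round after that through "soil". Context 5 shares no word with any labelled context and stays unlabelled.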