Reading: SLP Ch. 10, Parsing with CFGs, sections 1 & 4;
Ch. 12, Lexicalised and Probabilistic Parsing, sections 1, 3, 4;
Allen Ch. 7 section 5
Parsing with Phrase Structure Grammars
Parsing: determining the structure of a piece of
language, usually a sentence, according to a grammar or a
language model.
Most commonly a phrase structure grammar, made up
of rewrite rules which map to trees.
These rules used to be written by hand (remember the
examples in the first lecture). Syntactic parsing was more or
less solved in the classical period, and the grammars written then
form the basis of many which are still used. These days they can
also be learned from tagged corpora.
See SLP 10.1, especially Figs 10.2, 10.3, 10.4 on
basic top-down and bottom-up parsing using PS rules and parse
trees.
``Earley's Algorithm'' or ``chart parsing''
Naive top-down or bottom-up parsing with exhaustive
backtracking takes exponential time in the length of the string. A
dynamic programming approach (remember the Viterbi algorithm) to
efficient parallel top-down search reduces this to polynomial
time, by eliminating the repetitive solution of sub-problems when
backtracking -
where
is the number of words in the
input.
A data structure called a chart holds successfully
parsed constituents within the string. The input is represented as
a graph, where the edges are labelled with the phrase structure
rule which describes the sub-string they span. The right-hand
sides of the rules include a dot showing how many of their constituents
are included in the sub-string.
predicts that a VP will start
after 0, but no input has been recognised yet.
is trying to build an NP
from a Det and a Noun: it has found the Det from 1 to 2 and
predicts that the next word will be a Noun.
has found a complete VP, made
of Verb and a NP, from 0 to 3.
The first two types are active edges: they are
looking for further constituents to complete what they are trying
to build. The third is an inactive edge: it is complete.
The fundamental rule of chart-parsing says, informally,
When an active edge meets an inactive edge of the desired category, add a new edge to the chart spanning both, and label it with the rule on the active edge, with its dot moved along to indicate the constituent which has been incorporated.
The best account of chart parsing is in Gazdar &
Mellish, Natural Language Processing in LISP, Ch. 6.1-6.3.
Probabilistic Parsing
Phrase structure rules can be associated with
probabilities just as POS tags can be. This can help resolve ambiguities in sentence structure, and is an important part of
language modelling for tasks which involve prediction, such
as speech recognition. See the example in SLP 12.1.
The overall probability of a tree for a string can be
computed by multiplying the probabilities of the rules used,
exactly as the probability of a string of categories is computed
by multiplying the bigram probabilities.
Lexicalised Parsing
We saw that bare bigram probabilities gave better
predictive results if combined with lexical probabilities. Here
too, information about individual words complements information
about bare rules. (See SLP 12.3).
``Knowledge about a language is knowledge about the words of that language and how they behave''
This principle is taken to its logical conclusion in Categorial Grammars, which encode the syntactic behaviour of a word directly in its lexical category, and thus do not need grammar rules. They are mentioned briefly in SLP 12.4.