Clocksin, W. F. and C. S. Mellish (1981) Programming in Prolog. Springer-Verlag. Ch. 9.
Pereira, F. C. N. and S. M. Shieber (1987) Prolog and Natural-Language Analysis. CSLI Publications. Sections 2.7 (pp. 29-36), 3.4.2 (pp. 612), 3.7 (pp. 70-79).
/guest/seminars/chaptr04.doc, /guest/seminars/chaptr05.doc (Extracts from the book "Natural Language Processing in Prolog" by G. Gazdar and C. Mellish).
Clocksin and Mellish's Ch. 9 grammar is in the file sentence_grammar.pl.
2. Intuitive Parsing
Step 1. The string: the quick brown fox jumps over the lazy dogs
Step 3. Project N' heads: [N' fox], [N' dogs]
Step 4. Add N' modifiers: [N' quick brown fox], [N' lazy dogs]
Step 5. Project NP: [NP [N' quick brown fox]], [NP [N' lazy dogs]]
Step 6. Add NP specifiers: [NP the quick brown fox], [NP the lazy dogs]
Step 7. Project PP: [PP over [NP the lazy dogs]]
Step 8. Project VP: [VP jumps [PP over the lazy dogs]]
The tree is constructed from frontier to root (bottom-up), as single words are grouped into phrases, phrases into clauses etc.
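The bottom-up grouping described above can be sketched as a small shift-reduce recogniser. This is only an illustration under stated assumptions: the predicate names category/2, phrase_rule/2 and shift_reduce/2 are hypothetical, and the grammar is a flat mini-grammar for the example string without the N' level used in the steps.

```prolog
:- use_module(library(lists)).   % append/3, reverse/2

% Lexicon (assumed for this sketch): category(Word, Category).
category(the, det).
category(quick, adj).  category(brown, adj).  category(lazy, adj).
category(fox, n).      category(dogs, n).
category(jumps, v).    category(over, p).

% Phrase rules (assumed for this sketch): phrase_rule(Mother, Daughters).
phrase_rule(np, [det, adj, adj, n]).
phrase_rule(np, [det, adj, n]).
phrase_rule(pp, [p, np]).
phrase_rule(vp, [v, pp]).
phrase_rule(s,  [np, vp]).

% shift_reduce(Stack, Words): Stack holds the categories found so far,
% most recent first. Succeeds once everything is grouped into a single s.
shift_reduce([s], []).
shift_reduce(Stack, Words) :-            % reduce: group the top of the stack
    phrase_rule(Mother, Daughters),
    reverse(Daughters, Top),
    append(Top, Below, Stack),
    shift_reduce([Mother|Below], Words).
shift_reduce(Stack, [Word|Words]) :-     % shift: look up the next word
    category(Word, Cat),
    shift_reduce([Cat|Stack], Words).
```

Starting with an empty stack, the query ?- shift_reduce([], [the,quick,brown,fox,jumps,over,the,lazy,dogs]). groups words into phrases from the frontier upwards and succeeds when only s remains.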
3. Real parsing 1. (Top-down) recursive descent parsing
Start symbol: S
Input string: "the quick brown fox jumps over the lazy dogs"
1) S → NP VP
2) NP → (DET) ADJ* N
3) VP → V NP
4) VP → V PP
5) DET → the; a; an ...
6) N → dogs; fox; jumps ...
7) ADJ → quick; brown; lazy ...
8) V → jumps; runs ...
9) P → over; onto; in ; under ...
10) PP → P NP
Initial state 1:
Tree so far: S. Stack ("to do" list): S
"Reach down" from the start symbol towards the string, i.e. ask "how would I generate this string?"
The only way of getting "down" from S is via rule 1: S → NP VP. So build a little bit of structure and put NP and VP on the stack (the list of symbols remaining to be dealt with).
State of play 2:
Remainder: the quick ...
(The part of the string that remains to be parsed is called the remainder.)
Next step 3: expand the leftmost unexpanded daughter first. (I.e. NP). NP → (DET) ADJ* N
Remainder: the quick brown ...
Expand leftmost unexpanded daughter: DET → the; a; an
As this is a preterminal rule, we require that one of the terminals on the right hand side is a prefix (= an initial substring) of the remainder of the analysis string for this rule to be applicable. This condition is met in this case, as "the" is a prefix of the remainder.
State of play 4:
Remainder: quick brown fox ...
Expand ADJ*. "quick" and "brown" are ADJs, so they can be included in the NP.
State of play 5:
Remainder: fox jumps over ...
Expand top of the stack (leftmost unexpanded daughter).
First, N. (Rule 6 N → dogs; fox ...)
Then, VP. (Rule 3 VP → V NP)
Then, V. (Rule 8 V → jumps; runs ...)
Then, NP. (Rule 2 NP → (DET) ADJ* N)
At this stage, the state of play is:
Remainder: over the lazy dogs
Rules 5 and 7 are both preterminal, but neither of them introduces a prefix of "over the lazy dogs" (nor does rule 6, which would apply if both DET and ADJ* were skipped). So we must try other expansions of the most recently expanded nonterminal (backtracking). But there are no other expansions of NP, so we must backtrack again, to the VP node. An alternative expansion of VP is rule 4 (VP → V PP). The parse continues, and eventually all of the material will be included in the parse. When no more of the string is left, and there are no more categories left on the stack to deal with, the parse is complete.
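The walk-through above can be sketched as a recogniser that keeps an explicit stack and a remainder. This is a minimal sketch, not the handout's own program: the predicates rule/2, word/2, parse/2 and recognise/1 are hypothetical names, and the helper category adjs is an assumption introduced here to unfold the optional DET and the ADJ* of rule 2 into plain rules.

```prolog
:- use_module(library(lists)).   % append/3

% Rules 1-4 and 10 as rule(Mother, Daughters) facts.
rule(s,    [np, vp]).
rule(np,   [det, adjs, n]).      % rule 2 with DET
rule(np,   [adjs, n]).           % rule 2 without DET
rule(adjs, []).                  % ADJ*: zero adjectives
rule(adjs, [adj, adjs]).         % ADJ*: one more adjective
rule(vp,   [v, np]).
rule(vp,   [v, pp]).
rule(pp,   [p, np]).

% Preterminal rules 5-9 as word(Category, Word) facts.
word(det, the).   word(det, a).     word(det, an).
word(n, dogs).    word(n, fox).     word(n, jumps).
word(adj, quick). word(adj, brown). word(adj, lazy).
word(v, jumps).   word(v, runs).
word(p, over).    word(p, onto).    word(p, in).    word(p, under).

% parse(Stack, Remainder): the categories still on the stack can be
% expanded to cover exactly the remainder of the string.
parse([], []).                         % nothing left to do, nothing left to parse
parse([Cat|Stack], [Word|Rest]) :-     % preterminal: Word is a prefix of the remainder
    word(Cat, Word),
    parse(Stack, Rest).
parse([Cat|Stack], Remainder) :-       % nonterminal: expand the leftmost category
    rule(Cat, Daughters),
    append(Daughters, Stack, NewStack),
    parse(NewStack, Remainder).

recognise(Words) :- parse([s], Words).
```

With the query ?- recognise([the,quick,brown,fox,jumps,over,the,lazy,dogs]). Prolog first tries VP → V NP, fails on "over", and then backtracks to VP → V PP, as in the walk-through. Backtracking comes for free from Prolog's own search, so the sketch needs no explicit record of earlier states.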
4. The simplest parsing program: a Prolog DCG (Definite Clause Grammar)
/* DOG_GRAMMAR.PL */
s --> np, vp.
np --> n. np --> adj, n. np --> adj, adj, n.
np --> det, n. np --> det, adj, n. np --> det, adj, adj, n.
vp --> v, np. vp --> v, pp.
pp --> p, np.
det --> [the]. det --> [a]. det --> [an].
n --> [dogs]. n --> [fox]. n --> [jumps].
adj --> [quick]. adj --> [brown]. adj --> [lazy].
v --> [jumps]. v --> [runs].
p --> [over]. p --> [onto].
p --> [in]. p --> [under].
/* Generate all sentences */
loop :- s(S, []), write(S), nl, fail.
5. Difference lists
These Prolog grammars employ difference list notation for strings of words.
([the,quick,brown,fox],[quick,brown,fox]) indirectly indicates the single word [the], with [quick,brown,fox] left as a remainder.
([a,fish,swims],[]), with an empty list as its remainder, indicates the string [a,fish,swims].
This is a bit baffling to explain, but becomes easy enough once you use trace to watch how the parser works, step by step.
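To make the notation concrete, here is roughly what a few grammar rules look like when written out by hand with difference lists. The _dl suffix on the predicate names is a hypothetical convention used only in this sketch, so that the clauses do not clash with those Prolog generates from dog_grammar.pl; the exact clauses a given Prolog system produces from --> rules may differ in detail.

```prolog
% np --> det, n.   The input S0 minus the remainder S is an np if
% S0 minus S1 is a det and S1 minus S is an n.
np_dl(S0, S) :- det_dl(S0, S1), n_dl(S1, S).

% det --> [the].   The list [the|S] minus its remainder S is the word "the".
det_dl([the|S], S).

% n --> [fox].  n --> [dogs].
n_dl([fox|S], S).
n_dl([dogs|S], S).
```

The query ?- det_dl([the,quick,brown,fox], R). binds R to [quick,brown,fox], and ?- np_dl([the,fox,jumps], R). binds R to [jumps]: each category consumes a prefix of the list and hands the rest on.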
At the Prolog ?- prompt, type
consult(dog_grammar).
to load and compile dog_grammar.pl
At the Prolog prompt, try any of the following queries:
(Type semicolon-return after the reply to generate additional answers.)
(Note that "jumps" is listed as both a verb and a noun.)
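The following queries are illustrative suggestions rather than the handout's original list. They assume that dog_grammar.pl has already been loaded, and they call the nonterminals directly with a word list and a remainder, as described in the section on difference lists.

```prolog
% Is the whole string a sentence? The empty list says nothing may be left over.
?- s([the,quick,brown,fox,jumps,over,the,lazy,dogs], []).

% Which noun phrase starts this string, and what remains after it?
?- np([the,quick,brown,fox,jumps], Rest).

% "jumps" used as a noun.
?- np([the,quick,jumps], []).

% Generate sentences one at a time (type ; for the next one).
?- s(S, []).
```

The first and third queries succeed, and the second binds Rest to [jumps]. Running any of them after typing trace. shows how each nonterminal passes its remainder on to the next.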
Other grammars to consult include: sentence_grammar.pl, syllable_grammar.pl
More on phonological parsing:
Church, K. W. (1983) Phrase Structure Parsing: A Method for Taking Advantage of Allophonic Constraints. Ph. D. thesis, M. I. T. Distributed by IULC, and also published by Kluwer.
Randolph, M. A. (1989) Syllable-based Constraints on Properties of English Sounds. Ph. D. thesis, M. I. T.
Dirksen, A. (1993) Phrase Structure Phonology. In Ellison and Scobbie, eds. (Reference below.)
Coleman, J. (1993) English word-stress in Unification-based Grammar. In Ellison and Scobbie, eds.
Ellison, T. M. and J. M. Scobbie, eds. (1993) Computational Phonology. Edinburgh Working Papers in Cognitive Science 8.