1. References
Introductory reading:
Clocksin, W. F. and C. S. Mellish (1981) Programming in Prolog.
Springer-Verlag. Ch. 9.
Clocksin and Mellish's Ch. 9 grammar is in the file sentence_grammar.pl.
Further reading:
Pereira, F. C. N. and S. M. Shieber (1987) Prolog and Natural
Language Analysis. CSLI Publications. Sections 2.7 (pp. 29-36),
3.4.2 (pp. 61-62), 3.7 (pp. 70-79).
Gazdar, G. and C. Mellish (1989) Natural Language Processing in Prolog.
Addison-Wesley. Chapters 4 and 5.
2. Intuitive Parsing
Step 1. The string:        the   quick  brown  fox   jumps  over  the   lazy  dogs
Step 2. Tagging:           DET   ADJ    ADJ    N     V      P     DET   ADJ   N
Step 3. Project N' heads:  [N' fox], [N' dogs]
Step 4. Add N' modifiers:  [N' quick brown fox], [N' lazy dogs]
Step 5. Project NP:        [NP [N' quick brown fox]], [NP [N' lazy dogs]]
Step 6. Add NP specifiers: [NP the quick brown fox], [NP the lazy dogs]
Step 7. Project PP:        [PP over [NP the lazy dogs]]
Step 8. Project VP:        [VP jumps [PP over the lazy dogs]]
Step 9. Project S:         [S [NP the quick brown fox] [VP jumps over the lazy dogs]]
The tree is constructed from frontier to root (bottom-up), as single words are grouped into phrases, phrases into clauses etc.
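The procedure above can be mechanized as a shift-reduce recognizer, which is one standard way of parsing bottom-up. What follows is a minimal sketch that assumes SWI-Prolog. The predicates word/2, rule/2, sr_parse/2, reduce/2 and recognize/1 are hypothetical names, and so is the category nbar, which stands in for N'. Words are tagged and shifted onto a stack one at a time. Whenever the categories on top of the stack match the right-hand side of a rule, they may be reduced to the rule's mother. When a choice turns out to be wrong (for instance, tagging "jumps" as N), Prolog backtracks and tries another.

```prolog
:- use_module(library(lists)).   % reverse/2, append/3

% Hypothetical lexicon: word(Category, Word). "jumps" is both N and V.
word(det, the).    word(det, a).
word(adj, quick).  word(adj, brown).  word(adj, lazy).
word(n, fox).      word(n, dogs).     word(n, jumps).
word(v, jumps).    word(v, runs).
word(p, over).     word(p, onto).

% Hypothetical rules: rule(Mother, Daughters); nbar stands for N'.
rule(s,    [np, vp]).
rule(np,   [det, nbar]).   % add NP specifier
rule(np,   [nbar]).        % NP without a determiner
rule(nbar, [adj, nbar]).   % add N' modifier
rule(nbar, [n]).           % project N' from its head
rule(pp,   [p, np]).
rule(vp,   [v, np]).
rule(vp,   [v, pp]).

% sr_parse(+Stack, +Words): Stack holds the categories built so far,
% most recent first; Words is the input not yet read.
sr_parse([s], []).                  % everything reduced to a single S
sr_parse(Stack, Words) :-           % reduce the top of the stack
    reduce(Stack, NewStack),
    sr_parse(NewStack, Words).
sr_parse(Stack, [W|Ws]) :-          % shift: tag the next word, push it
    word(Cat, W),
    sr_parse([Cat|Stack], Ws).

% reduce(+Stack, -NewStack): the top of Stack matches a rule's daughters
% (reversed, because the stack lists the most recent category first).
reduce(Stack, [Mother|Rest]) :-
    rule(Mother, Daughters),
    reverse(Daughters, Reversed),
    append(Reversed, Rest, Stack).

recognize(Words) :- sr_parse([], Words).
```

For instance, ?- recognize([the,quick,brown,fox,jumps,over,the,lazy,dogs]). succeeds. The N' modifier rule is written right-recursively (nbar → adj nbar). That is an illustrative choice: it makes "brown fox" an N' inside the N' "quick brown fox".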
3. Real parsing 1: (top-down) recursive descent parsing
Start symbol: S
Input string: "the quick brown fox jumps over the lazy dogs"
Rules:
1) S → NP VP
2) NP → (DET) ADJ* N
3) VP → V NP
4) VP → V PP
5) DET → the; a; an ...
6) N → dogs; fox; jumps ...
7) ADJ → quick; brown; lazy ...
8) V → jumps; runs ...
9) P → over; onto; in; under ...
10) PP → P NP
(In rule 2, parentheses mark an optional category and * means "zero or more".)
Initial state 1:
  Tree:      [S ?]
  Stack ("to do" list): S
  Remainder: the quick brown ...
"Reach down" from the start symbol towards the string; that is, ask "how would I
generate this string?"
The only way of getting "down" from S is via rule 1: S → NP VP. So build a
little bit of structure and put NP and VP on the stack (the list of
symbols remaining to be dealt with).
State of play 2:
  Tree:      [S [NP ?] [VP ?]]
  Stack:     NP, VP
  Remainder: the quick brown ...
The part of the string that remains to be parsed is called the remainder.
Next step 3: expand the leftmost unexpanded daughter first (i.e. NP), using
rule 2: NP → (DET) ADJ* N.
  Tree:      [S [NP [(DET) ?] [ADJ* ?] [N ?]] [VP ?]]
  Stack:     (DET), ADJ*, N, VP
  Remainder: the quick brown ...
Expand the leftmost unexpanded daughter, (DET), using rule 5: DET → the; a; an
...
Because this is a preterminal rule, it can apply only if one of the terminals
on its right-hand side is a prefix (an initial substring) of the remainder.
That condition is met here, since "the" is a prefix of the remainder.
State of play 4:
  Tree:      [S [NP [DET the] [ADJ* ?] [N ?]] [VP ?]]
  Stack:     ADJ*, N, VP
  Remainder: quick brown fox ...
Expand ADJ*: "quick" and "brown" are ADJs, so they can be included in
the NP.
State of play 5:
  Tree:      [S [NP [DET the] [ADJ* [ADJ quick] [ADJ brown]] [N ?]] [VP ?]]
  Stack:     N, VP
  Remainder: fox jumps over ...
Expand the top of the stack (the leftmost unexpanded daughter) each time:
First, N. (Rule 6 N → dogs; fox ...)
Then, VP. (Rule 3 VP → V NP)
Then, V. (Rule 8 V → jumps; runs ...)
Then, NP. (Rule 2 NP → (DET) ADJ* N)
At this stage, the state of play is:
  Tree:      [S [NP [DET the] [ADJ* [ADJ quick] [ADJ brown]] [N fox]]
                [VP [V jumps] [NP [(DET) ?] [ADJ* ?] [N ?]]]]
  Stack:     (DET), ADJ*, N
  Remainder: over the lazy dogs
Rules 5 and 7 are both preterminal, but neither of them introduces a
prefix of "over the lazy dogs". Because DET is optional and ADJ* may be
empty, rule 6 is tried as well, but "over" is not an N either. So we must
try other expansions of the most recently expanded nonterminal
(backtracking). But there are no other expansions of NP, so we must
backtrack again, to the VP node. An alternative expansion of VP is rule 4
(VP → V PP). The parse continues, and eventually all of the material is
included in the parse. When none of the string is left and there are no
more categories left on the stack to deal with, the parse is complete.
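The procedure just described can be written out as a short Prolog program. It keeps a stack of categories still to be found, expands the leftmost one, checks preterminals against the front of the remainder, and backtracks on failure. This is a minimal sketch that assumes SWI-Prolog. The names rd/2, parse/1, rule/2 and word/2 are hypothetical, and so is the auxiliary category adjs, which stands for ADJ*. The optional DET of rule 2 is handled by listing NP twice, once with DET and once without. Prolog's own backtracking supplies the "try other expansions" step. When VP → V NP fails on "over the lazy dogs", the next rule/2 clause for vp, VP → V PP, is tried.

```prolog
:- use_module(library(lists)).   % append/3

% Rules 1-4 and 10. Rule 2, NP -> (DET) ADJ* N, becomes two NP rules
% plus a hypothetical category adjs meaning "zero or more ADJ".
rule(s,    [np, vp]).
rule(np,   [det, adjs, n]).
rule(np,   [adjs, n]).
rule(adjs, [adj, adjs]).      % tried first: take as many ADJs as possible
rule(adjs, []).
rule(vp,   [v, np]).          % rule 3, tried before rule 4
rule(vp,   [v, pp]).
rule(pp,   [p, np]).

% Preterminal rules 5-9: word(Category, Word).
word(det, the).    word(det, a).      word(det, an).
word(n, dogs).     word(n, fox).      word(n, jumps).
word(adj, quick).  word(adj, brown).  word(adj, lazy).
word(v, jumps).    word(v, runs).
word(p, over).     word(p, onto).     word(p, in).     word(p, under).

% rd(+Stack, +Remainder): Stack is the "to do" list, leftmost
% unexpanded daughter on top.
rd([], []).                          % nothing left to do or to parse
rd([Cat|Stack], [W|Remainder]) :-    % preterminal: W must be a prefix
    word(Cat, W),                    % of the remainder
    rd(Stack, Remainder).
rd([Cat|Stack], Remainder) :-        % nonterminal: replace Cat by its
    rule(Cat, Daughters),            % daughters on top of the stack
    append(Daughters, Stack, NewStack),
    rd(NewStack, Remainder).

parse(Words) :- rd([s], Words).
```

?- parse([the,quick,brown,fox,jumps,over,the,lazy,dogs]). succeeds. Like any recursive descent parser, this one can loop on a left-recursive rule such as NP → NP PP, because it would keep expanding NP without consuming a word.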
4. The simplest parsing program: a Prolog DCG (Definite Clause Grammar)
/* DOG_GRAMMAR.PL */
s --> np, vp.

np --> n.
np --> adj, n.
np --> adj, adj, n.
np --> det, n.
np --> det, adj, n.
np --> det, adj, adj, n.

vp --> v, np.
vp --> v, pp.
pp --> p, np.

det --> [the].    det --> [a].      det --> [an].
n --> [dogs].     n --> [fox].      n --> [jumps].
adj --> [quick].  adj --> [brown].  adj --> [lazy].
v --> [jumps].    v --> [runs].
p --> [over].     p --> [onto].     p --> [in].     p --> [under].

/* Generate all sentences: print each one, then fail to backtrack into
   the next; the second clause makes loop succeed when all are printed. */
loop :- s(S, []), write(S), nl, fail.
loop.
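dog_grammar.pl spells out rule 2, NP → (DET) ADJ* N, as six separate NP rules, so it accepts at most two adjectives. It also only answers yes or no, without returning the tree. Both limitations can be lifted within DCG notation. The sketch below assumes SWI-Prolog. The nonterminal adjs and the shape of the tree terms (np(...), vp(...) and so on) are illustrative choices and are not part of dog_grammar.pl. Each nonterminal gets one extra argument that records the subtree it has built. adjs transcribes ADJ* directly as "zero or more ADJ" and collects the adjectives in a list.

```prolog
% Each nonterminal carries one extra argument: the subtree it built.
s(s(NP, VP)) --> np(NP), vp(VP).

% Rule 2, NP -> (DET) ADJ* N, transcribed directly.
np(np(Det, Adjs, N)) --> det(Det), adjs(Adjs), n(N).
np(np(Adjs, N))      --> adjs(Adjs), n(N).

% adjs: zero or more adjectives, collected in a list.
adjs([A|As]) --> adj(A), adjs(As).
adjs([])     --> [].

vp(vp(V, NP)) --> v(V), np(NP).
vp(vp(V, PP)) --> v(V), pp(PP).
pp(pp(P, NP)) --> p(P), np(NP).

% Lexicon: each preterminal returns a leaf such as det(the).
det(det(the)) --> [the].
det(det(a))   --> [a].
det(det(an))  --> [an].
n(n(dogs))    --> [dogs].
n(n(fox))     --> [fox].
n(n(jumps))   --> [jumps].
adj(adj(quick)) --> [quick].
adj(adj(brown)) --> [brown].
adj(adj(lazy))  --> [lazy].
v(v(jumps))   --> [jumps].
v(v(runs))    --> [runs].
p(p(over))    --> [over].
p(p(onto))    --> [onto].
p(p(in))      --> [in].
p(p(under))   --> [under].
```

For example, ?- phrase(s(T), [the,quick,brown,fox,jumps,over,the,lazy,dogs]). binds T to s(np(det(the),[adj(quick),adj(brown)],n(fox)),vp(v(jumps),pp(p(over),np(det(the),[adj(lazy)],n(dogs))))). One caveat: because adjs is recursive, the language is now infinite. A generating query such as ?- phrase(s(T), X). therefore does not enumerate sentences usefully, since it keeps adding adjectives to the first NP. For that reason, the loop predicate of dog_grammar.pl should not be carried over to this version.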
5. Difference lists
These Prolog grammars employ difference list notation for strings of words.
The pair ([the,quick,brown,fox],[quick,brown,fox]) indirectly denotes the single word [the]: it is what the first list contains before the remainder [quick,brown,fox].
The pair ([a,fish,swims],[]), whose remainder is the empty list, denotes the whole string [a,fish,swims].
This is a bit baffling to explain, but it becomes easy enough once you type trace. at the
?- prompt and watch how the parser works, step by step.
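Writing a few of the difference-list clauses by hand may make the notation less baffling. Prolog translates each DCG rule into an ordinary clause with two extra arguments: the list of words before the phrase, and the remainder after it. The exact form of the translation differs between Prolog systems. The following is therefore a hand-written sketch of the idea rather than the literal output, and it covers only a fragment of dog_grammar.pl.

```prolog
% C(S0, S) holds if the words of S0, up to the remainder S, form a C.
s(S0, S)  :- np(S0, S1), vp(S1, S).   % NP's remainder is where VP starts
np(S0, S) :- det(S0, S1), n(S1, S).
vp(S0, S) :- v(S0, S1), pp(S1, S).
pp(S0, S) :- p(S0, S1), np(S1, S).

% A preterminal peels one word off the front of the list, so it
% succeeds only if that word is a prefix of the remainder.
det([the|S], S).
det([a|S], S).
n([fox|S], S).
n([dogs|S], S).
v([jumps|S], S).
p([over|S], S).
```

Here det([the,quick,brown,fox],[quick,brown,fox]) succeeds, just as in the first example above. The query ?- s([the,fox,jumps,over,the,dogs],[]). also succeeds, because each category hands its remainder on to the next.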
At the Prolog ?- prompt, type
[dog_grammar].
to load and compile dog_grammar.pl. Prolog replies:
dog_grammar consulted
yes
?-
At the Prolog prompt, try any of the following queries:
?- s([the,quick,brown,fox,jumps,over,the,lazy,dogs],[]).
?- s(X,[]).
(Type a semicolon and press Return after each reply to generate additional answers.)
?- s([the,X,jumps,over,the,Y],[]).
(Note that "jumps" is listed as both a verb and a noun.)
Other grammars to consult include sentence_grammar.pl and
syllable_grammar.pl. Try:
?- syllable_sequence([dh,@,k,w,i,k,b,r,a,w,n,f,o,k,s],[]).
More on phonological parsing:
Church, K. W. (1983) Phrase Structure Parsing: A Method for Taking
Advantage of Allophonic Constraints. Ph.D. thesis, M.I.T.
Distributed by IULC, and also published by Kluwer.
Randolph, M. A. (1989) Syllable-based Constraints on Properties of
English Sounds. Ph.D. thesis, M.I.T.
Dirksen, A. (1993) Phrase Structure Phonology. In Ellison, T. M. and
J. M. Scobbie, eds. (1993) Computational Phonology. Edinburgh Working
Papers in Cognitive Science 8.
Coleman, J. (1993) English word-stress in Unification-based Grammar. In
Ellison and Scobbie, eds.