A zero-tolerance approach to PP attachment


Deborah Ball, "Pope Francis Appoints Eight to Sex-Abuse Commission", WSJ 3/22/2014:

Pope Francis on Saturday appointed a victim of sexual abuse and a senior cardinal known for his zero-tolerance approach to a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children.

The sequence "zero-tolerance approach to a new group" sent Tim Leonard down a syntactic garden path — he had to get past "charged with advising the Catholic Church" before he figured out that the cardinal was appointed to the new group rather than having a zero-tolerance approach to it. So Tim forwarded the example to me, and I had exactly the same experience.

But garden-path experiences of this kind are rare. When we read what others have written, or listen to what others say, we generally fit everything together just as the writer or speaker intended, despite many missed opportunities to get it wrong. In fact, it's usually hard to see just how many opportunities there are to misconstrue any given sentence.
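To get a rough sense of how fast those opportunities multiply, here is a minimal sketch under simplifying assumptions of our own. It uses a verb, its object NP, and a chain of n prepositional phrases, where each PP may attach to the verb, to the object, or to the noun inside any earlier PP, provided no two attachment links cross. Counting those configurations by brute force gives the Catalan numbers. The function names are illustrative only, and this is not how the Berkeley or Stanford parsers represent the problem.

```python
from itertools import combinations, product
from math import comb


def catalan(k: int) -> int:
    """k-th Catalan number: C_k = (2k choose k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def crosses(arc_a: tuple[int, int], arc_b: tuple[int, int]) -> bool:
    """True if two attachment arcs (left end, right end) cross when drawn above the words."""
    (a1, a2), (b1, b2) = arc_a, arc_b
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2


def count_attachments(n_pps: int) -> int:
    """Count non-crossing PP attachments in the toy frame 'V NP PP_1 ... PP_n'.

    Position 0 is the verb, position 1 the object NP, and position i + 1 the
    i-th PP. PP_i may attach to any earlier position 0..i: the verb, the
    object, or the noun inside an earlier PP.
    """
    site_choices = [range(i + 1) for i in range(1, n_pps + 1)]
    total = 0
    for sites in product(*site_choices):
        arcs = [(site, i + 1) for i, site in enumerate(sites, start=1)]
        if not any(crosses(a, b) for a, b in combinations(arcs, 2)):
            total += 1
    return total


# Prints n, the brute-force count, and C_(n+1): 2, 5, 14, 42, 132, 429.
for n in range(1, 7):
    print(n, count_attachments(n), catalan(n + 1))
```

Real sentences add conjunctions, relative clauses, and other sources of ambiguity on top of this, so the toy count says nothing precise about the WSJ sentence itself; it only shows how quickly the options grow.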

One good way to learn about unsuspected ways to misconstrue text, unfortunately, is to look at the output of even the best current parsing algorithms. Thus the Berkeley Parser's output for this sentence makes the same mistake that Tim and I did, but it also misconstrues two other prepositional phrases whose ambiguity we never noticed. The sentence's nine prepositional heads are on, of, for, to, with, on, to, of, and of (the "to" of "to respond" is the infinitival marker, not a preposition); the three PPs that the parser wrongly attached are "on Saturday", "to a new group …", and "on how …":

Pope Francis on Saturday appointed a victim of sexual abuse and a senior cardinal known for his zero-tolerance approach to a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children.

As you should be able to tell by inspecting the pretty-printed parser output:

  1. the prepositional phrase "on Saturday" is construed as a post-modifier of "Pope Francis" rather than as a pre-modifier of "appointed";
  2. the prepositional phrase "to a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children" is attached to "zero-tolerance approach" ("known for his zero-tolerance approach to a new group …") rather than to "appointed" ("appointed to a new group …");
  3. the prepositional phrase "on how to respond to the problem of sexual abuse of children" is construed as modifying "the Catholic Church" (as if it were "the church on the hill") rather than as a complement of the verb "advising" ("advising X on how to respond").
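One way to make the scorecard explicit is to list each of the nine PPs with the head it modifies and compare the parser's choices against the intended ones. In the sketch below, the six correct attachments are our own reading of the sentence (the post only says the parsers got six right), and the three overrides are the errors listed above; the variable names are hypothetical.

```python
# Intended head for each of the nine PPs, in sentence order (our reading).
GOLD = [
    ("on Saturday", "appointed"),
    ("of sexual abuse", "victim"),
    ("for his zero-tolerance approach", "known"),
    ("to a new group", "appointed"),
    ("with advising", "charged"),
    ("on how", "advising"),
    ("to the problem", "respond"),
    ("of sexual abuse", "problem"),
    ("of children", "abuse"),
]

# The three misattachments described above, keyed by position in GOLD.
PARSER_OVERRIDES = {0: "Pope Francis", 3: "zero-tolerance approach", 5: "the Catholic Church"}
PARSER = [(pp, PARSER_OVERRIDES.get(i, head)) for i, (pp, head) in enumerate(GOLD)]


def attachment_errors(gold, predicted):
    """Return (pp, intended_head, predicted_head) for every disagreement."""
    return [
        (pp, g_head, p_head)
        for (pp, g_head), (_, p_head) in zip(gold, predicted)
        if g_head != p_head
    ]


errors = attachment_errors(GOLD, PARSER)
print(f"{len(GOLD) - len(errors)} of {len(GOLD)} attachments correct")  # 6 of 9
for pp, want, got in errors:
    print(f"  {pp!r}: attached to {got!r}, should be {want!r}")
```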

The Stanford parser has the same pattern of PP attachment errors on this particular sentence.

Why didn't Tim and I make the other two PP-attachment errors that the parsers made? And why did the parsers get six of the attachments right?

We can get a clue by looking at the sentence's first PP: "on Saturday". The right thing to do is to construe this as an adverbial adjunct to the verb phrase "appointed …". Tim and I had no problem with this, but the parsers both concluded that "on Saturday" was a post-modifier of "Pope Francis", as in "the temperature on Thursday" or "lunch on Friday".

Tim and I recognized that in a news-media sentence beginning, schematically, "NOUNPHRASE on WEEKDAY VERBed", the PP "on WEEKDAY" is likely to be a time adverbial rather than a post-nominal modifier, and is almost certain to be a time adverbial if the NP denotes a person. We can derive this from our knowledge that people are relatively unlikely to get temporal modifiers of whatever kind — "Ruby Tuesday" notwithstanding — or we can just count instances and calculate the odds. Thus looking for examples in COCA we find e.g.

Commissioners on Tuesday approved a $1.7 million contract
U.N. diplomats on Sunday began drafting a new resolution
Williams on Friday denied trying to short-circuit the investigation
Continental on Friday raised its estimate by a nickel a gallon

In contrast, when subject NPs denote events or properties of events, things are often different:

Thunderstorms on Sunday have stranded hundreds of passengers
Lunch on Saturday is an additional $8
Attendance on Sunday morning may be about half that on a good day
Reports on Friday said the pair are known to have connections with Rome's other Serie A club
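The "just count instances" step starts with finding lines that have the schematic shape. Here is a minimal sketch, assuming plain-text corpus lines like the eight examples above; the regular expression and function name are our own, not COCA's query syntax.

```python
import re

WEEKDAY = "Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"

# Subject, then "on WEEKDAY" (optionally followed by a part of the day),
# then the next word, which in these examples is the verb.
PATTERN = re.compile(
    rf"^(?P<subject>.+?) on (?P<day>{WEEKDAY})\b"
    r"(?: morning| afternoon| evening| night)? (?P<next>\w+)"
)

EXAMPLES = [
    "Commissioners on Tuesday approved a $1.7 million contract",
    "U.N. diplomats on Sunday began drafting a new resolution",
    "Williams on Friday denied trying to short-circuit the investigation",
    "Continental on Friday raised its estimate by a nickel a gallon",
    "Thunderstorms on Sunday have stranded hundreds of passengers",
    "Lunch on Saturday is an additional $8",
    "Attendance on Sunday morning may be about half that on a good day",
    "Reports on Friday said the pair are known to have connections with Rome's other Serie A club",
]


def find_on_weekday(lines):
    """Return (subject, day, next_word) for each line matching the pattern."""
    hits = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            hits.append((m["subject"], m["day"], m["next"]))
    return hits


for hit in find_on_weekday(EXAMPLES):
    print(hit)
```

A surface pattern like this cannot tell whether the "subject" really is a noun phrase or whether the next word really is a verb, so a serious count would need part-of-speech tags or parses in addition.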

Parsers could count such instances and calculate the odds as well — that's basically how parsers work these days — but this is apparently not yet one of the things that those two parsers count. (Though given that "on WEEKDAY" occurs at a rate of about 218 per million words in newspaper text, this case is not all that far out on the Zipfian tail…)
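As a toy version of "count instances and calculate the odds", the sketch below conditions the attachment choice on a single feature: whether the subject denotes a person or organization, or an event or a property of one. The labels on the eight corpus lines are ours, and add-one smoothing is an assumption chosen only to keep the zero counts from producing probabilities of exactly 0 or 1. Real statistical parsers condition on much richer context (lexical heads, for example), so this is an illustration, not a description of the Berkeley or Stanford models.

```python
from collections import Counter
from fractions import Fraction

# The eight corpus lines above, hand-labeled for illustration:
# (subject, subject class, what "on WEEKDAY" modifies).
LABELED = [
    ("Commissioners", "person/org", "verb"),
    ("U.N. diplomats", "person/org", "verb"),
    ("Williams", "person/org", "verb"),
    ("Continental", "person/org", "verb"),
    ("Thunderstorms", "event/property", "noun"),
    ("Lunch", "event/property", "noun"),
    ("Attendance", "event/property", "noun"),
    ("Reports", "event/property", "noun"),
]


def p_verb_attachment(data, subject_class, alpha=1):
    """Add-alpha smoothed estimate of P(attach to verb | subject class)."""
    sites = Counter(site for _, cls, site in data if cls == subject_class)
    n = sites["verb"] + sites["noun"]
    return Fraction(sites["verb"] + alpha, n + 2 * alpha)


for cls in ("person/org", "event/property"):
    p = p_verb_attachment(LABELED, cls)
    print(f"{cls}: P(verb) = {p}, odds = {p / (1 - p)}")
# person/org: P(verb) = 5/6, odds = 5
# event/property: P(verb) = 1/6, odds = 1/5
```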

Similar remarks apply to most other typical parser errors. Scope of conjunction is another difficult issue — there's just one conjunction in this sentence, and the Berkeley parser gets it wrong: "a senior cardinal" is conjoined with "sexual abuse" rather than with "a victim of sexual abuse", so that the appointee is identified as "a victim of [sexual abuse and a senior cardinal]".
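A parallelism preference of the kind people seem to apply can be sketched as a score over candidate left conjuncts. In this minimal illustration, the feature sets are hypothetical and hand-assigned (human or not, indefinite article or not, count or mass noun), and Jaccard overlap is just one convenient similarity measure; nothing here reflects what the Berkeley or Stanford parsers actually compute.

```python
# Hypothetical, hand-assigned features for each phrase (illustration only).
FEATURES = {
    "a victim of sexual abuse": {"human", "indefinite_article", "count_noun"},
    "sexual abuse": {"abstract", "mass_noun"},
    "a senior cardinal": {"human", "indefinite_article", "count_noun"},
}


def parallelism(a: str, b: str) -> float:
    """Jaccard overlap of two phrases' feature sets, from 0.0 to 1.0."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa & fb) / len(fa | fb)


def pick_left_conjunct(candidates, right):
    """Choose the candidate left conjunct most parallel to the right conjunct."""
    return max(candidates, key=lambda c: parallelism(c, right))


candidates = ["sexual abuse", "a victim of sexual abuse"]
for c in candidates:
    print(f"{c!r} + 'a senior cardinal': {parallelism(c, 'a senior cardinal'):.2f}")
print("chosen:", pick_left_conjunct(candidates, "a senior cardinal"))
```

With these features the mass noun "sexual abuse" shares nothing with "a senior cardinal" (score 0.00), while "a victim of sexual abuse" matches on every feature (score 1.00), so the sketch picks the intended scope.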

It's usually the case, as here, that the optimal structural and semantic parallelism yields the correct scope of conjunction: victim and cardinal are both human, both are associated with an indefinite article, etc., whereas "sexual abuse and a senior cardinal" is unnecessarily sylleptic.  People are pretty good at performing this sort of optimization, whereas today's parsers generally don't even try. Thus both the Berkeley and Stanford parsers do the same wrong unparallel thing with "Every day of the week and time of the day":

[Figure: parse trees from the Berkeley and Stanford parsers]

This structure is so counter-intuitive that it's hard to grasp what it might mean. In bracket notation, it comes out roughly as: Every day of [[the week and time] of the day]

A word-sequence for which this structure might be plausible:

Every picture of [[ a dog and cats ] in a box]

Don't ask what "[[ the week and time] of the day]" might mean, because it's beyond me.
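The point is that the intended structure and the parsers' structure span exactly the same words. Here is a small sketch that represents trees as nested tuples and prints them in bracket notation. The intended bracketing is our own reading, the parsers' bracketing is reconstructed from the brackets in the post with the internal structure of "Every day of" left flat, and the helper names are hypothetical.

```python
from typing import Union

# A tree is either a word or a tuple of subtrees.
Tree = Union[str, tuple]


def bracket(tree: Tree) -> str:
    """Render a tree in square-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + " ".join(bracket(child) for child in tree) + "]"


def leaves(tree: Tree) -> list[str]:
    """The word sequence a tree spans, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


# Intended: "every" scopes over two parallel "N of the N" phrases.
intended = ("Every", (("day", "of", "the", "week"), "and", ("time", "of", "the", "day")))
# The parsers' structure, as described in the post.
misparse = ("Every", "day", "of", (("the", "week", "and", "time"), "of", "the", "day"))
# The same shape with words for which it might be plausible.
dog = ("Every", "picture", "of", (("a", "dog", "and", "cats"), "in", "a", "box"))

assert leaves(intended) == leaves(misparse)  # same words, different structure
print(bracket(intended))  # [Every [[day of the week] and [time of the day]]]
print(bracket(misparse))  # [Every day of [[the week and time] of the day]]
print(bracket(dog))       # [Every picture of [[a dog and cats] in a box]]
```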

Still, parsers are enormously better now than they were just a few years ago. And the increasing interest in their practical applications means that performance within a decade or so should approach human levels.

Meanwhile, we have an excellent source of psycholinguistic puzzles in the parsing errors that humans have a hard time even understanding.



22 Comments

  1. David said,

    March 26, 2014 @ 1:53 pm

    Can I be Pope Francis on Friday?

  2. Jim said,

    March 26, 2014 @ 1:55 pm

    David, sorry, I have that booked. Need a papal indulgence to go to a steakhouse that night.

  3. Dan Milton said,

    March 26, 2014 @ 3:24 pm

    I didn't see the problem with the paragraph until I went on to your comments. Are linguists who analyze as they read more likely to be led down the garden path than ordinary readers?

  4. Phil Jennings said,

    March 26, 2014 @ 3:56 pm

    The example is a sentence more likely to be written than spoken, and as such begs for a couple commas. I'd put one after 'abuse' and the second after 'approach.' I'd do this as a help to the reader, who kindly would not belittle me for being unable to explain myself. Perhaps the original writer was chided in the past and has become comma-shy.

  5. errorr said,

    March 26, 2014 @ 4:13 pm

    I didn't have a problem with this one. Although I would have added a comma or three. Although I have been accused in the past of overusing commas and either use too few or too many.

  6. Neal Goldfarb said,

    March 26, 2014 @ 5:25 pm

    @myl:

    "a senior cardinal" is conjoined with "sexual abuse" rather than with "a victim of sexual abuse", so that the appointee is identified as "a victim of [sexual abuse and a senior cardinal]".

    Actually, it works pretty well that way, too.

  7. Chris Waters said,

    March 26, 2014 @ 5:58 pm

    Dan Milton: I'm not a linguist, and I went straight down the same garden path as Prof. L.

  8. David Morris said,

    March 26, 2014 @ 7:16 pm

    Is this the first paragraph of the WSJ article? A lot could be avoided by writing: 'On Saturday, Pope Francis appointed eight members of a new group charged with advising the Catholic Church on how to respond to the problem of sexual abuse of children. Among those appointed [or Among the appointees] are a victim of sexual abuse and a senior cardinal known for his zero-tolerance approach to sexual abuse.'

  9. Jenny Tsu said,

    March 26, 2014 @ 10:58 pm

    @David Morris – the original sentence is a classic illustration of "Just because it's possible doesn't mean it's a good idea!"

  10. dainichi said,

    March 26, 2014 @ 11:43 pm

    Maybe the parsers got number 1 wrong because they, like me, thought that "On WEEKDAY, NP VERBed" or "NP VERBed on WEEKDAY" sound good, but "NP on WEEKDAY VERBed" sounds unnatural.

  11. Chris Waters said,

    March 26, 2014 @ 11:47 pm

    I think part of the reason it's so garden-pathy is that "approach to" is such a common phrase. While I'm not a linguist, I have worked on NLP software (for a largish Internet Search company*), and that's exactly the kind of cue that can really help the parser. Thus, even though in this case, it would actually lead to an incorrect parsing, I'd actually be pleased and impressed if the computer misinterpreted this for the right reasons!

    *One whose mascot was from Wodehouse.

  12. RP said,

    March 27, 2014 @ 3:35 am

    @dainichi,
    To me, too, it sounds very unnatural to have the weekday in second position – but I'm a BrE speaker, and I see the weekday-second pattern a lot in US journalism, so it could be a BrE/AmE difference.

  13. Robert Kenney said,

    March 27, 2014 @ 7:40 am

    I had no problem reading this as intended by the writer, probably because there was no immediate prepositional phrase following "appointed a victim of sexual abuse…" It comes only after the mention of the cardinal.

  14. Vasha said,

    March 27, 2014 @ 7:59 am

    No, the weekday-second ordering really is unnatural to (this) American speaker. Presumably journalists don't put it ahead of the sentence because they want the first, eye-catching words to be the much-more-significant agent, and the position after the verb may be hindered by all the other complex sentence elements there.

  15. Ellen K. said,

    March 27, 2014 @ 9:10 am

    I only got as far as "a new group" before being able to get the right parsing, though that was due to careful reading because of familiarity with, and interest in, the issue as well as due to reading it here, on Language Log. Knowing there's something of linguistic interest in a passage does change how I read it.

  16. V said,

    March 27, 2014 @ 10:57 am

    Dan: I'm not a linguist and I parsed it the same way.

  17. exackerly said,

    March 27, 2014 @ 3:12 pm

    @Vasha, in my 8th-grade journalism class I was taught that the opening paragraph tells Who, What, Where, Why, and When, but you never lead with When.

  18. Ted said,

    March 27, 2014 @ 4:37 pm

    This makes me wonder about the extent to which an algorithm can successfully process this using purely grammatical knowledge. My first reaction to myl's question about why we don't misconstrue "Pope Francis on Saturday appointed" was that the semantic elements were determinative: I know extrinsically that papacy is generally not a day-long condition, and more precisely that Pope Francis is a specific individual whose identity does not change depending on the day.

    I therefore would have suggested that, when I hear "The Pope on Saturday" at the beginning of a sentence, the reason I don't parse "on Saturday" as modifying "the Pope" is because it's unlikely that such a precise temporal modifier would be necessary or helpful as a dependent modifier of the head noun "Pope" (and even less likely with the arthrous version, "the Pope"), given my knowledge that there's only one Pope at a time and the incumbent generally retains the title for life. Thus, hearing "on Saturday" and knowing that it's unlikely to modify "the Pope," I anticipate that it will be followed by something where the modifier "on Saturday" is likely to provide useful information, and sure enough, there's a preterit form of a verb denoting an instantaneous action, "appointed." There's no garden path, because the meaning of the words doesn't readily allow for ambiguous interpretations.

    A fortiori, this analysis applies to a sentence that begins "Pope Francis on Saturday," because it is impossible, given the meaning of "Pope Francis," for that phrase to be modified in any useful way by "on Saturday."

    I find it interesting, therefore, that myl's analysis seems to be based on Saturday being classifiable as WEEKDAY — which I take to mean something like

    Saturday ∈ {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}

    rather than a human parser's analysis (mine, anyway), which I think is more like

    "On Saturday" only makes sense if it's modifying something where the specific day on which something occurs (or some state of affairs exists) is relevant, and that condition is not met with respect to "Pope Francis."

  19. Ted said,

    March 27, 2014 @ 5:04 pm

    Sorry, didn't quite finish that last thought. What I mean is that myl's analysis seems to be something like "we don't go down the garden path because utterances of the type WEEKDAY + VERB are statistically more likely than utterances of the type NOUN + WEEKDAY," rather than what I think is actually happening, which is that we don't go down the garden path because we know that a modifier that refers to a specific day is likely to provide useful information if it's modifying a non-continuous action like appointing, but not if it's modifying a perpetual status like being Pope Francis.

  20. Eric said,

    March 28, 2014 @ 5:34 am

    Am I the only one who at first read "a victim of sexual abuse and a senior cardinal" as referring to a single person?
    As in 'Pope Francis appointed a cardinal who was also a victim of sexual abuse'.

    [English is not my native language.]

  21. Kyle Gorman said,

    March 28, 2014 @ 5:01 pm

    I just ran the sentence in question through the BUBS parser (which is comparable to the Berkeley parser but includes more pruning of grammar and search space) and a WSJ-based grammar. While the "to a new group…" PP is still attached too low, and the Pope is still just Pope for a day, it does correctly conjoin "a victim of sexual abuse" and "a senior cardinal".

  22. chris said,

    March 28, 2014 @ 8:22 pm

    Am I the only one who at first read "a victim of sexual abuse and a senior cardinal" as referring to a single person?

    I thought it was ambiguous in that regard.

    My first reaction to myl's question about why we don't misconstrue "Pope Francis on Saturday appointed" was that the semantic elements were determinative: I know extrinsically that papacy is generally not a day-long condition, and more precisely that Pope Francis is a specific individual whose identity does not change depending on the day.

    Yeah, I agree with this too. Because the reader's understanding of the semantics informs how they interpret the syntax, computer parsers won't achieve parity with human parsers until they *understand* the sentences in addition to analyzing them.
