Heuristics for efficient treebanking
Top-down
- Choose the construction that spans the whole sentence
- Typically SUBJH
- Typically not one of the FRAG* rules
Bottom-up
- Disambiguate lexical entries early, to reduce remaining ambiguity
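Why an early lexical decision helps can be pictured with a toy model of discriminant-based selection, where each accepted decision discards every analysis that lacks it. This is a minimal sketch under assumptions. Each candidate analysis is modeled as a plain set of decision labels. The labels are hypothetical: the two readings of |set| come from the passive-vs-adjective example, and `struct:a`, `struct:b`, `struct:c` are placeholder structural choices. None of this is the data format of any real treebanking tool.

```python
from typing import FrozenSet, List

# Toy model (an assumption): an analysis is the set of decisions it contains.
Parse = FrozenSet[str]


def choose(parses: List[Parse], discriminant: str) -> List[Parse]:
    """Accept a decision: keep only the analyses that contain it."""
    return [p for p in parses if discriminant in p]


def eliminated(parses: List[Parse], discriminant: str) -> int:
    """Count the analyses that accepting this decision would rule out."""
    return sum(1 for p in parses if discriminant not in p)


# Hypothetical forest: the adjective reading of |set| licenses more
# structural variants than the verb reading (labels are illustrative).
forest: List[Parse] = [
    frozenset({"set:v_np*_le", "struct:a"}),
    frozenset({"set:aj_-_i_le", "struct:a"}),
    frozenset({"set:aj_-_i_le", "struct:b"}),
    frozenset({"set:aj_-_i_le", "struct:c"}),
]

print(eliminated(forest, "set:v_np*_le"), eliminated(forest, "struct:a"))  # 3 2
print(len(choose(forest, "set:v_np*_le")))  # 1
```

In this made-up forest, the lexical decision removes three analyses and the structural one removes two. Real forests will differ, but the pruning mechanism is the same.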
Prefer Simpler
- In general, prefer the simpler choice
- e.g. for nominal |seating|, prefer NP over intransitive V rather than NP over transitive V with an unexpressed optional complement
Technical choices
Complex proper names
Titles
- |Mr. Browne|
- Choose NP_TITLE_CMPND, not APPOS
Capitalized words in name
- treat as parts of the name, not as ordinary words
- |Rolls-Royce Motor Cars Inc.|
- |Motor Cars|
- NP_NAME_CMPND, not NOUN_N_CMPND
- |Rolls-Royce|
- Choose the multi-word entry when available
- |Rolls-Royce Motor Cars|
- NP_NAME_CMPND
- Attach |Inc.| with NADJ_RR
Profession modifier
- treat as an appositive
- |Howard Mosher, president and CEO|
- First combine |Howard Mosher|
- Then combine it with |president and CEO| using APPOS_NBAR
Native names preferred when available
- Company names
- |Rolls-Royce|
- Choose n_-_pn_le, not NP_NAME_CMPND
- Country names
- |U.S.|
- Choose n_-_c-nm-pd_le, not n_-_pn-gen_le
Proper names and punctuation
- Unknown names
- |Elianti.|
- Choose PUNCT_PERIOD_ORULE (the period is not part of the name)
- Name abbreviations containing periods
- |U.S.|
- Choose PUNCT_PERIOD_ORULE if the word is at the end of the sentence
PP/modifier attachment
- Choose highest attachment point consistent with meaning
- |remain steady at 1,200 cars|
- attach to VP, not to |steady|
- |reserve a room for Browne|
- attach to VP, not to |room|
- But disprefer modifier attachment to semantically vacuous heads
- e.g. attach modifiers to |hiring …|, not to |be hiring …|
- In copula constructions (with forms of the verb “be”), attach the PP inside the copula's complement
- |be payable Feb. 15|
- First combine |payable| with |Feb. 15| with HADJ_I_UNS
- Complement vs. modifier - choose complement when available
- |based in Los Angeles|
- Choose HCOMP, not HADJ_I_UNS
- PP modifier inserted between verb and its complement NP
- |publish in statements the names of insiders|
- First combine |publish| with |in statements| using VMOD_I
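The precedence among the first few attachment preferences above can be written as a small selection function. Meaning comes first, then avoiding semantically vacuous heads such as the copula, then height. This is a hypothetical sketch: `Site`, its fields, and the numeric heights are invented for illustration, and the meaning judgment stays with the annotator. The complement-vs-modifier rule is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """One candidate attachment site (hypothetical model, not a grammar API)."""
    label: str        # e.g. "VP" or "N"
    height: int       # larger means higher in the tree
    vacuous: bool     # head is semantically vacuous, e.g. copula "be"
    meaning_ok: bool  # annotator's judgment that the reading is intended


def pick_site(sites: List[Site]) -> Optional[Site]:
    """Highest meaning-preserving site, avoiding vacuous heads if possible."""
    ok = [s for s in sites if s.meaning_ok]
    if not ok:
        return None
    # Fall back to vacuous heads only if no contentful site remains.
    pool = [s for s in ok if not s.vacuous] or ok
    return max(pool, key=lambda s: s.height)


# |reserve a room for Browne|: both sites are plausible, so take the VP.
reserve = [Site("VP", 2, False, True), Site("N", 1, False, True)]
# |be payable Feb. 15|: the copula VP is vacuous, so attach to |payable|.
payable = [Site("VP(be)", 2, True, True), Site("AP(payable)", 1, False, True)]

print(pick_site(reserve).label)  # VP
print(pick_site(payable).label)  # AP(payable)
```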
Temporal modifiers
- When they precede the VP, attach them to the subject NP
- |the maker last year sold cars|
- attach |last year| to |maker|
- Treat them as modifiers, pumping the temporal NP to a PP
- |last year|
- Choose NPADV, not ADJN
- |Feb. 15|
- Combine with HSPECHC, then choose NPADV
- Complex phrases
- |early next year|
- Combine |early| with |next year| using NADJ_RR
Complex compound nouns
- Choose the bracketing that matches the intended sense
- |luxury auto maker|
- first combine |luxury| with |auto|
- When intended bracketing is not clear, group from right to left
- |airline ticket counter|
- first combine |ticket| with |counter|
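The right-to-left default for compounds with unclear bracketing amounts to right-branching structure. This is a minimal sketch with bracketings represented as nested Python tuples, a representation chosen here only for illustration. The default applies only when the intended bracketing is unclear, so |luxury auto maker| would still be bracketed by hand as `(("luxury", "auto"), "maker")`.

```python
from typing import List, Tuple, Union

# A bracketing is either a single noun or a (modifier, head) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def group_right_to_left(nouns: List[str]) -> Tree:
    """Combine the two rightmost nouns first, then add each earlier noun
    as a modifier of the compound built so far."""
    if not nouns:
        raise ValueError("need at least one noun")
    tree: Tree = nouns[-1]
    for noun in reversed(nouns[:-1]):
        tree = (noun, tree)
    return tree


print(group_right_to_left(["airline", "ticket", "counter"]))
# ('airline', ('ticket', 'counter'))
```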
Coordination
- If you have a choice between XP CCONJ XP and X CCONJ X, choose the XP (or S), that is, the highest constituent
- e.g. for |cats and dogs|, prefer NP coordination over N coordination with a bare NP rule on top
- Nominal phrases
- Choose N_COORD_TOP_2, not N_COORD_TOP_3 when given the choice
- Sentence-initial conjunction - treat as incomplete coordination of clauses
- |But Abrams arrived early.|
- Combine |But| with |Abrams arrived early.| with HMARK_CL
Passive verb vs. adjective
- Choose verb if the meaning is agentive; otherwise choose adjective
- |A date hasn’t been set|
- For |set|, choose v_np*_le, not aj_-_i_le
Punctuation
- Attach punctuation to the preceding words
- except for some rare conjunctions
- Paired commas marking off a modifier: choose the “paired” rule (-PR suffix)
- |Bell, based in Los Angeles|
- Choose NADJ_RC_PR to combine modifier phrase with |Bell|
Adverbs
- Negation - always attach |not| to the preceding auxiliary if possible
- |did not meet|
- First combine |did| with |not| using HCOMP
- Other adverbs between the auxiliary and the main VP - attach the adverb to the following VP
- |can really sing|
- First combine |really| with |sing| using ADJH_S
- Sentence-initial - Prefer attachment without extraction when possible
- |Apparently the commission met|
- Choose ADJ_S, not FILLHEAD_NON_WH_IG
Measure phrases
- Degree modifiers - combine with the number word
- |about 25 % of them|
- First combine |about| with |25| using HSPECHC
- Combine |%| with |of them| using HCOMP
- Dollar amounts - treat the symbol |$| as the head (the unit of measure)
- |$ 80 billion|
- Combine |$| with |80 billion| using MEAS_NP_SYMB
Quotations with explicit attribution
- treat as extraction from the ‘saying’ verb
- |They arrived, Browne said.|
- Combine |They arrived,| with |Browne said.| using FILLHEAD_NON_WH
Partitive NPs
- First pump the determiner to a noun, and treat the of-PP as its complement
- |some of the books|
- Combine |some| with |of the books| using HCOMP
- For |all|, |not all|, |both|, and |half|, treat the following NP as a complement
- |not all those who wrote|
- For |not all|, choose native entry n_np_mc-neg_le
- Combine |not all| with |those who wrote| using HCOMP
Modification in noun phrases
- Modifiers to the right of the head noun are always attached _before_ any modifiers to the left
- |important changes by the SEC|
- First combine |changes| with |by the SEC| using NADJ_RR
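The attachment order inside a noun phrase can be sketched the same way, again with nested tuples as an illustrative representation. The guideline fixes only that post-head modifiers attach before pre-head ones. The order among several modifiers on the same side (nearest the head first) is an assumption made here.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]


def bracket_np(left: List[str], head: str, right: List[str]) -> Tree:
    """Attach every right-hand modifier before any left-hand one."""
    tree: Tree = head
    # Post-head modifiers first; nearest-first order is an assumption.
    for mod in right:
        tree = (tree, mod)
    # Then pre-head modifiers, starting with the one nearest the head.
    for mod in reversed(left):
        tree = (mod, tree)
    return tree


print(bracket_np(["important"], "changes", ["by the SEC"]))
# ('important', ('changes', 'by the SEC'))
```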
Notes from Tomar meeting
- Where lexical ambiguity is hard to resolve (e.g. even-deg vs. even-conj), choose based on frequency in Redwoods/DeepBank
- Disprefer modifier attachment to semantically vacuous heads, e.g. attach adverbs to |hiring…|, not to |be hiring…|
- For there-copula:
- Avoid double-object choice and avoid modification of there-cop
- Also prefer low attachment of a modifier after the object NP
- Accept extraction of PP for there-cop as is
- When choosing between verb-particle and verb-mod, as in |go away|: if you can modify the ‘particle’, as in |go far away|, it is not verb-particle.
- When choosing between spr-hd and mod-hd for Adv-Adj, choose mod-hd
- Avoid adv-add except for not
- When WH-Q of form NP-be-NP [EMB: guessing this is choose subj-head; Dan please confirm]
- For the complement of a saying verb, if there’s a main-clause option for the quoted material, choose it:
- |“Who did Kim hire” asked Mary|, not |*Who Kim hired, asked Mary|
- No free relatives
- Attach three-dot punctuation as low as possible
- Reject ellipsis
- For ndash between clauses, use run-on
- For degree specifiers, when there’s a choice, take the shortest lexent type name
- Attach subord clause high [EMB: subordinate clauses are understood as clauses with all arguments overt; do not include in+order+to purposives, etc.]
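Two of the tie-breaking rules above are simple enough to state as functions: corpus frequency for hard lexical choices, and the shortest lexical type name for degree specifiers. This is a minimal sketch. The counts and type names in the example are placeholders, not real Redwoods/DeepBank statistics or actual ERG types. Breaking any remaining tie by list order is an assumption.

```python
from typing import Dict, List


def by_frequency(candidates: List[str], counts: Dict[str, int]) -> str:
    """Pick the candidate seen most often in a reference treebank.
    Unseen candidates count as 0; max() keeps the first of equal maxima."""
    return max(candidates, key=lambda c: counts.get(c, 0))


def shortest_type(candidates: List[str]) -> str:
    """Pick the lexical type with the shortest name; min() keeps the
    first of equal minima."""
    return min(candidates, key=len)


# Placeholder counts, NOT real corpus figures.
counts = {"even-deg": 3, "even-conj": 1}
print(by_frequency(["even-conj", "even-deg"], counts))  # even-deg

# Hypothetical type names, for illustration only.
print(shortest_type(["hyp_dg-long_le", "hyp_dg_le"]))  # hyp_dg_le
```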
Last update: 2020-07-23 by AlexandreRademaker