Discussion of multiword expressions (MWEs), inspired by Ann’s participation in PARSEME.

We have started a new page on MWEs (MweTop) to which we will link various relevant things.

What we do now

  • things with spaces
    • some interfacing with morphology
  • things made into a single predicate
    • look up
  • things recognized as larger things with idiom matching
    • determiner-less PPs: in hospital

      • also occurs outside/slightly genericy
      • semi-lexicalized
    • idiom thingies: keep tabs on

      • some work at NTU/CSLI on possessive idioms

      + note: not marked as a unit in the MRS output
      + supported by LKB and ACE

    • different types of idiomaticity: detless_pp vs flexible idioms
    • we have paraphrase rules for many of these
      • but not perfect: out of your tiny mind
  • how does this interface with chart mapping/tokenization?
  • what about the idiomatic/non-idiomatic distinction
    • we don’t enforce it perfectly
  • we maybe have more examples of MWEs with structure than anyone else
    • although we don’t have as many examples as e.g. in wordnet
  • SRG: words with spaces, verb+particle, idioms (take into account)
  • Matrix: no idioms (FCB: there is documentation on the wiki)
  • NorSource: not yet
  • Burger: some types for verb+complement
  • Jacy: all kinds, even documentation
  • Hegram: nothing
  • MCG: nothing
    • Chengyu (four character idioms)
      • treat them as non-compositional
      • NTU has a list of these with some more information (with help from Mike and Ning)
      • there are also non-Chengyu idioms
  • we can have both internal and external modification (for some idioms)
    • the cat kicked all nine buckets (Mike)
  • a lot of regional use
  • treat proverbial the same as fucking (can go anywhere)
  • in general adding MWEs adds ambiguity so we tend not to add them
    • if they help in parse-selection it would be worth putting them in
    • even very common things like Thank you and good morning
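
The ambiguity point above can be illustrated with a minimal sketch. It assumes a toy lexicon in which every entry is a tuple of tokens, so a words-with-spaces entry is just a tuple longer than one. The names `segmentations`, `WORDS` and `MWES` are hypothetical and do not come from any DELPH-IN grammar or parser; the sketch only counts ways of covering the input with lexical entries, not real parses.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, ...]


def segmentations(tokens: List[str], lexicon: Set[Entry]) -> List[List[Entry]]:
    """Return every way to cover `tokens` with entries from `lexicon`."""
    if not tokens:
        return [[]]
    results: List[List[Entry]] = []
    for length in range(1, len(tokens) + 1):
        head = tuple(tokens[:length])
        if head in lexicon:
            # Keep this entry and cover the remaining tokens recursively.
            for rest in segmentations(tokens[length:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical toy lexicon: single words plus two words-with-spaces entries.
WORDS: Set[Entry] = {("thank",), ("you",), ("good",), ("morning",)}
MWES: Set[Entry] = {("thank", "you"), ("good", "morning")}

tokens = "thank you good morning".split()
without_mwes = segmentations(tokens, WORDS)
with_mwes = segmentations(tokens, WORDS | MWES)
print(len(without_mwes), len(with_mwes))  # prints "1 4"
```

Each added entry doubles the number of coverings of its span in this toy setting, which is why such entries are only worth adding if they help parse selection.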

Things we don’t have an account for:

  • institutionalized phrases: traffic light/traffic signal
  • light verbs/light verby idioms: give a rat’s arse [about]
  • proverbs: how do we handle these? (a stitch in time saves nine)

    • interestingly cross-lingually
    • often contains frozen bits of older grammars
  • fixed foreign phrases (que sera sera)
    • interesting to see if there are differences in flexibility between old English vs foreign
  • NPIs are on the edge of this phenomenon
  • things like you may wish to -> you should (post-process)
  • "if you like" is currently a words-with-spaces entry in the ERG

Other projects

  • MWEs with structure in wordnets
  • Lots of work in Japan, e.g. on idiom/literal (Chikara)

Last update: 2013-08-01 by FrancisBond