Discussion of MWEs, inspired by Ann’s participation in PARSEME.
We have started a new page on MWEs (MweTop) to which we will link various relevant things.
What do we do now
- things with spaces
- some interfacing with morphology
- things made into a single predicate
- look up
- things recognized as larger things with idiom matching
- determiner-less PPs: in hospital
- also occurs outside/slightly genericy
- semi-lexicalized
- idiom thingies: keep tabs on
- some work at NTU/CSLI on possessive idioms
+ note: not marked as a unit in the MRS output
+ supported by LKB and ACE
- different types of idiomaticity: detless_pp vs flexible idioms
- we have paraphrase rules for many of these
- but not perfect: out of your tiny mind
- how does this interface with chart mapping/tokenization?
- what about the idiomatic/non-idiomatic distinction?
- we don’t enforce it perfectly
- we maybe have more examples of MWEs with structure than anyone else
- although we don’t have as many examples as e.g. in wordnet
- SRG: words with spaces, verb+particle, idioms (take into account)
- Matrix: no idioms (FCB: there is documentation on the wiki)
- NorSource: not yet
- Burger: some types for verb+complement
- Jacy: all kinds, even documentation
- not so good with things like te-nakareba-narimasen, complex PPs
- (http://moin.delph-in.net/JacyIdiom)
- Hegram: nothing
- MCG: nothing
- Chengyu (four character idioms)
- treat them as non-compositional
- NTU has a list of these with some more information (with help from Mike and Ning)
- there are also non-Chengyu idioms
- we can have both internal and external modification (for some idioms)
- the cat kicked all nine buckets (Mike)
- a lot of regional use
- treat “proverbial” the same as “fucking” (can go anywhere)
- in general adding MWEs adds ambiguity so we tend not to add them
- if they help in parse-selection it would be worth putting them in
- even very common things like Thank you and good morning
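The ambiguity point above can be illustrated with a small sketch. It is a toy enumeration in Python, not how the ERG, LKB or ACE actually handle words with spaces: the lexicon contents and the names MWE_LEXICON and segmentations are assumptions made up for this example. Each words-with-spaces entry adds an extra analysis next to the word-by-word one.

```python
from typing import List, Set, Tuple

# Hypothetical toy lexicon of words-with-spaces entries (not taken from any grammar).
MWE_LEXICON: Set[Tuple[str, ...]] = {
    ("thank", "you"),
    ("good", "morning"),
    ("in", "hospital"),
}


def segmentations(tokens: List[str]) -> List[List[str]]:
    """Return every way to split tokens into single words and lexicon MWEs."""
    if not tokens:
        return [[]]
    results: List[List[str]] = []
    # Option 1: treat the first token as an ordinary single word.
    for rest in segmentations(tokens[1:]):
        results.append([tokens[0]] + rest)
    # Option 2: start with any lexicon MWE that matches a prefix of the tokens.
    for mwe in MWE_LEXICON:
        n = len(mwe)
        if tuple(tokens[:n]) == mwe:
            for rest in segmentations(tokens[n:]):
                results.append(["_".join(mwe)] + rest)
    return results


if __name__ == "__main__":
    for analysis in segmentations(["thank", "you", "good", "morning"]):
        print(analysis)
```

With two MWE entries matching, the four-token input gets four segmentations instead of one, which is the kind of growth that makes parse selection matter when deciding whether to add such entries.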
Things we don’t have an account for:
- institutionalized phrases: traffic light/traffic signal
- light verbs/light verby idioms: give a rat’s arse [about]
- proverbs: how do we handle these? (a stitch in time saves nine)
- interesting cross-lingually
- often contain frozen bits of older grammars
- fixed foreign phrases (que sera sera)
- interesting to see if there are differences in flexibility between old English vs foreign
- NPIs are on the edge of this phenomenon
- things like you may wish to -> you should (post-process)
- if you like: currently words-with-space in ERG
Other projects
- MWEs with structure in wordnets
- Lots of work in Japan, e.g. on idiom/literal (Chikara)
Last update: 2013-08-01 by FrancisBond