Discussion: Semantic vs Syntactic Treebanking
Moderator: Berthold Crysmann
Scribe: Tara Wueger
Slides:
Minutes
Overview
- treebanking with the ?Hausa grammar
- training students to do disambiguation/annotation
- getting students to understand what semantic relations are
- what is possible, or can easily be made possible, to allow treebanking remotely?
- two ways to treebank: annotators don't need to know the rule set and work from basic semantics (which asks much more of the machine to compute the meanings), vs. having to teach annotators the rule set
- in the LKB one can switch between the two ways; the syntactic rules differ only in a minority of cases
- thinking of training students to be more aware of ambiguity and to understand the output of MRS grammars; a worthwhile skill to teach
- the LKB linked to tsdb++ doesn't ?properly allow treebanking
- what you get in the LKB/tsdb++ LOGON web demos: discriminants you can pick and choose (want/don't want); but this is not connected to a database and not part of LKB-FOS
- to what extent, and where should one look: what could be done easily for treebanking?
- can that be extracted from / reused for full-forest treebanking?
- can the web demos be connected?
Discussion
- Emily: why not the full-forest treebanker? You want semantic discriminants instead of little trees, but trees are friendlier than "here's the name of the rule and the spans"; get down to a tree, then look at the semantics; it is possible to set it up as a server to use for treebanking (Woodley helped Emily and Kristen set that up)
- Berthold: MRS discriminants would represent these relations in terms of trees; is there any value in that?
- Dan: in the demo for the 2014 version of the grammar, he (?Stefan) used the demo to find the tree he wanted, then added discriminants that were more semantically driven, not showing all discriminants but picking convenient ones
- Dan: extract the information you want and show those choices; communicate with the maintainer of the full-forest treebanker; most people who want to do treebanking never work on the syntactic part of the grammar (and it's not clear we want them to, since it's all theory-based and doesn't constrain enough); treebank on EDS/DMRS segments instead (that would be more engineering)
- Dan: this won't work well when being promiscuous with the number of parses; you are limited to the class of grammars that can afford to enumerate enough trees that the MRS you wanted is somewhere in there
- Francis: with fewer than 10,000 trees, the tree we wanted was in the top 100 except in about 2% of cases
- Francis: you need bootstrapping (a partial model) to get it to work
- Berthold: so in 2% of cases the tree couldn't be found in the top 100 but could be found further down; bootstrapping limits the explosion; a less idiosyncratic way of looking at grammars
- Berthold: on helping people get involved: at a summer school there is a big discrepancy in what you can sell to a novice, and the learning curve for Hausa is extremely high; does Woodley think it's worthwhile?
- Woodley: on the question of whether you can do semantic discriminants on the full forest: no
- Dan: could you present semantic discriminants on ??
- Woodley: that would be a different tool; I would not be likely to write that tool
- Berthold: what would it take to write the tool?
- Woodley: if the parses are enumerated, not too hard; not an enormous task
- Dan: a lot of reusable code
- Woodley: ace only exports ?? mrs
- Guy: for semantic discriminants, what is the technical issue?
- Woodley: the data that would be needed is in the MRS, but the MRS doesn't exist yet when you are looking at the shape of the forest; it doesn't even look at feature structures at all; there is no information about the MRS
- Guy: would you need to change the packing machinery?
- Woodley: that would be easy but wouldn't do you any good; it would be very slow
- Guy: something in between?
- Dan: I don't think this is a necessary branch to go down; we can well afford to compute 100 trees and present discriminants to differentiate those 100 trees; we can afford all of that
- Dan: show people how to take the bootstrapping step based on a small number of relatively short sentences; we can afford to build/train a model and use that in future treebanking; if experience carries forward, then the answer would be in the top 100; this avoids the issue with packing (which would be unpleasant)
- Woodley: packing idea would not be easy
- Luis: right now you don't have to be limited to any particular number; I tried a hundred, didn't find it, so tried 100 more; you're not limited; go down as far as makes sense and go deeper if you have to
- Woodley: life wasn't so bad when treebanking the top 500 trees
- Francis: it was way better after switching to 100
- Luis: I did 500; it took time to preprocess
- Woodley: the burden on the person treebanking is higher than with the full-forest treebanking software; it's useful for a grammarian with 20 billion ??
- Berthold: PyDelphin has dependency/MRS-style visualizations, is that right?
- someone: yes
- Berthold: … go in C code?
- Luis: it would never go in C code; the machinery that PyDelphin offers (?pyviz); there is a link in my slides; what would be the next step for a tool like that: store some number of parses and then have discriminants over the MRS
- Woodley: run something server-side that produces MRSs; use PyDelphin to extract the dependencies/MRS; use discriminant extraction code adapted from (for example) the LKB; use the delphin-viz stuff to render the discriminants as partial tree events (a rough sketch of this kind of pipeline follows these minutes)
- Dan: take a set of MRSs; figure out how to design which little arcs you want to show so as to display the necessary contrast ("I didn't want the attachment here, I wanted it there"); you need a partial diagram of the DMRS or EDS to make the choice you want; then render that using existing delphin-viz code; we want something conceptually coherent, and that's going to take more thinking
- Weiwei: we use the ERG to annotate English as a second language, so a lot of the sentences would be powered by the ERG; many rules ?… the main challenge; reuse the ERG and analyze the sentences; the annotators don't have much experience (they are students with some basic linguistics); we present annotators with the full analysis, but MRS is difficult for students to read, so we show them the bilexical semantic dependency graph; the students can see the bilexical dependency structure and say yes/no to an analysis; if the bilexical dependencies are correct, the underlying MRS is almost always correct; a reasonable approach for students in their 2nd/3rd year of linguistics or computational linguistics
- Dan: if we understand right, you have some code that will take a set of MRSs (say one each from 10 trees) and compute the bilexical ??; then choices can be made
- Weiwei: no; the first thing is to use the ERG to get 500 parses; use ?? to train a ranking model that can use those few ?? and select from these parses; the method is quite reliable
- Dan: so the annotator is just confirming or denying that the top tree is correct (saying good/bad for a single tree)?
- Weiwei: yes
- Dan: … mal rules; the script to compile the ERG will bring in 100-300 mal rules, which will improve robustness for the corpus you are trying to parse; have you seen that?
- Weiwei: no
- Emily: we are out of time
- Dan: I will follow up on that separately
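
As a concrete illustration of the pipeline Woodley and Dan sketched above (enumerate a bounded set of readings, pull out the semantic dependencies with PyDelphin, and surface only the arcs that differ between readings as candidate semantic discriminants), here is a minimal, hypothetical sketch. It is not the tool discussed in the session: it assumes ACE and a compiled ERG image are available locally, and the grammar path, sentence, and the (source predicate, role, target predicate) arc representation are purely illustrative.

```python
# Minimal sketch (assumptions: ACE on PATH, a compiled grammar image "erg.dat"):
# enumerate the top-N readings with ACE, convert each MRS to a DMRS with PyDelphin,
# and list the dependency arcs that distinguish some readings from others.
from collections import defaultdict

from delphin import ace, dmrs

GRAMMAR = "erg.dat"    # hypothetical path to a compiled grammar image
SENTENCE = "The dog chased the cat in the garden."
TOP_N = 100            # enumerate a bounded number of readings


def dependency_arcs(m):
    """Reduce an MRS to a set of (source pred, role/post, target pred) arcs."""
    d = dmrs.from_mrs(m)
    preds = {node.id: node.predicate for node in d.nodes}
    return {
        (preds[link.start], f"{link.role}/{link.post}", preds[link.end])
        for link in d.links
        if link.start in preds and link.end in preds
    }


response = ace.parse(GRAMMAR, SENTENCE, cmdargs=["-n", str(TOP_N)])
arc_sets = [dependency_arcs(result.mrs()) for result in response.results()]

# An arc is a candidate semantic discriminant if some, but not all, readings have it.
readings_with_arc = defaultdict(set)
for i, arcs in enumerate(arc_sets):
    for arc in arcs:
        readings_with_arc[arc].add(i)

for arc, readings in sorted(readings_with_arc.items(), key=lambda kv: len(kv[1])):
    if 0 < len(readings) < len(arc_sets):
        print(arc, "-> present in readings", sorted(readings))
```

Rendering such contrasting arcs as partial DMRS/EDS diagrams with the existing delphin-viz code, and feeding the annotator's choices back into a profile, would be the remaining work that Woodley estimated as "not an enormous task".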
Last update: 2022-07-19 by taraw28