Discussion: Semantic vs Syntactic Treebanking

Moderator: Berthold Crysmann

Scriber: Tara Wueger

Slides:

Minutes

Overview

  • treebanking ?Hausa grammar
  • train up students to do disambiguation/annotations
  • get students to understand what are semantic relations
  • what is possible/can be made possible easily to allow treebanking remotely?
  • 2 ways to treebank: people don’t need to know what set of rules … basic semantics vs. have to teach people to know what set of rules; much more in terms of machine to compute the meanings
  • in lkb and switch between 2 ways; different syntactic rules (minority of cases)
  • thinking of training up students; be more aware of ambiguity; understand output of mrs grammars; worthwhile skill to teach
  • lkb linked to tsdb++ doesn’t ?properly allow treebanking
  • what you get in web demos in lkb/tsdb++ logon web demos: discriminators; pick/choose ones you want/don’t want; not connected to database; not part of lkb-fos
  • to what extent? where to look for? what could be done easily for treebanking?
  • can that be extracted/used from full forest treebank?
  • can web demos be connected?

Discussion

  • Emily: Why not the full forest treebanker? want semantic discriminator instead of little trees; but trees are more friendly than “here’s the name of the rule and the spans”; get down to tree, then look at semantics; possible to set up as server to use for treebanking (Woodley helped Emily and Kristen set that up)
  • Berthold: mrs discriminants would represent these relations in terms of trees; any value in that?
  • Dan: in demo for 2014 version of grammar: used demo and found tree he (?Stefan) wanted, then added discriminants that were more semantically driven, not showing all discriminants and picking convenient ones
  • Dan: extract information you want and show those choices; communicate with maintainer of full forest treebank; most people that want to do treebanking never do syntactic part of grammar (not sure if we want them to since it’s all based on theory - doesn’t constrain enough); treebank on eds/dmrs segments (would be more engineering)
  • Dan: won’t work great when being promiscuous with number of parses; constrained limited to class of grammars can afford to enumerate enough trees so mrs you wanted was somewhere in there
  • Francis: less than 10,000 trees; top 100 trees tree we wanted was in there 2% of the time
  • Francis: need bootstrapping to get it (partial model) to work
  • Berthold: 2% of cases couldn’t be found in top 100 but could be found further down; bootstrapping limiting explosion; less idiosyncratic way of looking at grammars
  • Berthold: helping ppl getting involved; in summer school, big discrepancy in what you sell to novice; learning curve for Hausa is extremely high; does woodley think it’s worthwhile?
  • Woodley: question on whether you can do semantic discriminants on full forest? no
  • Dan: could you present semantic discriminants on ??
  • Woodley: would be a different tool. i would not be likely to write that tool
  • Berthold: what would it take to write the tool?
  • Woodley: not too hard; if enumerated, not too hard; not an enormous task
  • Dan: a lot of reusable code
  • Woodley: ace only exports ?? mrs
  • Guy: for semantic discriminants, what is the technical issue?
  • Woodley: data that would be needed is in mrs but mrs doesn’t exist yet when looking at shape of forest; doesn’t look at features structures even at all; no info about mrs
  • Guy: would you need to change packing machinery?
  • Woodley: would be easy but wouldn’t do you any good; would be very slow
  • Guy: something in between?
  • Dan: i don’t think this is a necessary branch to go down; we can well afford to compute 100 trees; present discriminations to differentiate those 100 trees; can afford all of that
  • Dan: show people how to take the bootstrapping step based on small amount of relatively short setnences; can afford to build/train a model and use that in future treebanking; if experience carries forward then answer would be in top 100; avoids issue with packing (that would be unpleasant)
  • Woodley: packing idea would not be easy
  • Luis: right now, don’t have to be limited by any number; i tried a hundred; didn’t find it so try 100 more; not limited; go down what makes sense and go deeper if have to
  • Woodley: life wasn’t so bad when treebanking top 500 trees
  • Francis: was way better when switching to 100
  • Luis: i did 500; took time to preprocess
  • Woodley burden on person treebanking higher than on full forest treebanking software; useful for grammarian with 20 billion ??
  • Berthold: pydelphin has dependencies mrs style visualizations is that right?
  • someone: yes
  • Berthold: … go in C code?
  • Luis: would never go in C code; machinery that pydelphin offers (?pyviz); there is a link in my slides; what would be next step of a tool like that; store x number of parses and then have discriminant on mrs
  • Woodley: server side running something that throws up mrs; use pydelphin to extract dependencies/mrs; use discriminator extraction code adapted from (for example) the lkb; delphin viz stuff to throw discriminants as partial tree events
  • Dan: set of mrs’s; figure out how to design/what little arcs you want to show/show the necessary contrast; “i didn’t want the attachment here i wanted it there”; need partial diagram of dmrs or dds to make choice you want; then render that using existing delphin viz code; want something conceptually coherent that’s gonna take more thinking
  • Weiwei: use erg to annotate english as a 2nd language so a lot of sentences would be powered by erg; many rules ?… main challenge; reuse erg and analyze sentences; not much experience as annotator (are students w/ some basic linguistics); to present annotators with full analysis but mrs is difficult to read by students; show them the bilexical semantic dependency graph; the students can also see bilexical dependency structure; they can say yes/no to an analysis; if bilexical is corrected, underlying mrs is almost always correct; reasonable way for students in 2nd/3rd year of ling or compling
  • Dan: if we understand right, you have some code that will take set of mrs’s (say 1 from 10 trees) can compute bilexical ??; can make choices
  • Weiwei: no; 1st thing is to use erg for 500 parses; use ?? to train ranking model that can use those few ?? and select from these parses; method is quite reliable
  • Dan: annotator is just confirming/denying that top tree is correct (saying good/bad for single tree)?
  • Weiwei: yes
  • Dan: … mal rules; script to compile erg will bring 100-300 mal rules which will improve robustness for corpus you are trying to parse; have you seen that?
  • Weiwei: no
  • Emily: we are out of time
  • Dan: i will follow up with that separately

Last update: 2022-07-19 by taraw28 [edit]