Jacy is being developed in cooperation with the Hinoki Treebank.

Corpora

           
Name  ID       Full Name       # Sentences  # Words    Comments
mrs   0        MRS Test Suite  136          ???
tc    100,000  Tanaka Corpus   150,341      1,756,825  Includes English translations; 10 profiles (6-15) treebanked

These treebanks are in the jacy/tsdb/gold directory. They may lag behind the most recent version of the grammar.
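To check how many sentences each treebank holds, something like the following minimal sketch may help. It assumes that each subdirectory of jacy/tsdb/gold is an [incr tsdb()]-style profile containing an uncompressed text file named `item` with one item per line; this layout is an assumption and is not stated on this page. The function names `count_items` and `summarize_gold` are hypothetical helpers written for this illustration.

```python
from pathlib import Path


def count_items(profile_dir):
    """Count non-empty lines in a profile's `item` file.

    Assumption: the profile stores one item (sentence) per line in an
    uncompressed file called `item`.
    """
    item_file = Path(profile_dir) / "item"
    with item_file.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def summarize_gold(gold_dir):
    """Map each profile directory under gold_dir to its item count.

    Entries that do not contain an `item` file are skipped.
    """
    counts = {}
    for profile in sorted(Path(gold_dir).iterdir()):
        if (profile / "item").is_file():
            counts[profile.name] = count_items(profile)
    return counts
```

Calling `summarize_gold("jacy/tsdb/gold")` from the directory that contains the jacy checkout would return a dictionary keyed by profile name. If the profiles store compressed files (for example `item.gz`), the sketch would need to be adapted.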

If you want silver data, parsing the rest of the Tanaka Corpus is a good place to start.

Last update: 2020-07-14 by FrancisBond