Abstract

In this paper, we describe the expansion of the ODIN resource, a database containing many thousands of instances of Interlinear Glossed Text (IGT) for over a thousand languages. Because IGT instances are effectively richly annotated and heuristically aligned bitexts, a large collection of them provides a unique resource for bootstrapping NLP tools for resource-poor languages. To make the data in ODIN more readily consumable by tool developers and NLP researchers, we propose a new XML format for IGT, called Xigt. We call the updated release ODIN-II.

BibTeX

@inproceedings{xia-et-al-2014,
  author = {Xia, Fei and Lewis, William and Goodman, Michael Wayne and Crowgey, Joshua and Bender, Emily M.},
  url = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/1072_Paper.pdf},
  note = {ACL Anthology Identifier: L14-1055},
  title = {Enriching ODIN},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {26--31~May},
  address = {Reykjavik, Iceland},
  editor = {Calzolari, Nicoletta and Choukri, Khalid and Declerck, Thierry and Loftsson, Hrafn and Maegaard, Bente and Mariani, Joseph and Moreno, Asuncion and Odijk, Jan and Piperidis, Stelios},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english},
  pages = {3151--3157}
}