Abstract

This paper presents Xigt, an extensible storage format for interlinear glossed text (IGT). We review design desiderata for such a format based on our own use cases as well as general best practices, and then explore existing representations of IGT through the lens of those desiderata. We give an overview of the data model and XML serialization of Xigt, and then describe its application to the use case of representing a large, noisy, heterogeneous set of IGT.

BibTeX

@article{goodman-et-al-2015,
  year = {2015},
  issn = {1574-020X},
  journal = {Language Resources and Evaluation},
  volume = {49},
  number = {2},
  doi = {10.1007/s10579-014-9276-1},
  title = {Xigt: extensible interlinear glossed text for natural language processing},
  url = {http://dx.doi.org/10.1007/s10579-014-9276-1},
  publisher = {Springer Netherlands},
  keywords = {Interlinear glossed text (IGT); Annotation; Storage format},
  author = {Goodman, Michael Wayne and Crowgey, Joshua and Xia, Fei and Bender, Emily M.},
  pages = {455--485},
  language = {English}
}