Thursday 12 March 2009

Transition to TEI P5

At the NZETC, all our texts are stored in a base format defined by the Text Encoding Initiative (TEI), which is an XML format. We've just gone live with an updated version of the format, called 'P5.' As an end user, you're unlikely to notice the difference between the previous version and P5, but it enables us to do interesting things which you hopefully will notice. These include:
  1. Representation of documents which have large additions (such as books with newspaper clippings pasted into the covers)
  2. Representation of non-Unicode glyphs
  3. The ability to add extra functionality piece by piece, so we can encode new features as we take on new projects
At the back end, P5 also allows:
  1. Significantly better validation of dates, times, enumerations, etc., reducing errors and speeding the encoding of new texts
  2. Easier interlinking with other XML schemas
  3. Standardised use of the xml:lang attribute to identify languages
  4. A significant upgrade in the accompanying toolset
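For the curious, here is a rough sketch of what the language and date points can look like in practice. It is not our production code: the sample fragment and the helper names (languages, invalid_dates) are made up for illustration, and the date check is deliberately simplified to accept only full YYYY-MM-DD values, although P5's "when" attribute also permits partial dates such as a bare year.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace of TEI P5 elements, and the expanded name of the xml:lang attribute.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment, invented for this example.
SAMPLE = """\
<text xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
  <body>
    <p>Published on <date when="2009-03-12">12 March 2009</date>.</p>
    <p xml:lang="mi">Kia ora</p>
  </body>
</text>
"""


def languages(root):
    """Return the sorted xml:lang values used anywhere in the tree."""
    return sorted({el.get(XML_LANG) for el in root.iter() if el.get(XML_LANG)})


def invalid_dates(root):
    """Return 'when' values that are not full ISO dates (simplified check)."""
    bad = []
    for el in root.iter(f"{{{TEI_NS}}}date"):
        when = el.get("when")
        if when is None:
            continue
        try:
            date.fromisoformat(when)
        except ValueError:
            bad.append(when)
    return bad


root = ET.fromstring(SAMPLE)
print(languages(root))
print(invalid_dates(root))
```

In real use this kind of checking is done by validating against the P5 schema rather than by hand-written code; the sketch only shows why machine-readable dates and a single language attribute make errors easier to catch.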
If you notice anything that's not working or that looks odd, please email us at director@nzetc.org