Thursday 28 October 2010

Bilingual Websites

Last week I had the pleasure of attending the National Digital Forum, a gathering of gallery, library, archive and museum people from across New Zealand and around the world. I was invited to talk about the bilingual stuff we do at the NZETC, and I was talking alongside Andy Neale from the National Library and Basil Keane from the Ministry for Culture and Heritage (and Te Ara http://www.teara.govt.nz/mi in particular).

The Māori texts in the NZETC collection are created in exactly the same way as the English texts. The actual digitisation is outsourced and the quality is specified as an error rate per character: when the texts come back we check the digital version against the original to ensure that the quality is met. The team who check the quality have to check every character anyway, so the actual language doesn't matter (or not much anyway). The metadata (author, publisher, date, etc.) comes from existing library records, including the fabulous "Books in Māori." At no point does anyone actually need to read and comprehend the texts, which means that we don't need a team of fluent te reo Māori speakers.

Basil Keane at Te Ara is at the opposite end of the spectrum. Basil's work is inherently editorial: he needs to be able to read and understand the subtleties of the texts he's dealing with and also the constellation of factors which impact the interpretation of the work (especially the political ones). As such, he's a licensed translator. He also relies on sources like the NZETC and the various National Library and ArchivesNZ sources to locate particular source materials and also examples of particular language usage.

Andy Neale talked about the planning the National Library did, before rolling out its current main website, on exactly how to handle bilingual content.

In the discussion after our talk, it emerged that there was a perceived lack of information about how to approach bilingualism when planning and implementing a website. A number of different groups have done work in the area, but no one is gathering it together in one place and promoting it to those just starting to introduce te reo Māori into their website. To make a start at this, I thought I'd assemble the information I have here:

* The original work in this area was done in print form. It's probably worth reviewing Books in Māori http://books.google.com/books?id=oAUWAQAAIAAJ and the materials listed there to see how these problems were solved in print. Bilingual print has been around longer than bilingual websites, so print publishers may have cracked some of the problems.
* Te Taka Keegan at Waikato wrote his PhD on browsing bilingual websites http://researchcommons.waikato.ac.nz/handle/10289/3997 and has since published a number of papers http://www.cs.waikato.ac.nz/~tetaka/tuhituhi.html
* The study the National Library did into potential bilingual information architectures for their website is at http://librarytechnz.natlib.govt.nz/2007/12/options-for-bilingual-web-content-and.html
* The Koha open source library software http://koha-community.org/ has been translated into te reo Māori and the translation strings are available in machine-readable format from, for example, http://translate.koha.org/export/opac3_0/mi_NZ/mi-NZ-i-opac-t-prog-v-3000000.po which is a good source for some of the navigational terminology (next, previous, search, etc.).
* On the NZETC site there are a couple of URLs to look at for how we handle bilingualism: http://www.nzetc.org/tm/scholarly/facets/search (notice the search for parallel texts and the explicit promotion of different languages) and http://www.nzetc.org/tm/scholarly/tei-GorLaws-t1-g1-t1-body1-d1-d20.html There is also a stylesheet I wrote at http://wiki.tei-c.org/index.php/TEI2TMX.xsl which converts our TEI into TMX files (http://en.wikipedia.org/wiki/Translation_Memory_eXchange), a format used by computer-aided translation software (including Google Translate). This only works with our bilingual texts derived from facing-page translations. If anyone wants our texts as TMX files, give me a yodel, as doing it by hand will be slow.
* The position of Welsh in the UK is somewhat similar to the position of te reo Māori in New Zealand and there has been some interesting work there on bilingual websites, particularly in terms of situating bilingual websites in the context of theories of language revitalization. See for example Daniel Cunliffe's http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.137.6829&rep=rep1&type=pdf and http://portal.acm.org/citation.cfm?id=778986.779003
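
For anyone wanting to mine a translation file like the Koha one in the list above for navigational terms, here is a rough Python sketch of pulling English/te reo pairs out of a .po file. It is only a sketch under some assumptions: the function name parse_po is made up for this post, the sample text is hand-written for illustration and is not copied from the Koha file, and the parser only handles plain msgid/msgstr entries (no plurals or contexts). It also leans on Python's string-literal rules to decode the quoted strings, which matches the common escapes in .po files but is not a full implementation of the format.

```python
import ast


def parse_po(text):
    """Return {msgid: msgstr} for entries that have a non-empty translation.

    Hypothetical helper: handles simple msgid/msgstr entries only.
    """
    pairs = {}
    entry = {"msgid": "", "msgstr": ""}
    field = None  # which field a continuation line ("...") belongs to

    # The extra blank line at the end flushes the final entry.
    for raw in text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith("#"):
            continue  # translator comments and source references
        if line.startswith("msgid "):
            field = "msgid"
            entry[field] = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry[field] = ast.literal_eval(line[len("msgstr "):])
        elif line.startswith('"') and field:
            # Long strings are split over several quoted lines.
            entry[field] += ast.literal_eval(line)
        elif not line:
            # Blank line ends an entry; skip the header (empty msgid)
            # and anything that has not been translated yet.
            if entry["msgid"] and entry["msgstr"]:
                pairs[entry["msgid"]] = entry["msgstr"]
            entry = {"msgid": "", "msgstr": ""}
            field = None
        else:
            field = None  # unsupported keyword such as msgid_plural

    return pairs


# Hand-written sample, NOT taken from the Koha translation file.
SAMPLE = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

# hand-written sample entry
msgid "Search"
msgstr "Rapu"

msgid "Next"
msgstr ""
'''

if __name__ == "__main__":
    for english, maori in parse_po(SAMPLE).items():
        print(english, "=", maori)
```

To run it against a real file, read the downloaded .po file as UTF-8 text and pass the contents to parse_po. A dedicated gettext library would be a sturdier choice for anything beyond a quick look.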


If you know of any other good resources for building bilingual websites, let me know: Stuart.Yeates@vuw.ac.nz / http://twitter.com/stuartayeates / the comments below.

Many of the NDF presentations, including ours, are up at http://www.r2.co.nz/20101018/