On 14-12-29 02:55 PM, Sebastian Rahtz wrote:
Carved in stone on my iPad
On 29 Dec 2014, at 16:06, Lou Burnard wrote:

One thing I think would make the job easier is a database of existing translation pairs. The language of our descriptions is fairly repetitive, with a number of set phrases which we would hope to see translated consistently. A display like that of http://www.linguee.fr/anglais-francais/traduction/translation+memory.html, but using just the relevant bits of P5 as corpus, would be fairly easy to build directly from the existing source, I would have thought.
I considered this at one time, but my heart failed me at the thought of the work involved in parsing all the texts, identifying the sentence structure, finding the translated pairs, allowing for different structures, and so on.
It doesn't seem to me like a job for amateurs, so I for one reluctantly stand aside.
I have a working implementation of the Universal Similarity Metric in XQuery that runs in eXist. It would be relatively easy to identify similar sentences and phrases. The problem is the combinatorial explosion: every phrase has to be compared with every other phrase to find the closest matches. We have over 1,700 strings that need translating, which works out at n(n-1)/2, or over 1.4 million, pairwise comparisons.

Those would have to be run in advance, to identify (say) for each individual string the three nearest matches by similarity in English. Once you knew those, you could easily copy any that had already been translated into a cell in the row, so the translator of each row would have access to some examples using related terms. That would go some way towards encouraging consistency. It would also allow a translator who believed their translation was better than one of the related ones to go back and suggest a replacement for the existing translation.

Cheers,
Martin
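(For readers who don't know the metric Martin mentions: his XQuery code isn't shown in this thread, but the precomputation he describes can be sketched in Python using the Normalized Compression Distance, the standard practical approximation of the Universal Similarity Metric. The function names and the choice of zlib as compressor are illustrative assumptions, not Martin's implementation.)

```python
import zlib
from itertools import combinations

def csize(s: str) -> int:
    """Length of the compressed string, standing in for Kolmogorov complexity."""
    return len(zlib.compress(s.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: lower means more similar."""
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def nearest_matches(strings, k=3):
    """For each string, precompute its k most similar other strings,
    comparing every unordered pair once (n(n-1)/2 comparisons)."""
    dists = {s: [] for s in strings}
    for a, b in combinations(strings, 2):
        d = ncd(a, b)
        dists[a].append((d, b))
        dists[b].append((d, a))
    return {s: [t for _, t in sorted(ds)[:k]] for s, ds in dists.items()}
```

The quadratic loop over `combinations` is exactly the combinatorial cost Martin points to; running it once in advance and storing only the top-k neighbours per string is what makes the lookup cheap for translators afterwards.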
Lou, how about asking our mutual friend Ana Frankenberg to advise on strategy? She might know of people who have run such events.
Sebastian