Dear Andrew et al.,
I'm interested in how others have handled this situation. When cataloguing the Brown, Penn, and Harvard collections of Sanskrit manuscripts, we handled it as follows:
1. We transcribe in SLP1, which is a romanization and so allows us to split conjuncts and to separate vowels from the consonants on which they depend.
2. We mark a whole vowel or consonant as deleted whenever any part of it is indicated as deleted in the ms., and add the whole replacement without comment, thus <del>o</del><add>a</add>.
3. For rendering in Devanagari or another Indic script, we reasoned that it would not be difficult to transpose the finer tagging of phones in a romanization to whole akzaras in an Indic script. Transposed, one would get <del>ro</del><add>ra</add> [since r underscore does not represent a Sanskrit sound, SLP1 does not encode it, so I'm altering the example].
4. For features between conjuncts,
<s part='I'>...g</s>
<gap reason='design'>
<desc>string-hole</desc>
</gap>
Second, the "canonical" order of code-points and of transliteration for conjuncts with initial r has the r first. In Kannada script, however, a "flying r" is used, which occurs to the right of the other consonants in the conjunct. Sometimes there's a feature that we want to encode *between* the members of the conjunct, as in this example https://goo.gl/BqNVg9, where a string-hole intervenes between the "gg" and the "r" of "mārggaṁ". How should we encode this? I know that some of you have used "akṣarapart" to identify mātrās and other components of akṣaras, but I can't see how to get around the reversed order of the phonological representation relative to the graphic representation.
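To make the ordering problem concrete, here is a small Python sketch. It does not propose an encoding; the names `LOGICAL`, `VISUAL`, `visual_neighbours` and `encodable_inline` are hypothetical and describe only this one conjunct. It shows that the string-hole's visual neighbours, the second g and the flying r, are not consecutive in logical (code-point) order, so there is no inline position in the transcription that lies "between" them.

```python
# Components of the conjunct rgg in mArggaM (SLP1), in logical/code-point order.
LOGICAL = ["r", "g", "g"]

# Left-to-right visual order in Kannada, as indices into LOGICAL:
# the flying r is drawn to the right of the two g's.
VISUAL = [1, 2, 0]


def visual_neighbours(visual_order, gap_after):
    """Logical indices of the components on either side of a feature
    that sits after visual slot `gap_after`."""
    return visual_order[gap_after], visual_order[gap_after + 1]


def encodable_inline(left, right):
    """An inline element can only sit between components that are
    consecutive in logical order."""
    return right == left + 1


if __name__ == "__main__":
    # The string-hole falls after the second g (visual slot 1), before the r.
    left, right = visual_neighbours(VISUAL, 1)
    print(LOGICAL[left], LOGICAL[right], encodable_inline(left, right))
    # prints: g r False
```

By contrast, a feature between the two g's (visual slot 0) would have logical neighbours 1 and 2, which are consecutive, so only features that cross the displaced r run into this problem.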
Grateful for any help!
Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts