Dear Andrew et al.,
I'm interested in how others have handled this situation.  The way we handled it in cataloguing the Brown, Penn, and Harvard collections of Sanskrit manuscripts was:

1. We transcribe in SLP1, which is a Romanization and so allows splitting conjuncts and separating vowels from the consonants on which they depend.
2. We delete a whole vowel or consonant whenever any part of it is indicated as deleted in the ms. and add the whole replacement without comment, thus <del>o</del><add>a</add>.
3. For rendering in Devanagari or another Indic script, we thought it would not be a difficult task to transpose the finer tagging of phones in a Romanization to whole akzaras in an Indic script, so transposed one would get <del>ro</del><add>ra</add> [since r underscore does not represent a Sanskrit sound, SLP1 does not encode it, so I'm altering the example].
4. For features between conjuncts,

<s part='I'>...g</s>
<gap reason='design'>
 <desc>string-hole</desc>
</gap>
<s part='F'>g ...</s>
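To make point 3 concrete, here is a rough Python sketch of the transposition step. It is not the code we actually used. The function name widen_to_akshara and the regular expression are made up for illustration, and it assumes the deleted and added vowels are single SLP1 letters that immediately follow their consonant onset. It only widens the tagging within SLP1; converting the result to Devanagari would be a separate step.

```python
import re

# SLP1 consonant and vowel letters. Anusvara (M) and visarga (H) are left out
# because they follow the vowel rather than carrying it.
CONSONANTS = "kKgGNcCjJYwWqQRtTdDnpPbBmyrlLvSzsh"
VOWELS = "aAiIuUfFxXeEoO"

# A consonant onset followed by a phone-level vowel deletion and addition.
# Attributes on <del> and <add> (e.g. rend="cross") are captured and kept.
PHONE_LEVEL = re.compile(
    rf"(?P<onset>[{CONSONANTS}]+)"
    rf"<del(?P<del_attrs>[^>]*)>(?P<old>[{VOWELS}])</del>"
    rf"<add(?P<add_attrs>[^>]*)>(?P<new>[{VOWELS}])</add>"
)


def widen_to_akshara(slp1: str) -> str:
    """Move the consonant onset inside both <del> and <add> (hypothetical helper)."""
    def repl(m: re.Match) -> str:
        onset = m["onset"]
        return (
            f"<del{m['del_attrs']}>{onset}{m['old']}</del>"
            f"<add{m['add_attrs']}>{onset}{m['new']}</add>"
        )
    return PHONE_LEVEL.sub(repl, slp1)


# The example from point 3: phone-level tagging after the consonant r.
print(widen_to_akshara("r<del>o</del><add>a</add>"))
```

Text without phone-level tagging passes through unchanged, so the function could be run over a whole transcription before script conversion.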

Most of the time, however, we ignored regular details like string holes and just included a description of them in the layout description rather than repeating the above prolixity.  I think that in this day and age, when one can link to a graphic image of the manuscript page, it is unnecessary and undesirable to complicate transcription of manuscripts with irrelevant graphic details, and the utility of including these in diplomatic transcription is highly diminished.

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
******************************

On 1 Apr. 2018, at 9:20 AM, Andrew Ollett <andrew.ollett@gmail.com> wrote:

Hi everyone,

I have two questions about encoding manuscript transcriptions that I wanted to submit to the collective experience of this group. Both relate to the problem of akṣaras having "parts" that canonically occur in a certain sequence but may be changed in a manuscript. 

First, the cancellation of vowel mātrās. Does anyone have a good way to encode this in TEI? In one manuscript (see this link) the scribe has written "yoṁdoṁdaṟoḷoḷa" (the last letter being mostly obliterated by a worm hole), and cancelled out the last "o" (which is written with the sideways "3") with a small cross-mark on top. The problem is that by cancelling out the mātrā, the scribe has changed the vowel. Since we are transcribing in Roman transliteration, we would have to do something like yoṁdoṁdaṟoḷ<del rend="cross">o</del><add type="implicit">a</add>ḷa, i.e., marking the addition of the vowel as "implicit" (or something similar) in order to make clear that it's not a new mark on the leaf. (If we were transcribing in Kannada script, we could do ಯೊಂದೊಂದಱೊಳ<del>ೊ</del>ಳ, but that will surely cause rendering problems.)

Second, the "canonical" order of code-points and of transliteration for conjuncts with initial r has the r first. In Kannada script, however, a "flying r" is used, which occurs to the right of the other consonants in the conjunct. Sometimes there's a feature that we want to encode *between* the members of the conjunct, as in this example, where a string-hole intervenes between the "gg" and the "r" of "mārggaṁ". How should we encode this? I know that some of you have used "akṣarapart" to identify mātrās and other components of akṣaras, but I can't seem to get around the problem of the reversed sequence of the phonological representation and the graphic representation.

Grateful for any help!

Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts