Andrew, thanks for your clear summary of the issues just discussed regarding the deletion and addition of syllable fragments.  There are two more general issues that I think need to be considered in this discussion:

1. Graphic versus phonetic encoding.
2. The purpose of transcription.

The first issue is discussed at some length in Peter M. Scharf and Malcolm D. Hyman. 2010. Linguistic Issues in Encoding Sanskrit. The book is available as a PDF at the following link: http://www.sanskritlibrary.org/publications.html
Unicode Indic scripts are not consistently either phonetic or graphic but fall somewhere in between.  This is the source of the problems with differently ordered graphic elements requiring differently ordered encoding in Unicode.  In our book we briefly discussed what segmental and featural graphic encodings for Devanagari would look like, but did not work out either encoding in detail.  If one really wants to solve transcription issues down to the detail of fragments of syllables, one requires a featural encoding of the Indic script in question.  This means, for example, that a mAtrA has its own code point, that the base shared by o and A has a code point separate from that of the top of the o, etc.  If this were the case, the problems raised would have precise solutions.
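
To make the in-between status of Unicode concrete, here is a minimal Python sketch (the helper name code_points is only for illustration).  It lists the code points behind Devanagari "ki" and "ko": the sign for i is drawn to the left of the consonant but stored after it, in phonetic order, while the sign for o is a single code point even though it is graphically the A stroke plus a top stroke.

```python
import unicodedata

# "ki": the vowel sign i is written to the LEFT of ka,
# but Unicode stores it AFTER the consonant (phonetic order).
ki = "\u0915\u093F"
# "ko": the vowel sign o is ONE code point, although graphically it is
# the vertical stroke of the A sign plus a stroke on top.
ko = "\u0915\u094B"

def code_points(text):
    """Return (code point, Unicode name) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

for label, text in (("ki", ki), ("ko", ko)):
    print(label, code_points(text))
```

A featural encoding of the kind described above would instead give the shared vertical stroke and the top stroke their own code points; no such code points exist in Unicode, so that part cannot be shown with real data.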

The second point, however, raises the question of whether such detail is worth the trouble.  In my opinion it is not, though some may disagree.  Computer graphics has reached a stage where it is trivial to insert or share an image wherever such considerations are relevant.  For the meaning of the text, such considerations are not at all relevant, and it is ultimately the meaning of texts that is of interest.  Since scripts are a way of encoding a language and the language is a phonetic entity, it is the phonetic sequence that is relevant, not the graphic representation of that sequence.  This is the reason for the invention and use of the Sanskrit Library Phonetic (SLP) encodings and the production of a thorough set of transcoding routines to transcode SLP into standard Romanizations, other metaencodings, and Indic scripts.

One final question:  Has any circumstance arisen in manuscript transcription where a del followed immediately by an add is not a subst?  And if something intervenes between the del and the add, can it rightly be called a subst?  I don't know of a positive answer to either of these questions and therefore question the utility of the subst element.  But if someone can show such a prayojana, I'm interested to see it.
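
One way to look for such cases in an existing transcription is sketched below in Python.  It is only a rough illustration under stated assumptions: the function name bare_del_add_pairs and the sample line are made up, and the sample uses un-namespaced element names for brevity.

```python
import xml.etree.ElementTree as ET

def bare_del_add_pairs(xml_text):
    """Return (deleted, added) text pairs where <add> immediately follows
    <del> with no text in between and the parent is NOT <subst>."""
    # Assumption: elements carry no namespace. In a real TEI file the tags
    # would look like "{http://www.tei-c.org/ns/1.0}del".
    root = ET.fromstring(xml_text)
    pairs = []
    for parent in root.iter():
        if parent.tag == "subst":
            continue
        children = list(parent)
        for first, second in zip(children, children[1:]):
            adjacent = not (first.tail or "").strip()
            if first.tag == "del" and second.tag == "add" and adjacent:
                pairs.append((first.text, second.text))
    return pairs

# Invented sample: one bare del+add pair, one pair already inside <subst>.
sample = (
    "<l>dev<del>o</del><add>ā</add> "
    "<subst><del>i</del><add>e</add></subst></l>"
)
print(bare_del_add_pairs(sample))
```

Every pair the function reports is a candidate for the first question: either an implicit substitution that could be wrapped in subst, or a counterexample worth posting to the list.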

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
******************************

On 10 Apr. 2018, at 8:59 AM, Andrew Ollett <andrew.ollett@gmail.com> wrote:

Dear all,

I was delighted, though not entirely surprised, to see that many of you had grappled with similar issues. I sketched out the issues briefly and wrote up some of the solutions that you've suggested here:

https://wiki.tei-c.org/index.php/SIG:IndicTexts

(I found Paddy's example from early Bengali script extremely useful, but I didn't presume to add it to the wiki without asking.)

Transliteration promises to be a persistent issue: many encoding strategies just don't make sense if we are either inputting our texts in Indic scripts or providing for output in Indic scripts. It's not clear to me at this point how worried I should be about this: in our project, the only reason for offering a Kannada-script version of the manuscript transcriptions is "why not?". But if we did need a principled approach, the use of wrappers like <c> (or <g>?) might help.

I also tried looking at the ENRICH guidelines after Camillo mentioned them. I couldn't locate a schema file, and in the online documentation (there are lots of broken links) I didn't see any specification of attribute values for @type (in <del>) or @place (in <add>). But I tried to stick to what Dániel and Camillo recommended.

Thanks everyone, and please feel free to suggest additions or modifications to the TEI Wiki page (anyone with a TEI account can edit as well).

Andrew

2018-04-04 4:06 GMT-05:00 Balogh Dániel <danbalogh@gmail.com>:
Hello all, and let me add my thanks to Andrew for starting a thread. I've read your opinions with interest and have little new to add, but here are my thoughts anyway.

My approach is based on encoding texts in Romanised form (whether IAST or SLP doesn't matter). My basic feeling is that marking up every little feature may not be worth the trouble. I believe this is what Camillo has been suggesting and what Peter has shown in his example. So reflecting changes on the level of phonemes should be fine, and a change of one vowel to another can be marked up simply as <subst><del>o</del><add>ā</add></subst>. The deletion and the addition could be qualified with attributes as in Andrew's original post. I believe the generic solution (recommended in EpiDoc) would serve well: <subst><del rend="corrected">o</del><add place="overstrike">ā</add></subst>. Exactly how this change is implemented graphically in the written specimen is, as Camillo says, something that can be left to the reader's knowledge of the writing system in question; or, in the rare cases where we as editors think it will not be obvious to anyone who cares, described in a comment for human readers only. The same would work for the deletion of vowel mātrās, i.e. the correction of another vowel to "a". Or, if the a is described as implicit for the sake of precision, I would still suggest using the @place attribute with that value, not @type.
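
As a rough illustration of why phoneme-level markup is enough for most purposes, the Python sketch below recovers both the reading before correction and the reading after correction from the EpiDoc-style markup above. The function name readings, the wrapping <w> element, and the sample word are invented for the example; element names are assumed to be un-namespaced.

```python
import xml.etree.ElementTree as ET

def readings(xml_text):
    """Return (before_correction, after_correction) for a marked-up fragment."""
    root = ET.fromstring(xml_text)

    def flatten(elem, skip):
        # Collect text, leaving out the content of every <skip> element
        # but keeping the text that follows it (its tail).
        parts = []
        if elem.tag != skip:
            parts.append(elem.text or "")
            for child in elem:
                parts.append(flatten(child, skip))
        parts.append(elem.tail or "")
        return "".join(parts)

    # Before correction: ignore additions. After correction: ignore deletions.
    return flatten(root, "add"), flatten(root, "del")

# Invented sample word using the markup suggested above.
fragment = (
    '<w>dev<subst><del rend="corrected">o</del>'
    '<add place="overstrike">ā</add></subst></w>'
)
print(readings(fragment))
```

How the overstrike actually looks on the page plays no part in either reading, which is the point: the attributes record it for those who care, and the text remains usable without it.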

If it is desired that the encoded text can be rendered in an Indic script, we must keep in mind that this is mainly for the sake of modern readers who are more familiar with those scripts than Romanisation. In most cases, rendering in a Devanagari or Kannada or whatever font will not be a 100% accurate representation of the way complex akṣaras are constituted in the MS. So, in my mind, this is a display issue that needs to be dealt with in the XSLT that produces human-readable output from your markup. Wrapping the entire akṣara in a <c> element may make the transformation a lot easier. This is similar to what Charles is doing with <subst> but, I believe, entirely canonical. Using <c> to wrap akṣaras may also be an idea to consider for Andrew's second problem, though of course it doesn't solve the problem of the floating r.
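
For plain Romanised text, wrapping akṣaras in <c> could be automated along the following lines. This Python sketch is a simplification under several assumptions: input is IAST with no markup inside it, an akṣara is taken to be any run of consonants plus one vowel plus an optional anusvāra or visarga, "ai" and "au" are always read as diphthongs, and consonants with no following vowel are wrapped on their own. The names wrap_aksaras, VOWELS and AKSARA are made up for the example.

```python
import re
import unicodedata

# IAST vowels as single precomposed characters (NFC form).
VOWELS = unicodedata.normalize("NFC", "aāiīuūṛṝḷeo")
MARKS = unicodedata.normalize("NFC", "ṃḥ")

AKSARA = re.compile(
    rf"[^{VOWELS}\s]*(?:ai|au|[{VOWELS}])[{MARKS}]?"  # consonants + vowel (+ ṃ/ḥ)
    rf"|[^{VOWELS}\s]+"                               # leftover vowelless consonants
)

def wrap_aksaras(text):
    """Wrap each akṣara of a plain IAST string in a <c> element."""
    text = unicodedata.normalize("NFC", text)
    return AKSARA.sub(lambda m: f"<c>{m.group(0)}</c>", text)

print(wrap_aksaras("saṃskṛta"))
```

The result is only as good as the segmentation rule; a manuscript that divides akṣaras differently, or a floating r, would still need manual markup.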

All the best,
Dan

_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts
