Andrew, thanks for your clear summary of the issues just discussed regarding the deletion and addition of syllable fragments. There are two more general issues that I think need to be considered in this discussion:
1. Graphic versus phonetic encoding.
2. The purpose of transcription.
Unicode encodings of Indic scripts are neither consistently phonetic nor consistently graphic; they fall somewhere in between. Hence the problems that arise when differently ordered graphic elements require differently ordered encoding in Unicode. In our book we briefly discussed what segmental and featural graphic encodings for Devanagari would look like, but did not work out either encoding in detail. If one really wants to solve transcription issues down to the level of syllable fragments, one needs a featural encoding of the Indic script in question. This means, for example, that a mAtrA has its own code point, that the base shared by o and A has a code point separate from that of the top of the o, and so on. If this were the case, the problems raised would have precise solutions.
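To make the "somewhere in between" point concrete, here is a minimal Python sketch using only the standard unicodedata module. The helper name describe is made up for illustration. It shows that the Devanagari i sign is stored after its consonant although it is drawn before it, and that the Devanagari o sign is a single code point with no smaller graphic parts, while the Bengali o sign does have a canonical decomposition into two separately coded pieces.

```python
import unicodedata

def describe(text):
    """List the code points of `text` in stored (logical) order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# ki: the i sign is drawn to the left of ka but is stored after it.
ki = "\u0915\u093f"
# ko: the o sign is one code point, not a base stroke plus a top stroke.
ko = "\u0915\u094b"

for syllable in (ki, ko):
    print(syllable, describe(syllable))

# Canonical decompositions as recorded in the Unicode Character Database:
# the Devanagari o sign has none, the Bengali o sign has two parts.
print(repr(unicodedata.decomposition("\u094b")))
print(repr(unicodedata.decomposition("\u09cb")))
```

A featural encoding of the kind described above would go further than either case and give every such graphic component its own code point.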
The second point, however, raises the question of whether such detail is worth the trouble. In my opinion it is not, though some may disagree. I think so because computer graphics has reached a stage where it is trivial to insert or share an image wherever such considerations are relevant. For the meaning of the text, such considerations are not at all relevant, and it is ultimately the meaning of texts that is of interest. Since scripts are a way of encoding a language, and the language is a phonetic entity, it is the phonetic sequence that is relevant, not the graphic representation of that sequence. This is the reason for the invention and use of the Sanskrit Library Phonetic (SLP) encodings and for the production of a thorough set of transcoding routines to transcode SLP into standard Romanizations, other metaencodings, and Indic scripts.
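As a rough illustration of why a phonetic encoding makes transcoding simple, here is a minimal Python sketch that maps SLP1 to IAST one character at a time. It is not the Sanskrit Library's own transcoding code: the table is partial (accents and some signs are left out), and the names SLP1_TO_IAST and slp1_to_iast are made up for this example.

```python
# Partial SLP1 -> IAST table (illustrative only; accents and some signs omitted).
# SLP1 uses one character per phoneme, so a character-wise lookup is enough.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}

def slp1_to_iast(text):
    """Transcode SLP1 to IAST; characters not in the table pass through unchanged."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)

for word in ("mAtrA", "saMskftam", "prayojana"):
    print(word, "->", slp1_to_iast(word))
```

Output to an Indic script needs more than a lookup table, since consonant clusters and dependent vowel signs have to be assembled, but the input side stays the same phonetic sequence.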
One final question: has any circumstance arisen in manuscript transcription where a del followed immediately by an add is not a subst? And if something intervenes between the del and the add, can it rightly be called a subst? I don't know of a positive answer to either question and therefore question the utility of the subst element. But if someone can show such a prayojana, I'm interested to see it.
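If a subst really is nothing more than a del followed immediately by an add, it can be derived mechanically from the markup. Here is a minimal Python sketch of such a check, using the standard xml.etree.ElementTree module. The sample line is invented, the function name adjacent_del_add_pairs is made up, and the TEI namespace is left off for brevity.

```python
import xml.etree.ElementTree as ET

def adjacent_del_add_pairs(xml_text):
    """Return (deleted, added) text pairs where <del> is directly followed by <add>."""
    root = ET.fromstring(xml_text)
    pairs = []
    for parent in root.iter():
        children = list(parent)
        for cur, nxt in zip(children, children[1:]):
            # cur.tail holds any text between cur's end tag and the next sibling;
            # an empty tail means nothing at all intervenes.
            nothing_between = not cur.tail
            if cur.tag == "del" and nxt.tag == "add" and nothing_between:
                pairs.append(("".join(cur.itertext()), "".join(nxt.itertext())))
    return pairs

# Invented sample: the first del/add pair is adjacent, the second is not.
sample = "<l>ga<del>ta</del><add>na</add> tatra <del>ca</del> tu <add>eva</add></l>"
print(adjacent_del_add_pairs(sample))
```

The check is deliberately strict: even whitespace between the two elements counts as intervening material. Any case where an editor would reject one of the pairs it reports would be exactly the kind of counterexample asked for above.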
Yours,
Peter
******************************
Peter M. Scharf, President
The Sanskrit Library
******************************
Dear all,
I was delighted, though not entirely surprised, to see that many of you had grappled with similar issues. I sketched out the issues briefly and wrote up some of the solutions that you've suggested here:
(I found Paddy's example from early Bengali script extremely useful, but I didn't presume to add it to the wiki without asking.)
Transliteration promises to be a persistent issue: many encoding strategies just don't make sense if we are either inputting our texts in Indic scripts or providing for output in Indic scripts. It's not clear to me at this point how worried I should be about this: in our project, the only reason for offering a Kannada-script version of the manuscript transcriptions is "why not?". But if we did need a principled approach, the use of wrappers like <c> (or <g>?) might help.
I also tried looking at the ENRICH guidelines after Camillo mentioned them. I couldn't locate a schema file, and in the online documentation (there are lots of broken links) I didn't see any specification of attribute values for @type (in <del>) or @place (in <add>). But I tried to stick to what Dániel and Camillo recommended.
Thanks everyone, and please feel free to suggest additions or modifications to the TEI Wiki page (anyone with a TEI account can edit as well).
Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts