(Sorry, I sent the first one by mistake!)

Dear Andrew, Peter, and Charlie,

Finally somebody started a thread!

If you don't mind my being very direct (I'm smiling, I assure you, because I totally share your doubts about how to mark up such cases), similar questions will always come up unless we start from the basic problem of how we look at manuscripts. We ought first to agree on the way we describe manuscripts, and only then can we start to ask ourselves how to mark them up. I believe two questions ought to be asked first (Peter already partly raised the first one in his reply): why mark up such phenomena? And, I would add: to what degree of exactness?

As to the first question, there are obvious answers: if I'm preparing a diplomatic transcription or a critical edition, I have to do it. Then how? All the solutions proposed entail the use of the elements <del></del> and <add></add>, as well as <subst></subst> (as in the example from Charlie, who I guess is partly adopting our Cambridge standards), and thus the basic structure <subst><del></del><add></add></subst>. I totally agree with this approach, but...

Now let me answer the first possible objection: in Andrew's example, is the scribe really adding something? Sure he is (let's not get politically correct, we know it was almost certainly a man, even if there is no colophon in the manuscript). He is not materially adding anything on the folio, sure, but what are we marking up? Let's say he wanted to substitute o with ā: then he would have added a mātrā, right? As we all know, an abugida writing system rests on the principle of an inherent vowel. The point here is "as we all know." We are marking up transcriptions of manuscripts in scripts whose functioning we know, so there is no need to be more Catholic than the Pope. Also, to a certain extent the scribe was substituting something with something else, by deleting an o and adding an a (or, in other cases, a mātrā for any other vowel). I think that this is an elegant way of solving the "implicit" problem without using any further element or attribute.
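
To make this concrete, Andrew's cancelled vowel could be sketched as below. This is only a minimal sketch: the rend value is taken from Andrew's message, the surrounding akṣaras are left out, and the markup is kept on one line so that no stray whitespace ends up inside the word.

```xml
<!-- Sketch only: the cancelled o-mātrā leaves the inherent vowel a. -->
<!-- No type="implicit" on <add>: the reader is assumed to know how an abugida works. -->
<subst><del rend="cross">o</del><add>a</add></subst>
```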

The answer to my second question might also provide an answer to Andrew's second conundrum. In our catalogue we adopted two attributes for deletions and additions: for <add></add> we used @place to mark where the addition was made (using the standard values provided in the ENRICH schema), and for <del></del> we used @type (with the values yellow_paste, expuncted, erased, palimpsest, cancelled). I don't know whether we can agree on the number or typology of attributes to be used, but this is not so important: we will always have slightly different approaches since, as Peter pointed out, we usually have different aims when describing manuscripts.
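
As a rough illustration of these two attributes, a pair of hypothetical cases might look like this. The readings are invented, and the @place value "above" is assumed here purely for illustration; only the @type values come from the list given above.

```xml
<!-- Sketch only: invented readings, illustrative attribute values. -->
<!-- An erased vowel replaced by an addition written above the line -->
<subst><del type="erased">o</del><add place="above">ā</add></subst>
<!-- An akṣara covered with yellow paste, with nothing written in its place -->
<del type="yellow_paste">ka</del>
```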

Thinking of the approach I have described above, the "we all know how an abugida works" argument might also solve the conundrum of whether to mark up a whole akṣara or only a part of it. With this approach, there is no need to mark up only parts of an akṣara, as it is clear that only the mātrā was changed. (Also, there is no problem with akṣaras divided by string holes: we can always nest the elements, if I get the problem, though I'm not really sure I have understood it.)

A belated Happy Easter to you all!

Camillo

________________________________________
From: indic-texts-bounces@lists.tei-c.org [indic-texts-bounces@lists.tei-c.org] on behalf of indic-texts-request@lists.tei-c.org [indic-texts-request@lists.tei-c.org]
Sent: Sunday, April 01, 2018 11:00 AM
To: indic-texts@lists.tei-c.org
Subject: indic-texts Digest, Vol 3, Issue 1

Today's Topics:

   1. some problems with encoding parts of akṣaras in manuscript
      transcriptions (Andrew Ollett)
   2. Re: some problems with encoding parts of akṣaras in
      manuscript transcriptions (Peter Scharf)

----------------------------------------------------------------------

Message: 1
Date: Sat, 31 Mar 2018 22:50:16 -0500
From: Andrew Ollett <andrew.ollett@gmail.com>
To: indic-texts@lists.tei-c.org
Subject: [Indic-texts] some problems with encoding parts of akṣaras in
        manuscript transcriptions
Message-ID:
        <CAANHO15y_K1BRZLqj+PauXoxcvdgusDS1Df99juUp+W7xkV-Lg@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hi everyone,

I have two questions about encoding manuscript transcriptions that I wanted
to submit to the collective experience of this group. Both relate to the
problem of akṣaras having "parts" that canonically occur in a certain
sequence but may be changed in a manuscript.

First, the cancellation of vowel mātrās. Does anyone have a good way to
encode this in TEI? In one manuscript (see this link <https://goo.gl/uYxV3R>)
the scribe has written "yo?do?da?o?o?a" (the last letter being mostly
obliterated by a worm hole), and cancelled out the last "o" (which is
written with the sideways "3") with a small cross-mark on top. The problem
is that by cancelling out the mātrā, the scribe has changed the vowel.
Since we are transcribing in Roman transliteration, we would have to do
something like yo?do?da?o?<del rend="cross">o</del><add
type="implicit">a</add>?a, i.e., marking the addition of the vowel as
"implicit" (or something similar) in order to make clear that it's not a
new mark on the leaf. (If we were transcribing in Kannada script, we could
do ??????????<del>?</del>?, but that will surely cause rendering problems.)

Second, the "canonical" order of code-points and of transliteration for
conjuncts with initial r has the r first. In Kannada script, however, a
"flying r" is used, which occurs to the right of the other consonants in
the conjunct. Sometimes there's a feature that we want to encode *between*
the members of the conjunct, as in this example <https://goo.gl/BqNVg9>,
where a string-hole intervenes between the "gg" and the "r" of "m?rgga?".
How should we encode this? I know that some of you have used "akṣarapart"
to identify mātrās and other components of akṣaras, but I can't seem to get
around the problem of the reversed sequence of the phonological
representation and the graphic representation.

Grateful for any help!

Andrew

------------------------------

Message: 2
Date: Sun, 1 Apr 2018 12:08:31 +0530
From: Peter Scharf <scharf@sanskritlibrary.org>
To: Andrew Ollett <andrew.ollett@gmail.com>
Cc: indic-texts@lists.tei-c.org
Subject: Re: [Indic-texts] some problems with encoding parts of
        akṣaras in manuscript transcriptions
Message-ID: <253C0C1B-A0F5-4808-9252-45D62D04BC83@sanskritlibrary.org>
Content-Type: text/plain; charset="utf-8"

Dear Andrew et al.,
        I'm interested in how others have handled this situation.  The way we handled it cataloguing the Brown, Penn, and Harvard collections of Sanskrit manuscripts was:

1. We transcribe in SLP1, which is a Romanization and so allows splitting conjuncts and separating vowels from the consonants on which they depend.
2. We delete a whole vowel or consonant whenever any part of it is indicated as deleted in the ms. and add the whole replacement without comment, thus <del>o</del><add>a</add>.
3. For rendering in Devanagari or another Indic script, we thought it is not a difficult task to transpose the finer tagging of phones in a romanization to whole akzaras in an Indic script, so transposed one would get <del>ro</del><add>ra</add> [since r underscore does not represent a Sanskrit sound, SLP1 does not encode it, so I'm altering the example].
4. For features between conjuncts,

<s part='I'>...g</s>
<gap reason='design'>
 <desc>string-hole</desc>
</gap>
<s part='F'>g ...</s>

Most of the time, however, we ignored regular details like string holes and just included a description of them in the layout description rather than repeating the above prolixity.  I think that in this day and age, when one can link to a graphic image of the manuscript page, it is unnecessary and undesirable to complicate the transcription of manuscripts with irrelevant graphic details, and the utility of including these in a diplomatic transcription is highly diminished.

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
http://sanskritlibrary.org
******************************