Re: [Indic-texts] some problems with encoding parts of akṣaras in manuscript transcriptions

2 Apr 2018

On Sun, Apr 01 2018, Charles Li wrote:
...
> Hi Andrew! So brave to be the first person to post!

Indeed, congratulations!

...
> I have also struggled in silence and darkness with those very problems
> which you describe. For the first issue -- vowel signs being added and
> deleted -- I do something similar, but using slightly non-canonical TEI.
> I actually include the consonant inside a <subst> tag to make explicit
> the fact that it's that consonant being modified. So, for example,
> abhi<subst>dh<del rend="implied">a</del><add>ā</add></subst>ne

I use something quite similar in my transcriptions, e.g. for the
correction of /nti/ to /nte/ (see the image I’m trying to attach) I have
this:

<subst ana="#subst-vowel-addition"><del>न्ति</del><add>न्ते</add></subst>

The main difference from Charles’ solution is that I put the whole akṣara
into the del and add elements, which is of course not very precise.  But
for my current project I decided to treat the conjuncts as units that I
wouldn’t split up any further.  I try to compensate for this by adding
an analysis attribute (@ana), which at least lets me easily query for
classes of corrections/changes.
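
To illustrate the kind of query meant here, a minimal sketch in Python
follows.  It assumes un-namespaced elements (real TEI files would need
the TEI namespace in the element names), and the second and third
<subst> elements as well as the class "#subst-vowel-deletion" are
invented for the example; only the first <subst> is from my
transcription.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the first <subst> comes from the message;
# the other two (and "#subst-vowel-deletion") are made up for illustration.
SAMPLE = """<p>
  <subst ana="#subst-vowel-addition"><del>न्ति</del><add>न्ते</add></subst>
  <subst ana="#subst-vowel-deletion"><del>को</del><add>क</add></subst>
  <subst ana="#subst-vowel-addition"><del>क</del><add>कि</add></subst>
</p>"""


def corrections_by_class(xml_text):
    """Group (deleted, added) pairs by the value of @ana on each <subst>."""
    root = ET.fromstring(xml_text)
    groups = {}
    for subst in root.iter("subst"):
        key = subst.get("ana", "")
        deleted = subst.findtext("del", default="")
        added = subst.findtext("add", default="")
        groups.setdefault(key, []).append((deleted, added))
    return groups


if __name__ == "__main__":
    for ana, pairs in sorted(corrections_by_class(SAMPLE).items()):
        print(ana, len(pairs), pairs)
```

Because the whole akṣara sits in del and add, each pair can be compared
as a plain string without reassembling it from surrounding text.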

My reasons for doing this were two (and I should add that they are not
so strong that I’d like to recommend this as a general solution):

First, the transliteration would be easy from this kind of markup:

nti -> न्ति -> ন্তি -> nti

works fine.
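
As a rough sketch of the middle step of that chain (Devanagari to
Bengali and back), the following Python relies on the two Unicode blocks
being laid out in parallel, so that a fixed code point offset maps one
to the other.  This is an assumption that holds for the characters in
न्ति but not for every character: some positions are unassigned in one
block or the other, and a real converter would need a proper mapping
table.  The Latin step is not covered.

```python
# Start of the Devanagari and Bengali blocks; each block is 0x80 wide.
DEVANAGARI_START = 0x0900
BENGALI_START = 0x0980
BLOCK_SIZE = 0x80


def shift_block(text, src_start, dst_start):
    """Move every character of the source block to the same position
    in the destination block; leave all other characters untouched."""
    out = []
    for ch in text:
        cp = ord(ch)
        if src_start <= cp < src_start + BLOCK_SIZE:
            out.append(chr(cp - src_start + dst_start))
        else:
            out.append(ch)
    return "".join(out)


def deva_to_beng(text):
    return shift_block(text, DEVANAGARI_START, BENGALI_START)


def beng_to_deva(text):
    return shift_block(text, BENGALI_START, DEVANAGARI_START)


if __name__ == "__main__":
    nti = "\u0928\u094d\u0924\u093f"  # न्ति
    print(nti, deva_to_beng(nti), beng_to_deva(deva_to_beng(nti)))
```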

But consider Andrew’s case of split vowel-signs, where we would have to
transliterate something like this:

ḷ<del rend="cross">o</del><add type="implicit">a</add>

What should the result look like?

ऌ्<del rend="cross">ो</del><add type="implicit">?</add>

This is not the same as the rendering problem: it seems to me the
implicit ‘a’ vowel cannot be put in the add element in certain kinds of
encoding.  So, unless I’m missing something, you would have to change
the markup to accommodate the encoding you choose: the XML would have to
change depending on whether you want to see this in an encoding that has
the implicit vowel “a” or not.  And also it’s unclear to me what the
content of the add element should be if the script has implicit vowels.
(Perhaps one will also have to fiddle with the virāma, but that usually
works out.)
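
A toy sketch in Python may make the asymmetry concrete.  The tables
below are tiny, hypothetical excerpts (they are not a full
transliteration scheme), and the function handles only a single
consonant plus an optional vowel.  The point is that the Latin “a”
corresponds to the empty string on the Devanagari side, so there is no
character left to put inside an add element.

```python
VIRAMA = "\u094d"

# Hypothetical mini-tables, just enough for the example.
CONSONANTS = {"k": "\u0915", "t": "\u0924", "n": "\u0928"}
# Dependent vowel signs; the inherent "a" has no sign of its own.
VOWEL_SIGNS = {"a": "", "i": "\u093f", "e": "\u0947", "o": "\u094b"}


def aksara(consonant, vowel=None):
    """Build one simple akṣara: consonant + vowel sign,
    or consonant + virāma when no vowel follows."""
    base = CONSONANTS[consonant]
    if vowel is None:
        return base + VIRAMA
    return base + VOWEL_SIGNS[vowel]


if __name__ == "__main__":
    print(aksara("k", "o"))      # consonant + o-sign
    print(aksara("k", "a"))      # bare consonant, inherent vowel
    print(aksara("k"))           # consonant + virāma
    print(repr(VOWEL_SIGNS["a"]))  # what <add>a</add> would contain
```

In the Latin encoding, changing “o” to “a” replaces one character by
another; in Devanagari the same change only removes a sign, which is why
the markup cannot stay the same across the two encodings.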

One might say that this is a good reason against using this type of
encoding (not Latin-based) for analytical markup/transcriptions in the
first place.  But I have at least one case where I can’t split the vowel
signs up at all, regardless of encoding.  And this was the second reason
for me to treat conjuncts as units: in an early Bengali script, there
was a change from “o” to “ā”, by deleting the left, preceding vertical
stroke of the “o”’s sign, similar to this:

কো -> কা

I don’t see how one could describe this in any transcription scheme,
since it would mean analyzing the “o” into two components (even in the
Bengali Unicode block, the “o” vowel sign is just one code point).  I saw no
way around this apart from a graphical analysis of the problem.
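
A small caveat on the code point remark, as a sketch in Python using
only the standard unicodedata module: the precomposed sign U+09CB is one
code point, but Unicode also defines a canonical decomposition of it
into U+09C7 (the left part) and U+09BE (the “ā” sign), which NFD
normalization exposes.  Whether that decomposition is usable in markup
is a separate question, since many tools normalize text back to NFC,
and it says nothing about how a given scribe actually built the sign.

```python
import unicodedata

KO = "\u0995\u09cb"  # ক followed by the precomposed vowel sign o


def code_points(text):
    """Return the code points of `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


decomposed = unicodedata.normalize("NFD", KO)
# Drop the left-hand component (U+09C7); what remains is ক + ā-sign.
without_left_part = decomposed.replace("\u09c7", "")

if __name__ == "__main__":
    print(code_points(KO))
    print(code_points(decomposed))
    print(code_points(without_left_part))
```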

So I decided to just encode changes from one whole conjunct to another,
and link this up with an analysis of the type of correction/change that
was employed.  I also, like Peter, link these things to pictures when
possible so that it’s easier to see what’s going on in each individual
case.

This doesn’t solve the second problem Andrew mentioned, the “flying r”
preceded by a stringhole.  I’ve never had to encode this kind of thing,
where the phonological and graphical orders are reversed, and so
can’t say much about it.  Peter’s solution seems useful to me (adding
sequence attributes to make the situation clearer).  Theoretically, one
could also introduce a special character to transliterate the “flying
r”, something like this (spaces added):

mā gg<gap/>Xa ṁ  (or should it be “mā gg<gap/>aX ṁ”?)

where “X” is the “flying r”.  The drawback would of course be that the
rendering issues would be pretty hard to solve: you’d need to transform
all “X” into an “r” preceding the last cluster of consonants, plus you
would then not be able to represent the stringhole in its proper place
anymore.  So this seems like much more of a bother than dealing with
@part attributes.
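
To show what the transformation of “X” would involve, here is a very
rough Python sketch.  It assumes the markup has already been stripped
(so the stringhole is lost, as noted above), that “X” never occurs in
the text otherwise, and that the vowel list below is complete enough for
the material; all of these are assumptions made for the example.

```python
import re
import unicodedata

# Assumed vowel inventory of the Latin transliteration.
VOWELS = unicodedata.normalize("NFC", "aāiīuūṛṝeo")

# A run of non-vowel, non-space characters immediately followed by "X".
CLUSTER_BEFORE_X = re.compile(f"([^{VOWELS}\\s]+)X")


def resolve_flying_r(text):
    """Replace each placeholder "X" by an "r" placed in front of the
    consonant cluster that directly precedes it."""
    text = unicodedata.normalize("NFC", text)
    return CLUSTER_BEFORE_X.sub(lambda m: "r" + m.group(1), text)


if __name__ == "__main__":
    print(resolve_flying_r("māggXaṁ"))
```

This handles only the first variant (“ggXa”); the “ggaX” variant would
need a different pattern.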

I’d also be happy to hear other solutions to these two problems!

Best wishes,

--
Patrick McAllister