Re: [Indic-texts] encoding pratīkas in commentary

24 Sep 2018

      ...
...
On 23 Sep 2018, at 10:46, Arlo Griffiths <arlo.griffiths@efeo.net> wrote:
Dear colleagues,
I am trying to encode a text from Bali built up around Sanskrit stanzas, which are followed by Old Javanese glosses. The glosses themselves are interspersed with Sanskrit elements, generally (but not always) chunks from the root text. Here’s an example from the e-text that’s going to be TEI encoded.
<KMN21s-ab> na te 'tra vimatiḥ kāryā nirviśaṅkena cetasā
<KMN21s-cd> prakāśaya mahātulaṁ mantracaryānayam param ||21||
  c. Speijer notes that the reading mahātulaṁ is unmetrical.
<KMN21j> ka: hayva kita vicikitsa, NIRVIŚAṄKENA CETASĀ, ikaṅ nissandeha atah ambĕka[ka]nta, PRAKĀŚAYA MAHĀTULAṀ [msA-a17] MANTRACĀRYYANAYAM PARAṀ, at pintonakna ike, saṅ hyaṅ mantranaya mahāyāna.
How would you propose to encode a chunk like NIRVIŚAṄKENA CETASĀ? Options I have thought of (with Andrew Ollett) are:
(1) <quote type="pratīka">nirviśaṅkena cetasā</quote>
(2) <term>nirviśaṅkena cetasā</term>
In the second solution, I could also add <gloss> to the string ikaṅ nissandeha atah ambĕka[ka]nta. But I am hoping that this is not mandatory. Neither of the two solutions immediately helps if I wanted to have a mechanism that links the pratīka with the corresponding words in the mūla.
I’d suggest something like this:

<div>
na te 'tra vimatiḥ kāryā <seg xml:id="KMN21s-b-end">nirviśaṅkena cetasā</seg>

<mentioned sameAs="#KMN21s-b-end">nirviśaṅkena cetasā</mentioned>
</div>

(This would assume there is no construction corresponding to Skt /iti/
around the phrase.  If there were, tei:quote would be better than
tei:mentioned.)

On Sun, Sep 23 2018, Andrea Acri wrote:
...
Dear Arlo
for the DhPāt, after consultation with Andrew during the TEI workshop in Paris, I have used the following commands:
<lg xml:lang="san-Latn" n="1" met="anuṣṭubh">
      <l>acintyo niṣkalaḥ śāntaḥ,</l>
      <l>dhruva-m abyaya-m īśvaraḥ,</l>
      <l>asau sūkṣmaḥ paraḥ śāntaḥ,</l>
      <l>śivaḥ sakalaniṣkalaḥ ◆</l>
</lg>
         <p>apan sira sinaṅguh <quote type="lemma">acintya</quote>, apa tar kavәnaṅ inaṅәnaṅәn, <quote type="lemma">niṣkala</quote> sira tar pā-
vak, tan pavarṇa, ta(1v)n baṅ, tan aputih, tar kuniṅ, tan hirәṅ, kapila dvivarṇādi,
tan hana ikā kabeh ri saṅ hyaṅ paramārtha,</p>
<quote type="pratīka"> should be fine, but will it not be unintelligible to non-Indologists?
The solution with ~tei:quote~ and ~@type="lemma"~ is what we use in some
of the current SARIT texts.  I think the main reason we used it is that
"pratīka" is not only unclear to non-Indologists, but probably not the
correct term in many cases (it’s a tricky issue, see below for my
current idea).  This solution was fine for our problem there: encoding
existing editions, which would usually mark all actual pratīka-s and all
other types of quotes from the base-text in the same way.  For the
conversion from printed to digital text, it would have introduced a new
source for errors if we had differentiated these things further.

But for a new edition that is being created digitally, a more
differentiated kind of mark-up might be useful: you could then easily
query your document for the different types of “quotes” that
commentaries typically use.

At the TEI conference in Tokyo 2 weeks ago, I presented a few cases of
such quotations.  Please take into account that this was meant for an
audience of non-Indologists, hoping to let them understand the different
types of quotes that one commonly finds in Sanskrit commentaries.  The
following is from my presentation notes, typos included (slides are
currently here: http://rdorte.org/pma/tei2018.html):

<<< BEGIN QUOTE >>>

1.2 Types of quotations
───────────────────────

1.2.1 Simple quotes (and their context)
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  The first example is the simplest, and there can be no doubt that this
  is a quote in the fullest sense:

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0">
  │   <p><gap/>etena yad uktam <persName role="opponent">udyotakareṇa</persName> <cit><quote>avācakatve śabdānāṃ pratijñāhetvorvyāghāta</quote></cit> iti tadapi pratyuktaṃ bhavati.<gap/></p>
  │
  │   <p>Through this [argument in the base text] also what <persName>Uddyotakara</persName> said has
  │   been rejected, namely, <cit><quote>If words do not denote anything, both
  │   your proposition and your reason are inconsistent.</quote></cit></p>
  │ </div>
  └────

  This example is from the ca. eighth century Tattvasaṃgrahapañjikā
  ([TSP1]), the commentary by Kamalaśīla on the extensive work in verse
  by Śāntarakṣita.

  There are three voices that must be distinguished in this passage:

  1) First, it is spoken by Kamalaśīla, the commentator.
  2) Second, he is connecting an argument from the base text, the text
     he is commenting upon, to an objection by a non-Buddhist opponent.
     The base text is here not quoted but only pointed at, by saying
     “Through this [argument in the base text]”.
  3) Third, this opponent’s text is quoted verbatim, as a sub-phrase in
     a statement that rejects it.  In this case, we can actually verify
     that it is a faithful quote, since the sentence is found in
     Uddyotakara’s work (\cite[312.21--22]{uddyotakara97:_NV}).

  The commentary here performs various functions:

  1) /etena … bhavati/: connects the text commented upon and the
     provided quote
  2) /yad uktam/: introduces quote (lit., “said”)
  3) /udyotakareṇa/: original speaker of what was said
  4) `<quote />': what was said
  5) `cit': wraps the quote.  The element is somewhat abused here,
     because there is no explicit reference to the source (unless we
     count the name of the speaker), but it makes clear that this is not
     an imaginary quote (or attribution): it is a claim actually upheld
     by someone.

  All of these functions can be tied to parts or segments of the
  sentences under consideration.  As we shall see, ideally all the
  functions that the commentary performs should be available for
  general queries run on the group of texts.

1.2.2 Quotes as references
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  Later in the same text, Kamalaśīla introduces verse 1061 of the text
  he is commenting on like this, \cite[ad 1061,1062]{TSP1}:

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0-pause-valid" n="1061">
  │     <lg xml:id="ts__1061">
  │       <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
  │       <seg xml:id="ts__1061__b">ca</seg>
  │       yadvastu tadakṣairvyavasīyate /</l>
  │       <l>pratibimbaṃ tadadhyastaṃ svasaṃvittyā'vagamyate // 1061 //</l>
  │     </lg>
  │
  │
  │   <p>yac coktam <hi rend="bold">indriyair</hi> ityādi,
  │   tad asiddham iti darśayann āha—<hi rend="bold">agobhinnaṃ
  │   ce</hi>tyādi.  <hi rend="bold">ca</hi>kāro 'nuktārthasamuccaye.
  │   </p>
  │
  │   <p xml:lang="en">Further, what was said with the words <quote>“By
  │   sense perceptions”</quote> and so on, that is unestablished.  In
  │   order to show this, Śāntarakṣita said <quote>“And a thing
  │   differentiated from non-cow”</quote>. ... The uninflected word
  │   <quote>“and”</quote> is used in order to include the meanings of
  │   exclusion not mentioned in the verse.</p>
  │ </div>
  └────
  Listing 1: Quotes as references

  This passage shows some other types of quotes than we saw before.  The
  commentator’s introduction of the passage from the base text, the text
  the commentary is written on, contains two elements (here still marked
  graphically by `hi') that should be categorized as `quote' elements of
  some sort:

  1) The first is “indriyair ityādi”.  This refers back to verse 939,
     which itself is a quote, in the base text, of a passage by an
     opponent.
  2) The second is “agobhinnaṃ cetyādi”, which points to the beginning
     of the verse in which the base text answers the opponent’s claim
     made in verse 939.

  Both of these quotations are what is properly called a “pratīka”,
  literally, “that which turns towards something”, and signifying “the
  front of something.”  In this context, it refers to the beginning of
  the passage or verse that the commentator is referring to.  What is
  essential about this type of quotation is that its content is usually
  irrelevant: it is a reference to a particular string of characters, or
  sequence of sounds.

  This is supported also by the liberties that the Tibetan translators
  of such texts took with these markers: they would not literally
  translate the content of the reference, but insert whatever words
  ended up at the beginning of the Tibetan translation of the passage
  that is referred to.

  To make this function explicit, one could encode the text in several
  ways.  Here I will discuss solutions that I have at some point
  considered useful and then revised with further experience.  These
  attempts are therefore probably not ideal.

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0-pause-valid" n="1061">
  │       <lg xml:id="ts__1061">
  │         <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
  │         <seg xml:id="ts__1061__b">ca</seg>
  │         yadvastu tadakṣairvyavasīyate /</l>
  │         <l>pratibimbaṃ tadadhyastaṃ svasaṃvittyā'vagamyate // 1061 //</l>
  │       </lg>
  │
  │     <p ana="#alt1">yac coktam <quote type="lemma"
  │     corresp="#ts__939">indriyair</quote> ityādi, tad asiddham iti
  │     darśayann āha—<quote type="lemma"
  │     corresp="#ts__1061__a">agobhinnaṃ ce</quote>tyādi.</p>
  │ </div>
  └────
  Listing 2: Quotes as references, first attempt

  The first solution is to use the `quote' tag for these, with `@type'
  set to `lemma' (in the sense used by text criticism, not to be
  confused with the linguistic value as in the `@lemma' attribute).
  With this markup we can separate the `quote[@type="gloss"]' and other
  variants quite well. The drawback is a possible abuse of the
  `@corresp' attribute, which does not seem to contain the semantics of
  `@target'.

  The next suggested solution uses `ref' elements.

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0-pause-valid" n="1061">
  │       <lg xml:id="ts__1061">
  │         <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
  │         <seg xml:id="ts__1061__b">ca</seg>
  │         yadvastu tadakṣairvyavasīyate /</l>
  │         <l>pratibimbaṃ tadadhyastaṃ svasaṃvittyā'vagamyate // 1061 //</l>
  │       </lg>
  │
  │     <p ana="#alt2">yac coktam <ref target="#ts__939"
  │     type="lemma">indriyair</ref> ityādi, tad asiddham iti darśayann
  │     āha—<ref target="#ts__1061__a" type="lemma">agobhinnaṃ
  │     ce</ref>tyādi.</p>
  │ </div>
  └────
  Listing 3: Quotes as references, second attempt

  However, this creates problems.  Semantically it is problematic,
  because what refers in this case is not the content of the quote.  It
  also has the significant drawback that any query intended to catch all
  quote-like elements will now have to include certain `ref' elements.
  This would increase the complexity of queries quite significantly.

  One could however combine these solutions, and treat the quote *as a
  whole* as a referring string:

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0-pause-valid" n="1061">
  │       <lg xml:id="ts__1061">
  │         <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
  │         <seg xml:id="ts__1061__b">ca</seg>
  │         yadvastu tadakṣairvyavasīyate /</l>
  │         <l>pratibimbaṃ tadadhyastaṃ svasaṃvittyā'vagamyate // 1061 //</l>
  │       </lg>
  │
  │     <p ana="#alt2">yac coktam <ref target="#ts__939"><quote
  │     type="lemma">indriyair</quote> ityādi</ref>, tad asiddham iti
  │     darśayann āha—<ref target="#ts__1061__a"><quote
  │     type="lemma">agobhinnaṃ ce</quote>tyādi</ref>.</p>
  │ </div>
  └────
  Listing 4: Quotes as references, proposed solution

  This seems useful: we could pick out the `quote' elements along with
  all others, and still easily differentiate them by examining whether
  they are embedded in a reference to the base-text.
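
  The check described here can be sketched with a short query script.
  This is only a minimal illustration in Python (standard library), not
  part of the original presentation: the function name and the sample
  are assumptions, the sample is assembled and shortened from the
  listings above, and it uses the standard TEI namespace rather than the
  "pause-valid" variant used in the listings.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the listings above use a variant to pause
# validation, so this constant would have to be adapted for them.
TEI = "http://www.tei-c.org/ns/1.0"

# Hypothetical sample: two lemma quotes wrapped in <ref> (as in Listing 4)
# and one plain quote in <cit> (shortened from the first example).
SAMPLE = f"""
<div xmlns="{TEI}" n="1061">
  <p>yac coktam <ref target="#ts__939"><quote type="lemma">indriyair</quote> ityādi</ref>,
  tad asiddham iti darśayann āha—<ref target="#ts__1061__a"><quote type="lemma">agobhinnaṃ ce</quote>tyādi</ref>.
  etena yad uktam <cit><quote>avācakatve śabdānāṃ</quote></cit> iti.</p>
</div>
"""


def classify_quotes(xml_text):
    """Return (quote text, ref target or None) for every <quote>."""
    root = ET.fromstring(xml_text)
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for quote in root.iter(f"{{{TEI}}}quote"):
        parent = parent_of.get(quote)
        in_ref = parent is not None and parent.tag == f"{{{TEI}}}ref"
        results.append((quote.text, parent.get("target") if in_ref else None))
    return results


if __name__ == "__main__":
    for text, target in classify_quotes(SAMPLE):
        kind = "reference to base text" if target else "plain quote"
        print(f"{text!r}: {kind} {target or ''}")
```

  A single pass over all `quote' elements is enough; whether a quote is
  a reference is decided only by looking at its parent element.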

1.2.3 Quotes of individual words/phrases for elucidation
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌

  In the second sentence of the commentary passage in Listing 1,
  Kamalaśīla takes up one word from the verse, the word “and”,
  Skt. /ca/, and says for which purpose it was used in the verse.  This
  is not the same kind of referring quotation that we saw above: the
  purpose is to comment on the significance or meaning of the term, and
  not to indicate a particular point in the base text.

  ┌────
  │ <div xmlns="http://www.tei-c.org/ns/1.0" n="1061">
  │       <lg xml:id="ts__1061">
  │         <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
  │         <seg xml:id="ts__1061__b">ca</seg>
  │         yadvastu tadakṣairvyavasīyate /</l>
  │         <l>pratibimbaṃ tadadhyastaṃ svasaṃvittyā'vagamyate // 1061 //</l>
  │       </lg>
  │
  │     <p ana="#alt2"><mentioned sameAs="#ts__1061__b">ca</mentioned>kāro
  │     <gloss target="#ts__1061__b">'nuktārthasamuccaye</gloss>.
  │     </p>
  │ </div>
  └────
  Listing 5: Quotes for explanation

  To a certain extent, this has problems similar to those of the simple
  `ref' solution just discussed, in terms of query complexity: any query
  for quotes in general will have to take the variation introduced by
  `mentioned' into account, and will become more complicated as a result.
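  However, the TEI Guidelines treat `mentioned' together with `quote' in
  their section on quotation (formally it belongs to `model.emphLike',
  not `model.quoteLike'), and as such it should actually be considered
  in well-constructed queries for quotes.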

  The primary function of repeating the word that is to be explained,
  /ca/, is not to refer to the text, but to say something about the
  meaning or content of that term.  Semantically, this solution seems to
  fit quite well.

  Both elements, `mentioned' and `gloss', are, in any case, here loosely
  tied to the base text, and not to each other, because there are
  variant forms where there is either no clearly identifiable `gloss' or
  the term which could be `mentioned' is not repeated in the text.

<<< END QUOTE >>>

The upshot is that, when editing the text, it would be easy to
differentiate these basic types of quotes/mentions/references.  And (for
my work at least) it would be helpful if one could filter the different
kinds of quotes, e.g., for cross-checking the edition of the verses, or
for discovering patterns in the commentator’s quotation habits.
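
To make the cross-checking idea concrete, here is a rough sketch in
Python (standard library only).  It is an illustration under
assumptions, not a tested tool: the function name is hypothetical, the
sample is adapted from Listing 5, and the last <p> with the reading
"agobhinnam" is invented purely to show a mismatch being flagged.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Any of the linking attributes discussed above may carry the pointer.
LINK_ATTRS = ("sameAs", "corresp", "target")

# Adapted from Listing 5; the second <p> is an invented mismatch.
SAMPLE = f"""
<div xmlns="{TEI}" n="1061">
  <lg xml:id="ts__1061">
    <l><seg xml:id="ts__1061__a">agobhinnaṃ</seg>
    <seg xml:id="ts__1061__b">ca</seg>
    yadvastu tadakṣairvyavasīyate /</l>
  </lg>
  <p><mentioned sameAs="#ts__1061__b">ca</mentioned>kāro
  <gloss target="#ts__1061__b">'nuktārthasamuccaye</gloss>.</p>
  <p><mentioned sameAs="#ts__1061__a">agobhinnam</mentioned></p>
</div>
"""


def check_links(xml_text, names=("mentioned", "quote")):
    """Compare each linked element's text with the segment it points to.

    Returns (pointer, commentary text, base text or None, texts equal).
    """
    root = ET.fromstring(xml_text)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    report = []
    for name in names:
        for el in root.iter(f"{{{TEI}}}{name}"):
            pointer = next((el.get(a) for a in LINK_ATTRS if el.get(a)), None)
            if pointer is None or not pointer.startswith("#"):
                continue  # unlinked, or pointing outside this document
            target = by_id.get(pointer[1:])
            cited = "".join(el.itertext()).strip()
            base = "".join(target.itertext()).strip() if target is not None else None
            report.append((pointer, cited, base, cited == base))
    return report


if __name__ == "__main__":
    for pointer, cited, base, same in check_links(SAMPLE):
        print(pointer, repr(cited), repr(base), "ok" if same else "CHECK")
```

A mismatch here is only a prompt to look at the passage: pratīka-s in
sandhi (such as "agobhinnaṃ ce" before "ityādi") will legitimately
differ from the wording of the verse.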

I’d be happy to have some feedback on these suggestions, especially on
whether @corresp or @sameAs seems better for linking.  I use both above,
because I couldn’t make up my mind, but currently think that @sameAs is
more appropriate: unlike @corresp, it can only contain one target, and
the explanations that you want to link would always be keyed to exactly
one word/phrase of the base-text.
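
The one-target constraint could also be checked mechanically.  The
following is a small sketch in Python (standard library only); the
function name is hypothetical, and the sample, including the `quote'
with two @corresp targets, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample: @sameAs holds one pointer, @corresp holds two
# whitespace-separated pointers.
SAMPLE = """
<p xmlns="http://www.tei-c.org/ns/1.0">
  <mentioned sameAs="#ts__1061__b">ca</mentioned>
  <quote type="lemma" corresp="#ts__1061__a #ts__1061__b">agobhinnaṃ ce</quote>
</p>
"""


def pointer_counts(xml_text, attr):
    """Return (local element name, number of pointers in attr)."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag.split("}")[-1], len(el.get(attr).split()))
        for el in root.iter()
        if el.get(attr)
    ]


if __name__ == "__main__":
    # Any @sameAs count above 1 would indicate an encoding problem.
    print(pointer_counts(SAMPLE, "sameAs"))
    print(pointer_counts(SAMPLE, "corresp"))
```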

With best wishes,

--
Patrick McAllister
long-term email: pma@rdorte.org