[Indic-texts] Sanskrit names for division types

10 Oct 2018

Dear list members,

I’m currently thinking about the encoding of sections (mainly tei:div
elements) for the documents in the SARIT collection.  Many of those
documents are outside my field of expertise, and so I’m having a hard
time with this.

We have quite a large variety of section @types that either directly use
Sanskrit terminology or use terminology that might (IMO) be based on it.
They are:

adhikaraṇa
adhikāra
adhyāya
canto
closing
commentary
commentary1
commentary2
conclusion
kārika
kārikā
pariccheda
prakāśa
pāda
samuddeśa
sutra
sūtra
sūtra_with_bhāṣya

It’s obviously a bit of a mess.  Does anyone have a principled approach
to these naming schemes in general?  I have only very limited knowledge
of the traditional sectioning of texts in Sanskrit literature as a
whole, and am not sure what the aim should be here.
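
To make the overlap in the list above concrete, here is a minimal Python
sketch that groups @type values which differ only in diacritics or in a
trailing digit.  It is not part of any SARIT tooling; the function names
(fold, group_variants) are made up for illustration, and treating
"commentary1" and "commentary2" as variants of "commentary" is an
assumption that may well be wrong for a given text.

```python
import unicodedata
from collections import defaultdict

# The @type values exactly as listed in this message.
TYPES = [
    "adhikaraṇa", "adhikāra", "adhyāya", "canto", "closing",
    "commentary", "commentary1", "commentary2", "conclusion",
    "kārika", "kārikā", "pariccheda", "prakāśa", "pāda",
    "samuddeśa", "sutra", "sūtra", "sūtra_with_bhāṣya",
]

def fold(value):
    """Return a comparison key: no diacritics, no trailing digits, lowercase."""
    decomposed = unicodedata.normalize("NFD", value)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: a trailing digit only numbers an otherwise identical type.
    return plain.rstrip("0123456789").lower()

def group_variants(types):
    """Map each folded key to its spellings, keeping only keys with several."""
    groups = defaultdict(list)
    for value in types:
        groups[fold(value)].append(value)
    return {key: values for key, values in groups.items() if len(values) > 1}

variants = group_variants(TYPES)
for key, spellings in sorted(variants.items()):
    print(key, "<-", ", ".join(spellings))
```

This only flags candidates for review.  Whether two spellings really
name the same kind of division still has to be decided per text.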

Currently the SARIT guidelines
(https://github.com/sarit/SARIT-corpus/blob/a355567b5bc61032e1d7fa39392f5e06d...)
are rather cautious about the usage of these terms:

######

Sections of the text
════════════════════

  There are many Sanskrit words for sections of a text: sarga, adhyāya,
  aṅka, pariccheda, ucchvāsa, etc. The encoding of these sections should
  meet several requirements:

  • the XML document itself should be valid;

  • the structure of the XML document reflects the logical structure of
    the text;

  • a standard reference system should be able to use the structure of
    the XML document as a proxy for the structure of the text;

  • texts in the corpus are broadly consistent in the encoding strategy
    used for these sections.

  These considerations lead us to recommend the use of div for all
  "parts," "sections," and "divisions" in the text, whatever their
  Sanskrit name is, and at whatever depth they occur. Do /not/ use the
  numbered divisions available in earlier versions of the TEI Guidelines
  (`<div1/>', `<div2/>', etc.).

  Before encoding a text, you should figure out a strategy for
  representing all of the relevant levels of the text as div
  elements. Some Mīmāṃsā texts, for example, are organized according to
  the hierarchical organization of the Mīmāṃsā Sūtras into adhyāyas,
  pādas, and adhikaraṇas. The first div beneath the body element
  (body/div) will thus correspond to an adhyāya, the first div below
  this to a pāda (body/div/div), and the first div below this
  (body/div/div/div) to an adhikaraṇa. If this schema is applied
  consistently, there is no need for assigning a type to the div
  elements themselves (e.g., `<div type="adhyāya"/>', `<div
  type="pāda"/>', `<div type="adhikaraṇa"/>'), but these type
  attributes may be included in order to make the XML easier to read.

  The strategy followed in the text can be, and should be, documented in
  the reference declaration (refsDecl), which is part of the encoding
  description (encodingDesc) in the TEI Header (see above).

1 Labelling sections
────────────────────

  Sections of the text should be /identifiable/. In order to be
  identifiable to humans, sections generally carry a heading and/or a
  trailer (see below). In order to be identifiable to machines, sections
  carry a numeric /label/ that is represented by the n attribute of the
  corresponding div element. The value of n will usually be the serial
  number of the section: `<div n="2"/>' represents the second division
  (adhyāya, pāda, adhikaraṇa, etc.), even if it's not actually the
  second div element within its parent element. For a division that
  comprises more than one such section, we simply put all of the
  corresponding numbers in the n attribute: `<div n="2 3 4"/>'.

  Divisions (divs) are block-level elements, and they constitute the
  text hierarchy, together with other block-level elements like
  paragraphs (p), verses (lg), and the “anonymous blocks” used for
  sūtras and the like (ab). The overall reference system of the text
  will therefore usually include the numbering of div elements at the
  upper levels and the numbering of lg or ab elements at the lower
  levels.

######

I can at least say that, for most practical purposes, the @type on the
div is currently ignored: the level of any given div is inferred from
the number of its ancestor div-s, and the typesetting/display on the
various SARIT platforms depends on several features of a div’s content
rather than on the @type.
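
The depth-based inference described above can be sketched roughly as
follows.  This is an illustration using Python's standard library, not
the code the SARIT platforms actually run; the sample document and the
function name div_levels are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
DIV = f"{{{TEI_NS}}}div"

# Invented sample: an adhyāya containing a pāda containing two untyped divs.
SAMPLE = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div n="1" type="adhyāya">
    <div n="1" type="pāda">
      <div n="1"/>
      <div n="2 3"/>
    </div>
  </div>
</body>"""

def div_levels(element, depth=0, path=()):
    """Yield (level, n-path, type) for each div, in document order.

    The level is the number of ancestor divs plus one; @type is only
    reported, never used to compute the level.
    """
    for child in element:
        if child.tag == DIV:
            here = path + (child.get("n", "?"),)
            yield depth + 1, ".".join(here), child.get("type")
            yield from div_levels(child, depth + 1, here)
        else:
            # Non-div containers do not add a level.
            yield from div_levels(child, depth, path)

rows = list(div_levels(ET.fromstring(SAMPLE)))
for level, ref, div_type in rows:
    print(level, ref, div_type)
```

The two innermost divs get level 3 without carrying any @type, which is
why the attribute can be ignored as long as the nesting is consistent.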

But my question is more theoretical: should we generally encourage the
use of Sanskrit technical terms for the types of a text’s divisions or not?
And if you think we should encourage them, would it make sense (or even
be possible) to somehow specify guidelines for their usage that would
apply to many texts?

In SARIT at least, there are some cases where we decided against the use
of Sanskrit terms, mainly because their meaning is actually not so
clear, e.g., in the markup of tei:quote with @type="lemma" instead of
something like @type="pratīka".

Hoping for some help,

--
Patrick McAllister
long-term email: pma@rdorte.org