Dear list members,

I’m currently thinking about the encoding of sections (mainly tei:div elements) for the documents in the SARIT collection. Many of those documents are outside my field of expertise, and so I’m having a hard time with this.

We have quite a large variety of section @types that either directly use Sanskrit terminology or use terminology that might (IMO) be based on it. They are:

adhikaraṇa
adhikāra
adhyāya
canto
closing
commentary
commentary1
commentary2
conclusion
kārika
kārikā
pariccheda
prakāśa
pāda
samuddeśa
sutra
sūtra
sūtra_with_bhāṣya

It’s obviously a bit of a mess. I was wondering if anyone has a principled approach to these naming schemes in general? I at least have only very limited knowledge of the traditional sectioning of texts in Sanskrit literature as a whole, and am not sure what the aim should be here.

Currently the SARIT guidelines (https://github.com/sarit/SARIT-corpus/blob/a355567b5bc61032e1d7fa39392f5e06d...) are rather cautious about the usage of these terms:

######

Sections of the text
════════════════════

There are many Sanskrit words for sections of a text: sarga, adhyāya, aṅka, pariccheda, ucchvāsa, etc.

The encoding of these sections should meet several requirements:

• the XML document itself should be valid;
• the structure of the XML document reflects the logical structure of the text;
• a standard reference system should be able to use the structure of the XML document as a proxy for the structure of the text;
• texts in the corpus are broadly consistent in the encoding strategy used for these sections.

These considerations lead us to recommend the use of div for all "parts," "sections," and "divisions" in the text, whatever their Sanskrit name is, and at whatever depth they occur. Do /not/ use the numbered divisions available in earlier versions of the TEI Guidelines (`<div1 xmlns="http://www.tei-c.org/ns/Examples"/>', `<div2 xmlns="http://www.tei-c.org/ns/Examples"/>', etc.).

Before encoding a text, you should figure out a strategy for representing all of the relevant levels of the text as div elements. Some Mīmāṃsā texts, for example, are organized according to the hierarchical organization of the Mīmāṃsā Sūtras into adhyāyas, pādas, and adhikaraṇas. The first div beneath the body element (body/div) will thus correspond to an adhyāya, the first div below this to a pāda (body/div/div), and the first div below this (body/div/div/div) to an adhikaraṇa. If this schema is applied consistently, there is no need for assigning a type to the div elements themselves (e.g., `<div xmlns="http://www.tei-c.org/ns/Examples" type="adhyāya"/>', `<div xmlns="http://www.tei-c.org/ns/Examples" type="pāda"/>', `<div xmlns="http://www.tei-c.org/ns/Examples" type="adhikaraṇa"/>'), but these type attributes may be included in order to make the XML easier to read. The strategy followed in the text can be, and should be, documented in the reference declaration (refsDecl), which is part of the encoding description (encodingDesc) in the TEI Header (see above).

1 Labelling sections
────────────────────

Sections of the text should be /identifiable/. In order to be identifiable to humans, sections generally carry a heading and/or a trailer (see below). In order to be identifiable to machines, sections carry a numeric /label/ that is represented by the n attribute of the corresponding div element. The value of n will usually be the serial number of the section: `<div xmlns="http://www.tei-c.org/ns/Examples" n="2"/>' represents the second division (adhyāya, pāda, adhikaraṇa, etc.), even if it's not actually the second div element within its parent element. For a division that comprises more than one such section, we simply put all of the corresponding numbers in the n attribute: `<div xmlns="http://www.tei-c.org/ns/Examples" n="2 3 4"/>'.
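
As an illustration only (this is not part of the SARIT guidelines), the nesting and n labelling described above could be read off a document roughly as follows. This is a minimal sketch: the sample fragment is made up rather than taken from any SARIT file, the function name div_references is hypothetical, and joining the n values with dots is just one possible convention for a reference string.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents, in ElementTree's "{uri}" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: adhyāya > pāda > adhikaraṇa, as in the Mīmāṃsā example.
SAMPLE = """
<body xmlns="http://www.tei-c.org/ns/1.0">
  <div n="1" type="adhyāya">
    <div n="1" type="pāda">
      <div n="1" type="adhikaraṇa"/>
      <div n="2 3 4" type="adhikaraṇa"/>
    </div>
  </div>
</body>
""".strip()


def div_references(parent, prefix=()):
    """Yield (depth, reference, type) for every div nested below `parent`.

    Depth counts the div itself plus its ancestor divs; the reference
    joins their n values with dots (an assumed convention). A div without
    an n attribute is labelled "?".
    """
    for div in parent.findall(TEI + "div"):
        labels = prefix + (div.get("n", "?"),)
        yield len(labels), ".".join(labels), div.get("type")
        yield from div_references(div, labels)


body = ET.fromstring(SAMPLE)
refs = list(div_references(body))
for depth, ref, div_type in refs:
    print(depth, ref, div_type)
```

The type attribute is only carried along for readability here; depth and reference are derived from the nesting and the n values alone, so the same walk works on a document whose divs have no type at all. A multi-number label such as n="2 3 4" is kept as a single string.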

Divisions (divs) are block-level elements, and they constitute the text hierarchy, together with other block-level elements like paragraphs (p), verses (lg), and the “anonymous blocks” used for sūtras and the like (ab). The overall reference system of the text will therefore usually include the numbering of div elements at the upper levels and the numbering of lg or ab elements at the lower levels.

######

I can at least say that for most practical purposes the @type on the div is ignored, at the moment: the level of any given div is inferred from the number of parent div-s, and the typesetting/display on the various platforms of SARIT depends on several factors about a div’s content rather than on the @type.

But my question is more theoretical: should we generally encourage the use of Sanskrit technical terms for types of a text’s division or not? And if you think we should encourage them, would it make sense (or even be possible) to somehow specify guidelines for their usage that would apply to many texts?

In SARIT at least, there are some cases where we decided against the use of Sanskrit terms, mainly because their meaning is actually not so clear, e.g., in the markup of tei:quote with @type="lemma" instead of something like @type="pratīka".

Hoping for some help,

--
Patrick McAllister
long-term email: pma@rdorte.org