Dear colleagues,

I am in the midst of a workshop in which we are encoding Old Javanese texts in TEI, and the question of how to encode pādas has (once again) reared its head. We discussed this at length in the context of the SARIT project and concluded that <l> should be used for a pair of pādas, with the boundary between the odd and even pāda of each pair marked by the <caesura/> element. Under this scheme, the following vasantatilaka verse is encoded as:

<lg>
<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/>
ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/>
heman kitābapa niragraha māsku liṅnya</l>
</lg>

This is somewhat contrary to what many people would expect, namely, that each pāda should correspond to a single <l> element, as follows:

<lg>
<l>vvantən mañumbana puḍak ginuritnya pārtha</l>
<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista</l>
<l>heman kitābapa niragraha māsku liṅnya</l>
</lg>

My arguments for <caesura/> were: (a) the practical necessity of encoding texts from printed editions, where pādas are not always separated typographically, especially in shorter verse forms; hence (b) the requirement that <l> mean the same thing in an anuṣṭubh verse as in (e.g.) a śakvarī verse, rather than referring to a pādayuga in the first case and a single pāda in the second; and (c) the frequent occurrence, in Sanskrit, of words that span the boundary between odd and even pādas, and the undesirability of having structural elements like <l> overlap with the grammatical structure of the text (at least at the level of the word). The use of <caesura/> would be optional: it is not required (and the boundary often isn't marked typographically in shorter verse forms), but where it is present, the stylesheets will render it as a space.
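To make the practical consequence concrete, here is a minimal Python sketch. It is not the SARIT stylesheets; the function names are hypothetical, and the TEI namespace is omitted for brevity. It renders each <l> with a space at <caesura/>, and shows that splitting at <caesura/> recovers the same four pādas from either encoding of the verse above.

```python
import xml.etree.ElementTree as ET

# The two encodings of the vasantatilaka verse, written without the TEI
# namespace for brevity (a real TEI file would use http://www.tei-c.org/ns/1.0).
PAIR_ENCODING = """<lg>
<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/>
ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/>
heman kitābapa niragraha māsku liṅnya</l>
</lg>"""

PADA_ENCODING = """<lg>
<l>vvantən mañumbana puḍak ginuritnya pārtha</l>
<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista</l>
<l>heman kitābapa niragraha māsku liṅnya</l>
</lg>"""


def split_padas(line: ET.Element) -> list[str]:
    """Split one <l> at each <caesura/>; an <l> without one is a single pāda."""
    padas = [line.text or ""]
    for child in line:
        if child.tag == "caesura":
            padas.append("")  # a caesura starts a new pāda
        else:
            # keep the text of any other inline markup inside the line
            padas[-1] += "".join(child.itertext())
        padas[-1] += child.tail or ""  # text after a child element lives in .tail
    # normalize whitespace (e.g. the newline after <caesura/>)
    return [" ".join(p.split()) for p in padas]


def render_line(line: ET.Element) -> str:
    """Render an <l> for display, with a space wherever a <caesura/> occurs."""
    return " ".join(split_padas(line))


def padas_of(xml: str) -> list[str]:
    """Flatten an <lg> into its pādas, whichever of the two encodings it uses."""
    return [p for line in ET.fromstring(xml).iter("l") for p in split_padas(line)]


if __name__ == "__main__":
    for line in ET.fromstring(PAIR_ENCODING).iter("l"):
        print(render_line(line))
    # Both encodings yield the same four pādas.
    print(padas_of(PAIR_ENCODING) == padas_of(PADA_ENCODING))  # True
```

The sketch assumes that every <caesura/> inside an <l> marks a pāda boundary; it cannot tell such a caesura apart from a pāda-internal yati, because nothing in the markup distinguishes the two.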

But I can now think of counterarguments to all of these points, and in some ways it might be easier if <l> always meant a "pāda." (<caesura/> also has no @type attribute in standard TEI, so it may be harder than I expected to distinguish this "pāda-boundary caesura" from the pāda-internal yati.) So I would like to ask whether any of you have found compelling reasons for preferring one encoding over the other. (Other suggestions are also welcome, including the use of <seg> or similar elements.) I know that some features vary across Indic languages, such as the coincidence of metrical and grammatical (esp. lexical) boundaries: these always coincide in Old Javanese and almost never in Kannada, so I am hoping for a solution that avoids the problem of overlapping hierarchies altogether.

Andrew