
Dear colleagues,

I am in the midst of a workshop in which we are attempting to encode texts in Old Javanese in TEI format, and the issue of encoding pādas has (once again) reared its head. We discussed this issue at length in the context of the SARIT project, and we came to the conclusion that <l> should be used for a pair of pādas, and the boundary between odd and even pādas should be represented by the <caesura/> element. Hence the following vasantatilaka verse:

    <lg n="3" met="vasantatilaka">
      <l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/> ndān susvasusvani kinolnya hanan liniṅliṅ</l>
      <l>rakryan vədinta tan akun ləvu paṅhavista<caesura/> heman kitābapa niragraha māsku liṅnya</l>
    </lg>

This is somewhat contrary to what many people would expect, namely, that each pāda should correspond to a single <l> element, as follows:

    <lg n="3" met="vasantatilaka">
      <l>vvantən mañumbana puḍak ginuritnya pārtha</l>
      <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
      <l>rakryan vədinta tan akun ləvu paṅhavista</l>
      <l>heman kitābapa niragraha māsku liṅnya</l>
    </lg>

My arguments for the use of <caesura/> involved (a) the practical necessity of encoding texts from printed editions, where the pādas are not separated typographically in all cases, especially in shorter verse forms, and thus (b) the requirement that <l> should mean the same thing for an anuṣṭubh verse as for (e.g.) śakvarī verse, i.e., it should not refer to a pādayuga in the first case, and a pāda in the second case; and (c) the frequent occurrence, in Sanskrit, of words that span the boundaries between odd and even pādas, and the undesirability of having structural elements like <l> overlap with the grammatical structure of the text (at least at the level of the word). The use of <caesura/> would be optional: it's not required (and often isn't marked typographically in shorter verse forms), but if it is present, the stylesheets will insert a space.

But I can now think of counterarguments for all of these points, and in some ways, it might be easier if <l> always meant a "pāda." (<caesura/> also doesn't have a @type attribute in standard TEI, so it might be more difficult than I expected to differentiate this "pāda-boundary caesura" from the pāda-internal yati.) So I am asking everyone whether there are compelling reasons you've discovered for preferring one encoding solution over another. (Or if you have other suggestions altogether, including the use of <seg> or other such elements.) I know that there are some features that vary across Indic languages, such as the coincidence of metrical and grammatical (esp. lexical) boundaries: these structures always coincide in Old Javanese, and almost never in Kannada, so I am hoping to avoid the problem of overlapping hierarchies completely.

Andrew
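The two encodings in Andrew's message are close enough that one can be derived mechanically from the other. The following is a minimal sketch, assuming un-namespaced fragments like those quoted in this thread (real TEI files would carry the TEI namespace) and assuming that each <l> contains nothing but text and empty <caesura/> elements. The function name `split_at_caesura` is hypothetical. Note that stripping whitespace discards the information of whether a word break coincided with the pāda boundary.

```python
import xml.etree.ElementTree as ET


def split_at_caesura(lg):
    """Return a copy of <lg> with one <l> per pāda.

    Assumes each <l> contains only text and empty <caesura/> children.
    """
    out = ET.Element("lg", dict(lg.attrib))
    for line in lg.findall("l"):
        parts = [line.text or ""]           # text before the first <caesura/>
        for child in line:
            if child.tag != "caesura":
                raise ValueError("this sketch only handles <caesura/> children")
            parts.append(child.tail or "")  # text following this <caesura/>
        for part in parts:
            ET.SubElement(out, "l").text = part.strip()
    return out


verse = ET.fromstring(
    '<lg n="3" met="vasantatilaka">'
    '<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/> '
    'ndān susvasusvani kinolnya hanan liniṅliṅ</l>'
    '<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/> '
    'heman kitābapa niragraha māsku liṅnya</l>'
    '</lg>'
)
padas = [l.text for l in split_at_caesura(verse).findall("l")]
```

Going in the opposite direction (merging pairs of <l> into one) is equally mechanical, which suggests that the choice between the two encodings is more about what <l> should mean than about what can be recovered later.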

I use the seg-element to mark pAdas:

    <lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93">
      <l>
        <seg type='foot'>anantarodIritalakzmaBAjO</seg>
        <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg>
      </l>
      <l>
        <seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg>
        <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg>
      </l>
    </lg>

I use the space element if there is a word break between pAdas in anuzwuB meter (where there often is not), though I see no harm in using the caesura element there. There are however often caesurae even where there is no pAda break, so if consistency is the issue, the seg element seems to me to be the best solution.

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
http://sanskritlibrary.org
******************************
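In Peter's sample the @met attribute carries one gaṇa pattern per pāda, separated by slashes, so the <seg type='foot'> elements can be paired with their patterns and the count checked. This is a minimal sketch under that reading of the sample; the function name `padas_with_patterns` is hypothetical, and the fragment is parsed without a namespace, as it appears in the message.

```python
import xml.etree.ElementTree as ET


def padas_with_patterns(lg):
    """Pair each <seg type='foot'> with the matching pattern from @met."""
    segs = [s for s in lg.iter("seg") if s.get("type") == "foot"]
    patterns = lg.get("met", "").split("/")
    if len(segs) != len(patterns):
        raise ValueError(
            f"{len(segs)} pādas but {len(patterns)} patterns in @met"
        )
    # itertext() also collects text nested inside child elements, if any.
    return [(p, "".join(s.itertext()).strip()) for p, s in zip(patterns, segs)]


verse = ET.fromstring("""
<lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti"
    met="jtjgg/ttjgg/ttjgg/jtjgg" n="93">
  <l><seg type='foot'>anantarodIritalakzmaBAjO</seg>
     <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg></l>
  <l><seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg>
     <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg></l>
</lg>
""")
pairs = padas_with_patterns(verse)
```

Because the lookup uses `iter("seg")` rather than the <l> structure, the same check works whether the pādas are grouped two per <l>, as here, or otherwise.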
On 7 Jun. 2018, at 9:26 AM, Balogh Dániel <danbalogh@gmail.com> wrote:
Dear Andrew and everyone else,
the solution I've come up with for Siddham is to mark up pādas as <l>, and to nest <lg> elements for half-verses, thus for all varṇavṛtta metres:

    <lg n="1" met="pṛthvī">
      <lg type="halfverse" n="ab">
        <l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
        <l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
      </lg>
      <lg type="halfverse" n="cd">
        <l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
        <l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
      </lg>
    </lg>

I feel this method is fully compliant with TEI and it paves the way for typography, giving you the choice to print or not to print a line break at the end of odd pādas, and to add automated | and || punctuation if desired. The shortcomings that I am aware of are 1) this works best with transliteration (e.g. pāda a would have to end at yai, and ru would have to move to pāda b in an alphasyllabary); and 2) it may interfere with lexical tagging, e.g. if you wanted to wrap all words including compounds in <w>, then the compound spanning b to c is problematic. As I see things, problem 1 is universal, not restricted to this scheme; those who encode texts in Devanagari or another Indic script just have to do some things differently, and automated conversion between scripts remains tricky with markup. As for problem 2, it can still be handled in TEI if necessary; I am not tagging words in my corpus so I have not looked into linking elements together. In a language where lexical units stretch across pāda boundaries a lot of the time, it may be inconvenient to keep doing this, but it still looks like best practice to me.

Now in metres of the āryā family I mark up only two <l> elements, and each of those is also wrapped in <lg>, like this:

    <lg n="42" met="āryā">
      <lg type="halfverse" n="ab">
        <l>śaśineva nabho vimalaṃ kaustubha-maṇineva śārṇgiṇo vakṣaḥ|</l>
      </lg>
      <lg type="halfverse" n="cd">
        <l>bhavana-vareṇa tathedaṃ puram akhilam alaṃkṛtam udāraṃ||</l>
      </lg>
    </lg>
This leaves the caesura out, which is my choice. I have likewise chosen not to tag the caesura in varṇavṛtta metres, and I feel that the caesura in āryā is more akin to the caesura within a pāda of a catuṣpadī than to the yati at the end of an odd pāda of a catuṣpadī. This is subjective and one could argue differently. If I did want to mark up caesurae then I would use the <caesura> element in both āryā and within pādas of varṇavṛttas for that purpose. This seems to be much easier to work with than using <seg> elements for every colon.
All best, Dan
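The typographic choices Dan mentions (an optional line break after odd pādas, and automated | and || punctuation) can be derived from the nested half-verse structure. The following is a minimal sketch, not Siddham's actual stylesheet: the function name `render` and its parameter are hypothetical, the fragment is parsed without a namespace, and the source is assumed to contain no daṇḍas of its own (unlike the āryā sample, which already has them). A pāda ending in a hyphen is treated as continuing into the next pāda without a space.

```python
import xml.etree.ElementTree as ET


def render(lg, break_odd_padas=False):
    """Render a verse encoded as nested half-verse <lg> elements."""
    halves = [h for h in lg.findall("lg") if h.get("type") == "halfverse"]
    lines = []
    for i, half in enumerate(halves):
        text = ""
        for l in half.findall("l"):
            pada = (l.text or "").strip()
            if not text:
                text = pada
            elif break_odd_padas:
                text += "\n" + pada
            elif text.endswith("-"):
                text += pada          # compound continues across the pāda break
            else:
                text += " " + pada
        # Single daṇḍa after each half-verse, double after the last one.
        danda = " ||" if i == len(halves) - 1 else " |"
        lines.append(text + danda)
    return "\n".join(lines)


verse = ET.fromstring(
    '<lg n="1" met="pṛthvī">'
    '<lg type="halfverse" n="ab">'
    '<l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>'
    '<l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>'
    '</lg>'
    '<lg type="halfverse" n="cd">'
    '<l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>'
    '<l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>'
    '</lg>'
    '</lg>'
)
two_lines = render(verse)
four_lines = render(verse, break_odd_padas=True)
```

With `break_odd_padas=True` the trailing hyphen of pāda c stays at the end of its printed line, which matches the usual convention for a compound broken across lines.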
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts

Dear list members,

I've found the alternative solution proposed by Andrew to serve most of my purposes (one important one being ease of use):

    <lg n="3" met="vasantatilaka">
      <l>vvantən mañumbana puḍak ginuritnya pārtha</l>
      <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
      <l>rakryan vədinta tan akun ləvu paṅhavista</l>
      <l>heman kitābapa niragraha māsku liṅnya</l>
    </lg>
All else being equal (problems with word boundaries across markup etc.), I find it easier to think of all pādas as being on the same level and of the same type: that’s why I prefer tei:l elements for them. Introducing tei:seg elements or additional (nested) tei:lg seems to complicate things, and I’m not sure which problems they are supposed to solve. If I know that a verse is supposed to consist of, say, 4 pādas (by looking at the @met attribute), I’d expect to just have those 4 items on the same level, and immediately under the tei:lg.

All typesetting problems are easily solved by looking at the @met attribute, and perhaps also considering the number of tei:l elements immediately under a tei:lg. In SARIT, as Andrew has pointed out, the encoding is usually 2 pādas per tei:l (but not always if I remember correctly).

It all depends, of course, on the purpose of the text being encoded (I haven’t done any automated metrical analysis, for example, but imagine I’d strip out most of the markup first in any case and wouldn’t want more sophisticated markup).

Best wishes,
--
Patrick McAllister
Email: patrick.mcallister@oeaw.ac.at
Phone: + 43 1 51581 6423
Institute for the Cultural and Intellectual History of Asia (IKGA)
Austrian Academy of Sciences
Hollandstraße 11-13, 2nd floor
1020 Vienna, Austria
http://www.ikga.oeaw.ac.at/
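The @met-driven layout Patrick describes can be sketched as a lookup from the metre name to the number of pādas printed per line, applied to the flat list of <l> elements. This is a minimal sketch: the table `PADAS_PER_LINE` and its values are a hypothetical house style, not something taken from SARIT or any other project, and the fragment is parsed without a namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical house style: pādas printed per line, keyed on the value of @met.
PADAS_PER_LINE = {"anuṣṭubh": 2, "vasantatilaka": 1}


def layout(lg, default=1):
    """Group the <l> children of <lg> into printed lines according to @met."""
    padas = [(l.text or "").strip() for l in lg.findall("l")]
    per_line = PADAS_PER_LINE.get(lg.get("met"), default)
    return [
        " ".join(padas[i:i + per_line])
        for i in range(0, len(padas), per_line)
    ]


verse = ET.fromstring(
    '<lg n="3" met="vasantatilaka">'
    '<l>vvantən mañumbana puḍak ginuritnya pārtha</l>'
    '<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>'
    '<l>rakryan vədinta tan akun ləvu paṅhavista</l>'
    '<l>heman kitābapa niragraha māsku liṅnya</l>'
    '</lg>'
)
printed = layout(verse)
```

Joining pādas with a plain space is a simplification: it is only safe where metrical and lexical boundaries coincide, which is the very point on which the languages discussed in this thread differ.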

Dear all,

I am in agreement with Andrew and Patrick. The system proposed by Andrew is the one we have implemented on EIAD, where most texts have been tagged at the <w> level. See e.g. the first stanza from EIAD 186:

    <lg met="mālinī" n="I">
      <l n="a">
        <lb n="1"/>
        <g type="dextrorotatory-spiral"/>
        <w>jayati</w>
        <w>munir</w>
        <w part="I">udagrakhyātacandrāṁśujāla</w>
      </l>
      <l n="b">
        <w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w>
        <w>ajeyasya</w>
        <w>yasya</w>
      </l>
      <l n="c">
        <w>jagad</w>
        <w>idam</w>
        <w>abhiṣiktan</w>
        <w>dakṣiṇāṁmbhobhir</w>
        <w part="I">u<lb n="3" break="no"/>ccaiḥ</w>
      </l>
      <l n="d">
        <w part="F">kṣubhitasalilanāthasparddhibhir</w>
        <w>mmārasainyaiḥ</w>
        <pc>||</pc>
      </l>
    </lg>

See <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd>. In our system, the layout is indeed based upon a test for the value of @met.

Best wishes,
Arlo
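In Arlo's EIAD stanza, words that straddle a pāda boundary are split into <w part="I"> (initial) and <w part="F"> (final), so the full word forms can be reassembled without disturbing the <l> structure. The following is a minimal sketch of that reassembly, not EIAD's own code: the function name `whole_words` is hypothetical, the fragment is parsed without a namespace, and @part="M" (a medial fragment, allowed by TEI) is handled even though the sample does not use it.

```python
import xml.etree.ElementTree as ET


def whole_words(lg):
    """Reassemble <w> fragments marked with @part into complete words."""
    words, pending = [], None
    for w in lg.iter("w"):                 # document order, across all <l>
        text = "".join(w.itertext())       # skips over empty <lb/> milestones
        part = w.get("part")
        if part == "I":
            pending = text
        elif part in ("M", "F") and pending is not None:
            pending += text
            if part == "F":
                words.append(pending)
                pending = None
        else:
            words.append(text)
    return words


stanza = ET.fromstring(
    '<lg met="mālinī" n="I">'
    '<l n="a"><lb n="1"/><g type="dextrorotatory-spiral"/>'
    '<w>jayati</w> <w>munir</w> '
    '<w part="I">udagrakhyātacandrāṁśujāla</w></l>'
    '<l n="b"><w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w> '
    '<w>ajeyasya</w> <w>yasya</w></l>'
    '<l n="c"><w>jagad</w> <w>idam</w> <w>abhiṣiktan</w> '
    '<w>dakṣiṇāṁmbhobhir</w> '
    '<w part="I">u<lb n="3" break="no"/>ccaiḥ</w></l>'
    '<l n="d"><w part="F">kṣubhitasalilanāthasparddhibhir</w> '
    '<w>mmārasainyaiḥ</w> <pc>||</pc></l>'
    '</lg>'
)
words = whole_words(stanza)
```

This relies purely on document order; a stricter encoding could link the fragments explicitly, but the sample quoted above does not do so.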

If one tags words within a verse, one has to dissolve sandhi, which destroys the meter: the text is no longer metrical, i.e. no longer a verse. For this reason, I do word tagging in a separate file from the one that keeps the sandhi of the verse undisturbed, and coordinate it with the verse file by means of identical xml:id values.

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
http://sanskritlibrary.org
******************************
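The two-file arrangement Peter describes can be sketched as a join on xml:id. This is a minimal sketch under several assumptions: the identifier "v93b", the wrapper element <ab>, and the placement of xml:id on <seg> are all hypothetical, and the sandhi-dissolved word list is an illustrative analysis of one pāda from Peter's earlier sample rather than data from the Sanskrit Library.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def index_by_id(root, tag):
    """Map xml:id values to the elements of the given tag that carry them."""
    return {el.get(XML_ID): el for el in root.iter(tag) if el.get(XML_ID)}


# Verse file: sandhi undisturbed, so the text remains metrical.
verse_file = ET.fromstring(
    '<lg n="93"><l>'
    '<seg type="foot" xml:id="v93b">pAdO yadIyAv-upajAtayas-tAH</seg>'
    '</l></lg>'
)
# Word file: sandhi dissolved, one <w> per word (illustrative analysis).
word_file = ET.fromstring(
    '<ab><seg xml:id="v93b">'
    '<w>pAdO</w><w>yadIyO</w><w>upajAtayaH</w><w>tAH</w>'
    '</seg></ab>'
)

metrical = index_by_id(verse_file, "seg")
analysed = index_by_id(word_file, "seg")
aligned = {
    key: (metrical[key].text, [w.text for w in analysed[key].iter("w")])
    for key in metrical.keys() & analysed.keys()
}
```

Since xml:id values must be unique only within a single document, reusing the same value in the two files is valid XML, and the shared value is what ties the metrical text to its word-level analysis.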
On 7 Jun. 2018, at 11:38 PM, Arlo Griffiths <arlo.griffiths@efeo.net> wrote:
Dear all,
I am in agreement with Andrew and Patrick. The system propopse by Andrew is the one we have implemented on EIAD, where most texts have been tagged at the <w> level.
See e.g. the first stanza from EIAD 186:
<lg met="mālinī" n="I"> <l n="a"> <lb n="1"/> <g type="dextrorotatory-spiral"/> <w>jayati</w> <w>munir</w> <w part="I">udagrakhyātacandrāṁśujāla</w> </l> <l n="b"> <w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w> <w>ajeyasya</w> <w>yasya</w> </l> <l n="c"> <w>jagad</w> <w>idam</w> <w>abhiṣiktan</w> <w>dakṣiṇāṁmbhobhir</w> <w part="I">u<lb n="3" break="no"/>ccaiḥ</w> </l> <l n="d"> <w part="F">kṣubhitasalilanāthasparddhibhir</w> <w>mmārasainyaiḥ</w> <pc>||</pc> </l> </lg>
See <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd>>. In our system, the layout is indeed based upon a test for the value of @met.
Best wishes,
Arlo
Le 7 juin 2018 à 23:04, Patrick McAllister <patrick.mcallister@oeaw.ac.at <mailto:patrick.mcallister@oeaw.ac.at>> a écrit :
Dear list members,
I've found the alternative solution proposed by Andrew to be serve most of my purposes (one important one being ease of use):
<lg n="3" met="vasantatilaka"> <l>vvantən mañumbana puḍak ginuritnya pārtha</l> <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l> <l>rakryan vədinta tan akun ləvu paṅhavista</l> <l>heman kitābapa niragraha māsku liṅnya</l> </lg>
All else being equal (problems with word boundaries across markup etc.), I find it easier to think of all pādas as being on the same level and of the same type: that’s why I prefer tei:l elements for them. Introducing tei:seg elements or additional (nested) tei:lg seems to complicate things, and I’m not sure which problems they are supposed to solve. If I know that a verse is supposed to consist of, say, 4 pādas (by looking at the @met attribute), I’d expect to just have those 4 items on the same level, and immediately under the tei:lg.
All typesetting problems are easily solved by looking at the @met attribute, and perhaps also considering the number of tei:l elements immediately under a tei:lg. In SARIT, as Andrew has pointed out, the encoding is usually 2 pādas per tei:l (but not always if I remember correctly).
It all depends, of course, on the purpose of the text being encoded (I haven’t done any automated metrical analysis, for example, but imagine I’d strip out most of the markup first in any case and wouldn’t want more sophisticated markup).
Best wishes,
On Thu, Jun 07 2018, Peter Scharf wrote:
I use the seg-element to mark pAdas
<lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93"> <l> <seg type='foot'>anantarodIritalakzmaBAjO</seg> <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg> </l> <l> <seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg> <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg> </l> </lg>
I use the space element if there is a word break between pAdas in anuzwuB meter (where there often is not), though I see no harm in using the caesura element there. There are however often caesurae even where there is no pAda break, so if consistency is the issue, the seg element seems to me to be the best solution.
Yours, Peter
****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org <mailto:scharf@sanskritlibrary.org> http://sanskritlibrary.org ******************************
On 7 Jun. 2018, at 9:26 AM, Balogh Dániel <danbalogh@gmail.com> wrote:
Dear Andrew and everyone else,
the solution I've come up with for Siddham is to mark up pādas as <l>, and to nest <lg> elements for half-verses, thus for all varṇavṛtta metres:

<lg n="1" met="pṛthvī">
  <lg type="halfverse" n="ab">
    <l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
    <l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
  </lg>
  <lg type="halfverse" n="cd">
    <l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
    <l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
  </lg>
</lg>
I feel this method is fully compliant with TEI and it paves the way for typography, giving you the choice to print or not to print a line break at the end of odd pādas, and to add automated | and || punctuation if desired. The shortcomings that I am aware of are 1) this works best with transliteration (e.g. pāda a would have to end at yai, and ru would have to move to pāda b in an alphasyllabary); and 2) it may interfere with lexical tagging, e.g. if you wanted to wrap all words including compounds in <w>, then the compound spanning c to d is problematic. As I see things, problem 1 is universal, not restricted to this scheme; those who encode texts in Devanagari or another Indic script just have to do some things differently, and automated conversion between scripts remains tricky with markup. As for problem 2, it can still be handled in TEI if necessary; I am not tagging words in my corpus so I have not looked into linking elements together. In a language where lexical units stretch across pāda boundaries a lot of the time, it may be inconvenient to keep doing this, but it still looks like best practice to me.
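As a rough illustration of the typographic choices this nesting allows, here is a minimal Python sketch. The `render` function and its `break_odd_padas` flag are hypothetical names introduced only for this example; it assumes un-namespaced elements and that every half-verse is an <lg type="halfverse"> directly under the stanza, as in the pṛthvī example.

```python
import xml.etree.ElementTree as ET

def render(lg_xml: str, break_odd_padas: bool = True) -> str:
    """Print a stanza encoded as nested half-verse <lg> elements."""
    stanza = ET.fromstring(lg_xml)
    halves = stanza.findall("lg[@type='halfverse']")
    # Optional line break (or just a space) after each odd pāda.
    sep = "\n" if break_odd_padas else " "
    out = []
    for i, half in enumerate(halves):
        padas = ["".join(l.itertext()).strip() for l in half.findall("l")]
        # Automated punctuation: | after a non-final half, || after the last.
        mark = " ||" if i == len(halves) - 1 else " |"
        out.append(sep.join(padas) + mark)
    return "\n".join(out)

sample = """<lg n="1" met="pṛthvī">
<lg type="halfverse" n="ab">
<l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
<l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
</lg>
<lg type="halfverse" n="cd">
<l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
<l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
</lg>
</lg>"""

print(render(sample, break_odd_padas=False))
```

The sketch does nothing special for a pāda that ends in a hyphen (as pāda c does here); a real stylesheet would presumably need a rule for that case.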
Now in metres of the āryā family I mark up only two <l> elements, and each of those is also wrapped in <lg>, like this:

<lg n="42" met="āryā">
  <lg type="halfverse" n="ab">
    <l>śaśineva nabho vimalaṃ kaustubha-maṇineva śārṇgiṇo vakṣaḥ|</l>
  </lg>
  <lg type="halfverse" n="cd">
    <l>bhavana-vareṇa tathedaṃ puram akhilam alaṃkṛtam udāraṃ||</l>
  </lg>
</lg>
This leaves the caesura out, which is my choice. I have likewise chosen not to tag the caesura in varṇavṛtta metres, and I feel that the caesura in āryā is more akin to the caesura within a pāda of a catuṣpadī than to the yati at the end of an odd pāda of a catuṣpadī. This is subjective and one could argue differently. If I did want to mark up caesurae then I would use the <caesura> element in both āryā and within pādas of varṇavṛttas for that purpose. This seems to be much easier to work with than using <seg> elements for every colon.
All best, Dan
_______________________________________________ indic-texts mailing list indic-texts@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/indic-texts
-- Patrick McAllister
Email: patrick.mcallister@oeaw.ac.at Phone: + 43 1 51581 6423
Institute for the Cultural and Intellectual History of Asia (IKGA) Austrian Academy of Sciences Hollandstraße 11-13, 2nd floor 1020 Vienna, Austria
http://www.ikga.oeaw.ac.at/

Dear Peter (and colleagues),

I notice that, between the people who have weighed in on this matter, we have exhausted every possible alternative for this particular markup question that the TEI makes available.

Just one question: by "lines" you surely mean typographic lines, right? Otherwise I am not sure I follow: if we looked for a Sanskrit equivalent of a metrical line, it would surely be a pāda. (I ask because I take this for granted, but the meaning of "line" is precisely what is at issue here.)

With regard to the first major problem that we face in our world, and which other traditions encounter rarely if at all, namely the lack of coincidence between metrical and word boundaries, I suppose we shouldn't worry about it. After all, we've basically thrown word boundaries to the wind with sandhi, and if we want to do serious processing that involves word boundaries, it's probably better to create an alternative encoding without metrical structure (as Peter suggests), or not to use TEI at all. In any case, all of the solutions offered here are equally good (or bad) from this point of view.

With regard to the second such problem, the lack of coincidence between typographic and metrical lines, I am really not sure. I do not see it as practical to encode anuṣṭubh and āryā verses with four obligatory <l> or <seg> elements. But otherwise there will be a major difference between shorter and longer verse-forms in terms of their encoding, which we don't want. I realize that <caesura/> has problems, but I think an optional milestone element is really what we want in this context. Maybe just <milestone type="pada-boundary"/> would work. With regard to words breaking across pāda boundaries, we could either process them implicitly, based on the adjoining text, as in Charles' example, or explicitly, with @break (as we do with <lb/> and <pb/> elements).

Andrew

2018-06-08 17:41 GMT+02:00 Peter Scharf <scharf@sanskritlibrary.org>:
If one tags words within a verse one will have to dissolve sandhi, which destroys the meter and makes the text no longer metrical, i.e. no longer a verse. For this reason, I do word tagging in a separate file from the file that has the sandhi of the verse undisturbed, and coordinate it with the verse file by identical xml:id values.
Yours, Peter
****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org http://sanskritlibrary.org ******************************
On 7 Jun. 2018, at 11:38 PM, Arlo Griffiths <arlo.griffiths@efeo.net> wrote:
Dear all,
I am in agreement with Andrew and Patrick. The system proposed by Andrew is the one we have implemented on EIAD, where most texts have been tagged at the <w> level.
See e.g. the first stanza from EIAD 186:
<lg met="mālinī" n="I">
  <l n="a"> <lb n="1"/> <g type="dextrorotatory-spiral"/> <w>jayati</w> <w>munir</w> <w part="I">udagrakhyātacandrāṁśujāla</w> </l>
  <l n="b"> <w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w> <w>ajeyasya</w> <w>yasya</w> </l>
  <l n="c"> <w>jagad</w> <w>idam</w> <w>abhiṣiktan</w> <w>dakṣiṇāṁmbhobhir</w> <w part="I">u<lb n="3" break="no"/>ccaiḥ</w> </l>
  <l n="d"> <w part="F">kṣubhitasalilanāthasparddhibhir</w> <w>mmārasainyaiḥ</w> <pc>||</pc> </l>
</lg>
See <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd>. In our system, the layout is indeed based upon a test for the value of @met.
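To show how words split across pāda boundaries with @part might be put back together for indexing, here is a small Python sketch. It is not EIAD's actual processing; the `words` function is a hypothetical helper, and it assumes un-namespaced elements and that every initial fragment (part="I") is eventually followed by a final one (part="F"). The sample is pādas a and b of the stanza above, without the <g> and the first <lb/>.

```python
import xml.etree.ElementTree as ET

def words(lg_xml: str) -> list[str]:
    """Collect the words of a stanza, rejoining fragments marked with @part."""
    lg = ET.fromstring(lg_xml)
    result, pending = [], ""
    for w in lg.iter("w"):
        # itertext() also picks up text after a nested <lb break="no"/>.
        text = "".join(w.itertext()).strip()
        part = w.get("part")
        if part == "I":        # initial fragment: hold it back
            pending = text
        elif part == "M":      # medial fragment: keep accumulating
            pending += text
        elif part == "F":      # final fragment: emit the whole word
            result.append(pending + text)
            pending = ""
        else:                  # ordinary, unsplit word
            result.append(text)
    return result

sample = """<lg met="mālinī" n="I">
<l n="a"><w>jayati</w> <w>munir</w> <w part="I">udagrakhyātacandrāṁśujāla</w></l>
<l n="b"><w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w> <w>ajeyasya</w> <w>yasya</w></l>
</lg>"""

for word in words(sample):
    print(word)
```

With this approach the <l> elements can keep marking pādas while an index is built from the rejoined <w> contents.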
Best wishes,
Arlo
On 7 June 2018 at 23:04, Patrick McAllister <patrick.mcallister@oeaw.ac.at> wrote:
Dear list members,
I've found the alternative solution proposed by Andrew to serve most of my purposes (one important one being ease of use):

<lg n="3" met="vasantatilaka">
  <l>vvantən mañumbana puḍak ginuritnya pārtha</l>
  <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
  <l>rakryan vədinta tan akun ləvu paṅhavista</l>
  <l>heman kitābapa niragraha māsku liṅnya</l>
</lg>
All else being equal (problems with word boundaries across markup etc.), I find it easier to think of all pādas as being on the same level and of the same type: that’s why I prefer tei:l elements for them. Introducing tei:seg elements or additional (nested) tei:lg seems to complicate things, and I’m not sure which problems they are supposed to solve. If I know that a verse is supposed to consist of, say, 4 pādas (by looking at the @met attribute), I’d expect to just have those 4 items on the same level, and immediately under the tei:lg.
All typesetting problems are easily solved by looking at the @met attribute, and perhaps also considering the number of tei:l elements immediately under a tei:lg. In SARIT, as Andrew has pointed out, the encoding is usually 2 pādas per tei:l (but not always if I remember correctly).
It all depends, of course, on the purpose of the text being encoded (I haven’t done any automated metrical analysis, for example, but imagine I’d strip out most of the markup first in any case and wouldn’t want more sophisticated markup).
Best wishes,
-- Patrick McAllister
Email: patrick.mcallister@oeaw.ac.at Phone: + 43 1 51581 6423
Institute for the Cultural and Intellectual History of Asia (IKGA) Austrian Academy of Sciences Hollandstraße 11-13, 2nd floor 1020 Vienna, Austria
http://www.ikga.oeaw.ac.at/

Thanks Dan. I certainly agree that I have always understood that the TEI element <l> maps precisely onto the sanskritist’s pāda, and hence that I was surprised by some of Peter’s reasoning. It would seem preferable to me to stick to this principle and adjust the rest of our encoding principles accordingly. For an example of pāda boundary inside a word (i.e. a compound), see three cases in stanza I and VI here: <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd>. This is quite a common phenomenon, I believe, so you may have been asking about something else... To change the topic slightly: I was a bit shocked by Andrew’s words « we've basically thrown word boundaries to the wind with sandhi, and if we want to do serious processing that involves word boundaries, it's probably better to create an alternative encoding without metrical structure (as Peter suggests), or not to use TEI at all. » I had thought that one of the main reasons for the creation of this SIG was precisely to try to come up with a smart collective decision on how to encode texts down to the level of <w> (by which, in the Sanskrit context, I mean element of compound), to allow indexing on this basis while still being able to publish text displayed with proper sandhi. In my experience in the TEI world so far, making texts searchable once they have been encoded is a major headache, and I have never yet been offerred a fully satisfactory solution — but none of the solutions that have ever been offered to me by people more competent than myself have ever involved anything else than tagging at <w> level or using space as word delimiter (which works well for many languages but is ineffective for Sanskrit). Nevertheless, I am not yet ready to give up on this ambition. Best, Arlo Le 16 juin 2018 à 15:28, Balogh Dániel <danbalogh@gmail.com<mailto:danbalogh@gmail.com>> a écrit : To me it is intuitively obvious (i.e. 
not necessarily correct, nor necessarily obvious to others) that what matters is the metrical line which is equivalent to a pāda. Typographic lines are a typesetting matter and can be left to styling. Varṇavṛtta stanzas are conceived as catuṣpadī, after all. They divide into four quarters, admittedly with pairs of quarters bound more closely together than the first pair to the second. But they are still pairs of quarters. What I'm trying to say is that the ardhaśloka is a less straightforward or natural unit than the pāda. As for word boundaries or sandhi transgressing pāda boundaries, can anyone give actual examples of that happening? In, like, more than once in ten thousand cases? I believe I have seen a handful of instances in my life of pādas (ab or cd) joined in vowel sandhi in short and common metres like anuṣṭubh or upajāti (and maybe vasantatilaka). As far as I recall, no more than that, not in any complex metre ever, and no pāda boundary inside a morpheme, ever. Dan On 2018. 06. 16. 9:20, Andrew Ollett wrote: Dear Peter (and colleagues), I notice that, between the people who have weighed in on this matter, we have exhausted every possible alternative for this particular markup question that the TEI makes available. Just one question: by "lines" you surely mean typographic lines, right? Otherwise I am not sure that, if we thought of a Sanskrit equivalent for a metrical line, it would surely be a pāda? (I ask because I take this for granted but the meaning of "line" is precisely what is at issue here.) With regard to the first major problem that we face in our world, and which other traditions encounter rarely if at all, namely the lack of coincidence between metrical and word boundaries, I suppose we shouldn't worry about it. 
After all, we've basically thrown word boundaries to the wind with sandhi, and if we want to do serious processing that involves word boundaries, it's probably better to create an alternative encoding without metrical structure (as Peter suggests), or not to use TEI at all. In any case, all of the solutions offered here are equally good (or bad) from this point of view. With regard to the second such problem, the lack of coincidence between typographic and metrical lines, I am really not sure. I do not see it as practical to encode anuṣṭubh and āryā verses with four obligatory <l> or <seg> elements. But otherwise there will be a major difference between shorter and longer verse-forms in terms of their encoding, which we don't want. I realize that <caesura/> has problems, but I think an optional milestone element is really what we want in this context. Maybe just <milestone type="pada-boundary"/> would work. With regard to words breaking across pāda boundaries, we could either process it implicitly, based on the adjoining text, as in Charles' example, or explicitly, with @break (as we do with <lb/> and <pb/> elements). Andrew 2018-06-08 17:41 GMT+02:00 Peter Scharf <scharf@sanskritlibrary.org<mailto:scharf@sanskritlibrary.org>>: If one tags words within a verse one will have to disolve sandh which destroys the meter and makes the text no longer metrical, i.e. no longer a verse. For this reason, I do word tagging in a separate file from the file that has sandhi of the verse undisturbed and coordinate it with the verse file by identical xml:id elements. Yours, Peter ****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org<mailto:scharf@sanskritlibrary.org> http://sanskritlibrary.org<http://sanskritlibrary.org/> ****************************** On 7 Jun. 2018, at 11:38 PM, Arlo Griffiths <arlo.griffiths@efeo.net<mailto:arlo.griffiths@efeo.net>> wrote: Dear all, I am in agreement with Andrew and Patrick. 
The system propopse by Andrew is the one we have implemented on EIAD, where most texts have been tagged at the <w> level. See e.g. the first stanza from EIAD 186: <lg met="mālinī" n="I"> <l n="a"> <lb n="1"/> <g type="dextrorotatory-spiral"/> <w>jayati</w> <w>munir</w> <w part="I">udagrakhyātacandrāṁśujāla</w> </l> <l n="b"> <w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w> <w>ajeyasya</w> <w>yasya</w> </l> <l n="c"> <w>jagad</w> <w>idam</w> <w>abhiṣiktan</w> <w>dakṣiṇāṁmbhobhir</w> <w part="I">u<lb n="3" break="no"/>ccaiḥ</w> </l> <l n="d"> <w part="F">kṣubhitasalilanāthasparddhibhir</w> <w>mmārasainyaiḥ</w> <pc>||</pc> </l> </lg> See <http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd>. In our system, the layout is indeed based upon a test for the value of @met. Best wishes, Arlo Le 7 juin 2018 à 23:04, Patrick McAllister <patrick.mcallister@oeaw.ac.at<mailto:patrick.mcallister@oeaw.ac.at>> a écrit : Dear list members, I've found the alternative solution proposed by Andrew to be serve most of my purposes (one important one being ease of use): <lg n="3" met="vasantatilaka"> <l>vvantən mañumbana puḍak ginuritnya pārtha</l> <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l> <l>rakryan vədinta tan akun ləvu paṅhavista</l> <l>heman kitābapa niragraha māsku liṅnya</l> </lg> All else being equal (problems with word boundaries across markup etc.), I find it easier to think of all pādas as being on the same level and of the same type: that’s why I prefer tei:l elements for them. Introducing tei:seg elements or additional (nested) tei:lg seems to complicate things, and I’m not sure which problems they are supposed to solve. If I know that a verse is supposed to consist of, say, 4 pādas (by looking at the @met attribute), I’d expect to just have those 4 items on the same level, and immediately under the tei:lg. 
All typesetting problems are easily solved by looking at the @met attribute, and perhaps also considering the number of tei:l elements immediately under a tei:lg. In SARIT, as Andrew has pointed out, the encoding is usually 2 pādas per tei:l (but not always if I remember correctly). It all depends, of course, on the purpose of the text being encoded (I haven’t done any automated metrical analysis, for example, but imagine I’d strip out most of the markup first in any case and wouldn’t want more sophisticated markup). Best wishes, On Thu, Jun 07 2018, Peter Scharf wrote: I use the seg-element to mark pAdas <lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93"> <l> <seg type='foot'>anantarodIritalakzmaBAjO</seg> <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg> </l> <l> <seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg> <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg> </l> </lg> I use the space element if there is a word break between pAdas in anuzwuB meter (where there often is not), though I see no harm in using the caesura element there. There are however often caesurae even where there is no pAda break, so if consistency is the issue, the seg element seems to me to be the best solution. Yours, Peter ****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org<mailto:scharf@sanskritlibrary.org> http://sanskritlibrary.org<http://sanskritlibrary.org/> ****************************** On 7 Jun. 
2018, at 9:26 AM, Balogh Dániel <danbalogh@gmail.com<mailto:danbalogh@gmail.com>> wrote: Dear Andrew and everyone else, the solution I've come up with for Siddham is to mark up pādas as <l>, and to nest <lg> elements for half-verses, thus for all varṇavṛtta metres: <lg n="1" met="pṛthvī"> <lg type="halfverse" n="ab"> <l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l> <l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l> </lg> <lg type="halfverse" n="cd"> <l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l> <l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l> </lg> </lg> I feel this method is fully compliant with TEI and it paves the way for typography, giving you the choice to print or not to print a line break at the end of odd pādas, and to add automated | and || punctuation if desired. The shortcomings that I am aware of are 1) this works best with transliteration (e.g. pāda a would have to end at yai, and ru would have to move to pāda b in an alphasyllabary); and 2) it may interfere with lexical tagging, e.g. if you wanted to wrap all words including compounds in <w>, then the compound spanning b to c is problematic. As I see things, problem 1 is universal, not restricted to this scheme; those who encode texts in Devanagari or another Indic script just have to do some things differently, and automated conversion between scripts remainst tricky with markup. As for problem 2, it can still be handled in TEI if necessary; I am not tagging words in my corpus so I have not looked into linking elements together. In a language where lexical units stretch across pāda boundaries a lot of the time, it may be inconvenient to keep doing this, but it still looks like best practice to me. 
Now in metres of the āryā family I mark up only two <l> elements, and each of those is alwo wrapped in <lg>, like this: <lg n="42" met="āryā"> <lg type="halfverse" n="ab"> <l>śaśineva nabho vimalaṃ kaustubha-maṇineva śārṇgiṇo vakṣaḥ|</l> </lg> <lg type="halfverse" n="cd"> <l>bhavana-vareṇa tathedaṃ puram akhilam alaṃkṛtam udāraṃ||</l> </lg> </lg> This leaves the caesura out, which is my choice. I have likewise chosen not to tag the caesura in varṇavṛtta metres, and I feel that the caesura in āryā is more akin to the caesura within a pāda of a catuṣpadī than to the yati at the end of an odd pāda of a catuṣpadī. This is subjective and one could argue differently. If I did want to mark up caesurae then I would use the <caesura> element in both āryā and within pādas of varṇavṛttas for that purpose. This seems to be much easier to work with than using <seg> elements for every colon. All best, Dan On 2018. 06. 07. 15:17, Andrew Ollett wrote: Dear colleagues, I am in the midst of a workshop in which we are attempting to encode texts in Old Javanese in TEI format, and the issue of encoding pādas has (once again) reared its head. We discussed this issue at length in the context of the SARIT project, and we came to the conclusion that <l> should be used for a pair of pādas, and the boundary between even and odd pādas should be represented by the <caesura/> element. 
Hence the following vasantatilaka verse: <lg n="3" met="vasantatilaka"> <l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/> ndān susvasusvani kinolnya hanan liniṅliṅ</l> <l>rakryan vədinta tan akun ləvu paṅhavista<caesura/> heman kitābapa niragraha māsku liṅnya</l> </lg> This is somewhat contrary to what many people would expect, namely, that each pāda should correspond to a single <l> element, as follows: <lg n="3" met="vasantatilaka"> <l>vvantən mañumbana puḍak ginuritnya pārtha</l> <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l> <l>rakryan vədinta tan akun ləvu paṅhavista</l> <l>heman kitābapa niragraha māsku liṅnya</l> </lg> My arguments for the use of <caesura/> involved (a) the practical necessity of encoding texts from printed editions, where the pādas are not separated typographically in all cases, especially in shorter verse forms, and thus (b) the requirement that <l> should mean the same thing for an anuṣṭubh verse as for (e.g.) śakvarī verse, i.e., it should not refer to a pādayuga in the first case, and a pāda in the second case; and (c) the frequent occurrence, in Sanskrit, of words that span the boundaries between odd and even pādas, and the undesirability of having structural elements like <l> overlap with the grammatical structure of the text (at least at the level of the word). The use of <caesura/> would be optional: it's not required (and often isn't marked typographically in shorter verse forms), but if it is present, the stylesheets will insert a space. But I can now think of counterarguments for all of these points, and in some ways, it might be easier if <l> always mean a "pāda." (<caesura/> also doesn't have a @type attribute in standard TEI, so it might be more difficult than I expected to differentiate this "pāda-boundary caesura" from the pāda-internal yati.) So I am asking everyone whether there are compelling reasons you've discovered for preferring one encoding solution over another. 
(Or if you have other suggestions altogether, including the use of <seg> or other such elements.) I know that there are some features that vary across Indic languages, such as the coincidence of metrical and grammatical (esp. lexical) boundaries: these structures always coincide in Old Javanese, and almost never in Kannada, so I am hoping to avoid the problem of overlapping hierarchies completely.

Andrew

_______________________________________________ indic-texts mailing list indic-texts@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/indic-texts

-- Patrick McAllister Email: patrick.mcallister@oeaw.ac.at Phone: + 43 1 51581 6423 Institute for the Cultural and Intellectual History of Asia (IKGA) Austrian Academy of Sciences Hollandstraße 11-13, 2nd floor 1020 Vienna, Austria http://www.ikga.oeaw.ac.at/

If one tags words within a verse, one has to dissolve sandhi, which destroys the meter and makes the text no longer metrical, i.e. no longer a verse. For this reason, I do word tagging in a separate file from the file that leaves the sandhi of the verse undisturbed, and coordinate it with the verse file by identical xml:id attributes.

Yours, Peter

****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org http://sanskritlibrary.org ******************************

The idea of markup is to capture information explicitly. To capture explicitly the information that two pAdas make a line and that two (or sometimes one or three or more) make a verse, one needs different elements for pAdas and for lines. That is why I use the seg element, which conveniently does have a type attribute to specify what it is. caesura or space elements also are useful for marking the location of breaks, but don't explicitly mark pAdas. The lg/l/seg markup is as easy to process as the other alternatives proposed. We send lg elements with l and seg children to our meter analyzer without a hitch using regular expressions to extract the text. Rendering is also not a problem. Putting four lines in l elements leaves the problem that Anustubh meters usually put two pAdas per line with sandhi while longer meters may not. An l-element was conceived for a line, not for a pAda. Yours, Peter ****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org http://sanskritlibrary.org ******************************
On 7 Jun. 2018, at 4:04 PM, Patrick McAllister <patrick.mcallister@oeaw.ac.at> wrote:
Dear list members,
I've found that the alternative solution proposed by Andrew serves most of my purposes (one important one being ease of use):
<lg n="3" met="vasantatilaka">
  <l>vvantən mañumbana puḍak ginuritnya pārtha</l>
  <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
  <l>rakryan vədinta tan akun ləvu paṅhavista</l>
  <l>heman kitābapa niragraha māsku liṅnya</l>
</lg>
All else being equal (problems with word boundaries across markup etc.), I find it easier to think of all pādas as being on the same level and of the same type: that’s why I prefer tei:l elements for them. Introducing tei:seg elements or additional (nested) tei:lg seems to complicate things, and I’m not sure which problems they are supposed to solve. If I know that a verse is supposed to consist of, say, 4 pādas (by looking at the @met attribute), I’d expect to just have those 4 items on the same level, and immediately under the tei:lg.
All typesetting problems are easily solved by looking at the @met attribute, and perhaps also considering the number of tei:l elements immediately under a tei:lg. In SARIT, as Andrew has pointed out, the encoding is usually 2 pādas per tei:l (but not always if I remember correctly).
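As a rough illustration of the point about typesetting, here is a minimal Python sketch that decides how many pādas to print per line from the @met attribute and the number of l children. The set TWO_PER_LINE and the function name layout are hypothetical, and the sketch assumes un-namespaced elements as in the examples quoted in this thread; a real TEI file would carry the TEI namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of metres short enough to print two pādas per line.
TWO_PER_LINE = {"anuṣṭubh"}

VERSE = (
    '<lg n="3" met="vasantatilaka">'
    '<l>vvantən mañumbana puḍak ginuritnya pārtha</l>'
    '<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>'
    '<l>rakryan vədinta tan akun ləvu paṅhavista</l>'
    '<l>heman kitābapa niragraha māsku liṅnya</l>'
    '</lg>'
)

def layout(lg_xml, two_per_line=TWO_PER_LINE):
    """Return the printed lines for one <lg> with one <l> per pāda."""
    lg = ET.fromstring(lg_xml)
    padas = ["".join(l.itertext()).strip() for l in lg.findall("l")]
    # Pair pādas only for metres listed as short, and only if the count is even.
    pair_up = lg.get("met") in two_per_line and len(padas) % 2 == 0
    step = 2 if pair_up else 1
    return [" ".join(padas[i:i + step]) for i in range(0, len(padas), step)]

lines = layout(VERSE)
```

For the vasantatilaka verse this yields one printed line per pāda; a verse whose @met is in TWO_PER_LINE would come out as two pādas per line.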
It all depends, of course, on the purpose of the text being encoded (I haven’t done any automated metrical analysis, for example, but imagine I’d strip out most of the markup first in any case and wouldn’t want more sophisticated markup).
Best wishes,
On Thu, Jun 07 2018, Peter Scharf wrote:
I use the seg-element to mark pAdas
<lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93">
  <l>
    <seg type='foot'>anantarodIritalakzmaBAjO</seg>
    <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg>
  </l>
  <l>
    <seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg>
    <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg>
  </l>
</lg>
I use the space element if there is a word break between pAdas in anuzwuB meter (where there often is not), though I see no harm in using the caesura element there. There are however often caesurae even where there is no pAda break, so if consistency is the issue, the seg element seems to me to be the best solution.
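To make the lg/l/seg structure concrete, the following Python sketch pulls the pādas out of the upajāti verse above and pairs each one with a pattern from @met. It uses the standard-library XML parser rather than regular expressions, and it assumes (this is not stated in the thread) that the slash-separated values in @met line up one-to-one with the seg elements. The function name padas_with_patterns is hypothetical.

```python
import xml.etree.ElementTree as ET

VERSE = """<lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93">
<l> <seg type='foot'>anantarodIritalakzmaBAjO</seg> <seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg> </l>
<l> <seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg> <seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg> </l>
</lg>"""

def padas_with_patterns(lg):
    """Pair each <seg type='foot'> with the matching pattern from @met."""
    segs = lg.findall("l/seg[@type='foot']")
    patterns = lg.get("met", "").split("/")
    # Assumption: one slash-separated pattern per pāda, in document order.
    if len(patterns) != len(segs):
        raise ValueError(f"{len(patterns)} patterns for {len(segs)} pādas")
    return [(p, "".join(s.itertext()).strip()) for p, s in zip(patterns, segs)]

pairs = padas_with_patterns(ET.fromstring(VERSE))
```

Each entry of pairs is a (pattern, pāda text) tuple in verse order, which is the kind of input a metrical checker could consume.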
Yours, Peter
****************************** Peter M. Scharf, President The Sanskrit Library scharf@sanskritlibrary.org http://sanskritlibrary.org ******************************
On 7 Jun. 2018, at 9:26 AM, Balogh Dániel <danbalogh@gmail.com> wrote:
Dear Andrew and everyone else,
the solution I've come up with for Siddham is to mark up pādas as <l>, and to nest <lg> elements for half-verses, thus for all varṇavṛtta metres:

<lg n="1" met="pṛthvī">
  <lg type="halfverse" n="ab">
    <l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
    <l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
  </lg>
  <lg type="halfverse" n="cd">
    <l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
    <l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
  </lg>
</lg>
I feel this method is fully compliant with TEI and it paves the way for typography, giving you the choice to print or not to print a line break at the end of odd pādas, and to add automated | and || punctuation if desired. The shortcomings that I am aware of are 1) this works best with transliteration (e.g. pāda a would have to end at yai, and ru would have to move to pāda b in an alphasyllabary); and 2) it may interfere with lexical tagging, e.g. if you wanted to wrap all words including compounds in <w>, then the compound spanning b to c is problematic. As I see things, problem 1 is universal, not restricted to this scheme; those who encode texts in Devanagari or another Indic script just have to do some things differently, and automated conversion between scripts remains tricky with markup. As for problem 2, it can still be handled in TEI if necessary; I am not tagging words in my corpus so I have not looked into linking elements together. In a language where lexical units stretch across pāda boundaries a lot of the time, it may be inconvenient to keep doing this, but it still looks like best practice to me.
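The typographic choices described here can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names render and join_padas are hypothetical, elements are un-namespaced as in the example above, and a pāda ending in a hyphen is taken to continue a compound, so it is joined to the next pāda without a space.

```python
import xml.etree.ElementTree as ET

VERSE = """<lg n="1" met="pṛthvī">
<lg type="halfverse" n="ab">
<l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
<l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
</lg>
<lg type="halfverse" n="cd">
<l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
<l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
</lg>
</lg>"""

def join_padas(padas):
    """Join pādas on one line; no space after a compound-internal hyphen."""
    text = ""
    for pada in padas:
        text += pada if (not text or text.endswith("-")) else " " + pada
    return text

def render(lg, break_odd_padas=True):
    """Return printed lines, adding | after the first half and || at the end."""
    halves = lg.findall("lg[@type='halfverse']")
    lines = []
    for i, half in enumerate(halves):
        padas = [(l.text or "").strip() for l in half.findall("l")]
        if not padas:
            continue
        mark = "||" if i == len(halves) - 1 else "|"
        padas[-1] += " " + mark
        if break_odd_padas:
            lines.extend(padas)          # one printed line per pāda
        else:
            lines.append(join_padas(padas))  # one printed line per half-verse
    return lines

verse = ET.fromstring(VERSE)
four_lines = render(verse, break_odd_padas=True)
two_lines = render(verse, break_odd_padas=False)
```

With break_odd_padas=False the compound spanning pādas c and d is printed as jjaṭāntar-guhā-nirodha-..., which is one way of handling the hyphenated pāda end; other projects may prefer a different convention.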
Now in metres of the āryā family I mark up only two <l> elements, and each of those is also wrapped in <lg>, like this:
<lg n="42" met="āryā">
  <lg type="halfverse" n="ab">
    <l>śaśineva nabho vimalaṃ kaustubha-maṇineva śārṇgiṇo vakṣaḥ|</l>
  </lg>
  <lg type="halfverse" n="cd">
    <l>bhavana-vareṇa tathedaṃ puram akhilam alaṃkṛtam udāraṃ||</l>
  </lg>
</lg>
This leaves the caesura out, which is my choice. I have likewise chosen not to tag the caesura in varṇavṛtta metres, and I feel that the caesura in āryā is more akin to the caesura within a pāda of a catuṣpadī than to the yati at the end of an odd pāda of a catuṣpadī. This is subjective and one could argue differently. If I did want to mark up caesurae then I would use the <caesura> element in both āryā and within pādas of varṇavṛttas for that purpose. This seems to be much easier to work with than using <seg> elements for every colon.
All best, Dan
On 2018. 06. 07. 15:17, Andrew Ollett wrote:
Dear colleagues,
I am in the midst of a workshop in which we are attempting to encode texts in Old Javanese in TEI format, and the issue of encoding pādas has (once again) reared its head. We discussed this issue at length in the context of the SARIT project, and we came to the conclusion that <l> should be used for a pair of pādas, and the boundary between even and odd pādas should be represented by the <caesura/> element. Hence the following vasantatilaka verse:
<lg n="3" met="vasantatilaka">
  <l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/> ndān susvasusvani kinolnya hanan liniṅliṅ</l>
  <l>rakryan vədinta tan akun ləvu paṅhavista<caesura/> heman kitābapa niragraha māsku liṅnya</l>
</lg>
This is somewhat contrary to what many people would expect, namely, that each pāda should correspond to a single <l> element, as follows:
<lg n="3" met="vasantatilaka">
  <l>vvantən mañumbana puḍak ginuritnya pārtha</l>
  <l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
  <l>rakryan vədinta tan akun ləvu paṅhavista</l>
  <l>heman kitābapa niragraha māsku liṅnya</l>
</lg>
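The two encodings differ only in where the pāda boundary is recorded, so the first can be converted into the second mechanically. The Python sketch below shows one way this might be done; the function name split_at_caesura is hypothetical, elements are assumed to be un-namespaced as in the examples, and every <caesura/> is assumed to mark a pāda boundary rather than a pāda-internal yati.

```python
import xml.etree.ElementTree as ET

SARIT_STYLE = (
    '<lg n="3" met="vasantatilaka">'
    '<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/> '
    'ndān susvasusvani kinolnya hanan liniṅliṅ</l>'
    '<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/> '
    'heman kitābapa niragraha māsku liṅnya</l>'
    '</lg>'
)

def split_at_caesura(lg):
    """Return a new <lg> in which every pāda has its own <l>.

    Assumes each <l> holds plain text plus <caesura/> children only.
    """
    out = ET.Element("lg", dict(lg.attrib))
    for line in lg.findall("l"):
        parts = [line.text or ""]
        for child in line:
            if child.tag != "caesura":
                raise ValueError(f"unexpected element <{child.tag}> inside <l>")
            # Text following an empty element is stored in its tail.
            parts.append(child.tail or "")
        for part in parts:
            pada = ET.SubElement(out, "l")
            pada.text = part.strip()
    return out

flat = split_at_caesura(ET.fromstring(SARIT_STYLE))
padas = [l.text for l in flat.findall("l")]
```

The reverse conversion is not equally mechanical, since it needs to know which pādas belong together on one line.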
My arguments for the use of <caesura/> involved (a) the practical necessity of encoding texts from printed editions, where the pādas are not separated typographically in all cases, especially in shorter verse forms, and thus (b) the requirement that <l> should mean the same thing for an anuṣṭubh verse as for (e.g.) śakvarī verse, i.e., it should not refer to a pādayuga in the first case, and a pāda in the second case; and (c) the frequent occurrence, in Sanskrit, of words that span the boundaries between odd and even pādas, and the undesirability of having structural elements like <l> overlap with the grammatical structure of the text (at least at the level of the word). The use of <caesura/> would be optional: it's not required (and often isn't marked typographically in shorter verse forms), but if it is present, the stylesheets will insert a space.
But I can now think of counterarguments for all of these points, and in some ways, it might be easier if <l> always meant a "pāda." (<caesura/> also doesn't have a @type attribute in standard TEI, so it might be more difficult than I expected to differentiate this "pāda-boundary caesura" from the pāda-internal yati.) So I am asking everyone whether there are compelling reasons you've discovered for preferring one encoding solution over another. (Or if you have other suggestions altogether, including the use of <seg> or other such elements.) I know that there are some features that vary across Indic languages, such as the coincidence of metrical and grammatical (esp. lexical) boundaries: these structures always coincide in Old Javanese, and almost never in Kannada, so I am hoping to avoid the problem of overlapping hierarchies completely.
Andrew
-- Patrick McAllister
Email: patrick.mcallister@oeaw.ac.at Phone: + 43 1 51581 6423
Institute for the Cultural and Intellectual History of Asia (IKGA) Austrian Academy of Sciences Hollandstraße 11-13, 2nd floor 1020 Vienna, Austria

participants (5)
- Andrew Ollett
- Arlo Griffiths
- Balogh Dániel
- Patrick McAllister
- Peter Scharf