Dear all,
I am in agreement with Andrew and Patrick. The system proposed by Andrew is the one we have implemented on EIAD, where most texts have been tagged at the <w> level.
See e.g. the first stanza from EIAD 186:
<lg met="mālinī" n="I">
<l n="a">
<lb n="1"/>
<g type="dextrorotatory-spiral"/>
<w>jayati</w>
<w>munir</w>
<w part="I">udagrakhyātacandrāṁśujāla</w>
</l>
<l n="b">
<w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w>
<w>ajeyasya</w>
<w>yasya</w>
</l>
<l n="c">
<w>jagad</w>
<w>idam</w>
<w>abhiṣiktan</w>
<w>dakṣiṇāṁmbhobhir</w>
<w part="I">u<lb n="3" break="no"/>ccaiḥ</w>
</l>
<l n="d">
<w part="F">kṣubhitasalilanāthasparddhibhir</w>
<w>mmārasainyaiḥ</w>
<pc>||</pc>
</l>
</lg>
See http://hisoma.huma-num.fr/exist/apps/EIAD/works/EIAD0186.xml?&odd=teipublisher.odd. In our system, the layout is indeed based upon a test for the value of @met.
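For anyone processing files tagged this way, here is a minimal sketch in Python of how words split across pāda boundaries could be rejoined from their @part values. It is only an illustration, not the EIAD code: it assumes un-namespaced elements (real TEI files carry the TEI namespace), it assumes the fragments occur in document order, and the function name lexical_words is invented for the example. The sample is a shortened copy of the stanza above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of pādas a and b of the stanza (no TEI namespace: an assumption).
SAMPLE = """<lg met="mālinī" n="I">
<l n="a"><lb n="1"/><w>jayati</w><w>munir</w><w part="I">udagrakhyātacandrāṁśujāla</w></l>
<l n="b"><w part="F">pracayarucirakī<lb n="2" break="no"/>rttiśrīr</w><w>ajeyasya</w><w>yasya</w></l>
</lg>"""

def lexical_words(lg):
    """Return the words of a verse, joining <w part="I"> ... <w part="F"> fragments."""
    words, pending = [], ""
    for w in lg.iter("w"):
        # itertext() drops the <lb/> tag itself but keeps the text on both sides of it.
        text = "".join(w.itertext())
        part = w.get("part")
        if part in ("I", "M"):      # initial or medial fragment: keep accumulating
            pending += text
        elif part == "F":           # final fragment: emit the joined word
            words.append(pending + text)
            pending = ""
        else:                       # unsplit word
            words.append(text)
    return words

words = lexical_words(ET.fromstring(SAMPLE))
for word in words:
    print(word)
```

The compound that runs from pāda a into pāda b comes out as a single word, while the <l> elements remain free to follow the metre.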
Best wishes,
Arlo
On 7 June 2018 at 23:04, Patrick McAllister <patrick.mcallister@oeaw.ac.at> wrote:
Dear list members,
I've found the alternative solution proposed by Andrew to serve most
of my purposes (one important one being ease of use):
<lg n="3" met="vasantatilaka">
<l>vvantən mañumbana puḍak ginuritnya pārtha</l>
<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista</l>
<l>heman kitābapa niragraha māsku liṅnya</l>
</lg>
All else being equal (problems with word boundaries across markup etc.),
I find it easier to think of all pādas as being on the same level and of
the same type: that’s why I prefer tei:l elements for them. Introducing
tei:seg elements or additional (nested) tei:lg seems to complicate
things, and I’m not sure which problems they are supposed to solve. If
I know that a verse is supposed to consist of, say, 4 pādas (by looking
at the @met attribute), I’d expect to just have those 4 items on the
same level, and immediately under the tei:lg.
All typesetting problems are easily solved by looking at the @met
attribute, and perhaps also considering the number of tei:l elements
immediately under a tei:lg. In SARIT, as Andrew has pointed out, the
encoding is usually 2 pādas per tei:l (but not always if I remember
correctly).
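To make the typesetting point concrete, here is a minimal sketch in Python of a layout decision that uses nothing but @met and the number of tei:l children. Everything specific in it is an assumption for illustration: the PAIRED_METRES set, the function name printed_lines, and the use of un-namespaced elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of short metres whose pādas are printed two to a line.
PAIRED_METRES = {"anuṣṭubh"}

VERSE = """<lg n="3" met="vasantatilaka">
<l>vvantən mañumbana puḍak ginuritnya pārtha</l>
<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista</l>
<l>heman kitābapa niragraha māsku liṅnya</l>
</lg>"""

def printed_lines(lg):
    """Lay out a verse using only @met and the count of direct <l> children."""
    # Normalise whitespace inside each pāda.
    padas = [" ".join("".join(l.itertext()).split()) for l in lg.findall("l")]
    if lg.get("met") in PAIRED_METRES and len(padas) == 4:
        # Short metre: print each pair of pādas on one line.
        return [padas[0] + " " + padas[1], padas[2] + " " + padas[3]]
    # Otherwise: one pāda per printed line.
    return padas

verse = ET.fromstring(VERSE)
for line in printed_lines(verse):
    print(line)
```

With one tei:l per pāda the markup stays flat, and the decision about pairing is left entirely to the stylesheet.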
It all depends, of course, on the purpose of the text being encoded (I
haven’t done any automated metrical analysis, for example, but imagine
I’d strip out most of the markup first in any case and wouldn’t want
more sophisticated markup).
Best wishes,
On Thu, Jun 07 2018, Peter Scharf wrote:
I use the seg-element to mark pAdas
<lg type="upendravajrA/indravajrA/indravajrA/upendravajrA" ana="upajAti" met="jtjgg/ttjgg/ttjgg/jtjgg" n="93">
<l>
<seg type='foot'>anantarodIritalakzmaBAjO</seg>
<seg type='foot'>pAdO yadIyAv-upajAtayas-tAH</seg>
</l>
<l>
<seg type='foot'>itTaM kilAnyAsv-api miSritAsu</seg>
<seg type='foot'>vadanti2 jAtizv-idam-eva nAma ..93..</seg>
</l>
</lg>
I use the space element if there is a word break between pAdas in anuzwuB meter (where there often is not), though I see no harm in using the caesura element there. There are however often caesurae even where there is no pAda break, so if consistency is the issue, the seg element seems to me to be the best solution.
Yours,
Peter
******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
http://sanskritlibrary.org
******************************
On 7 Jun. 2018, at 9:26 AM, Balogh Dániel wrote:
Dear Andrew and everyone else,
the solution I've come up with for Siddham is to mark up pādas as <l>, and to nest <lg> elements for half-verses, thus for all varṇavṛtta metres:
<lg n="1" met="pṛthvī">
<lg type="halfverse" n="ab">
<l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
<l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
</lg>
<lg type="halfverse" n="cd">
<l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
<l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
</lg>
</lg>
I feel this method is fully compliant with TEI and it paves the way for typography, giving you the choice to print or not to print a line break at the end of odd pādas, and to add automated | and || punctuation if desired.
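As a rough illustration of those typographic choices, here is a minimal sketch in Python. It is not the Siddham stylesheet: the function names render and join_padas and the break_odd_padas switch are invented for the example, the elements are assumed to be un-namespaced, and each half-verse is assumed to contain at least one <l>.

```python
import xml.etree.ElementTree as ET

VERSE = """<lg n="1" met="pṛthvī">
<lg type="halfverse" n="ab">
<l>pradāna-bhuja-vikkrama-praśama-śāstra-vākyodayair</l>
<l>uparyyupari-sañcayocchritam aneka-mārggaṃ yaśaḥ</l>
</lg>
<lg type="halfverse" n="cd">
<l>punāti bhuvana-trayaṃ paśupater jjaṭāntar-guhā-</l>
<l>nirodha-parimokṣa-śīghram iva pāṇḍu gāṅgaṃ payaḥ</l>
</lg>
</lg>"""

def join_padas(padas):
    """Join pādas with a space, except after a compound-continuation hyphen."""
    out = ""
    for p in padas:
        out += p if (not out or out.endswith("-")) else " " + p
    return out

def render(verse, break_odd_padas=True):
    """Return printed lines, adding | after each half-verse and || after the last."""
    lines = []
    halves = verse.findall("lg[@type='halfverse']")
    for i, half in enumerate(halves):
        padas = ["".join(l.itertext()).strip() for l in half.findall("l")]
        danda = " ||" if i == len(halves) - 1 else " |"
        if break_odd_padas:
            lines.extend(padas[:-1])          # odd pādas on their own line
            lines.append(padas[-1] + danda)   # even pāda closes the half-verse
        else:
            lines.append(join_padas(padas) + danda)
    return lines

verse = ET.fromstring(VERSE)
for line in render(verse):
    print(line)
```

Calling render(verse, break_odd_padas=False) gives two printed lines instead of four, with the same automated punctuation.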
The shortcomings that I am aware of are 1) this works best with transliteration (e.g. pāda a would have to end at yai, and ru would have to move to pāda b in an alphasyllabary); and 2) it may interfere with lexical tagging, e.g. if you wanted to wrap all words including compounds in <w>, then the compound spanning b to c is problematic.
As I see things, problem 1 is universal, not restricted to this scheme; those who encode texts in Devanagari or another Indic script just have to do some things differently, and automated conversion between scripts remains tricky with markup. As for problem 2, it can still be handled in TEI if necessary; I am not tagging words in my corpus so I have not looked into linking elements together. In a language where lexical units stretch across pāda boundaries a lot of the time, it may be inconvenient to keep doing this, but it still looks like best practice to me.
Now in metres of the āryā family I mark up only two <l> elements, and each of those is also wrapped in <lg>, like this:
<lg n="42" met="āryā">
<lg type="halfverse" n="ab">
<l>śaśineva nabho vimalaṃ kaustubha-maṇineva śārṇgiṇo vakṣaḥ|</l>
</lg>
<lg type="halfverse" n="cd">
<l>bhavana-vareṇa tathedaṃ puram akhilam alaṃkṛtam udāraṃ||</l>
</lg>
</lg>
This leaves the caesura out, which is my choice. I have likewise chosen not to tag the caesura in varṇavṛtta metres, and I feel that the caesura in āryā is more akin to the caesura within a pāda of a catuṣpadī than to the yati at the end of an odd pāda of a catuṣpadī. This is subjective and one could argue differently. If I did want to mark up caesurae then I would use the <caesura> element in both āryā and within pādas of varṇavṛttas for that purpose. This seems to be much easier to work with than using <seg> elements for every colon.
All best,
Dan
On 2018. 06. 07. 15:17, Andrew Ollett wrote:
Dear colleagues,
I am in the midst of a workshop in which we are attempting to encode texts in Old Javanese in TEI format, and the issue of encoding pādas has (once again) reared its head. We discussed this issue at length in the context of the SARIT project, and we came to the conclusion that <l> should be used for a pair of pādas, and the boundary between even and odd pādas should be represented by the <caesura/> element. Hence the following vasantatilaka verse:
<lg n="3" met="vasantatilaka">
<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/>
ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/>
heman kitābapa niragraha māsku liṅnya</l>
</lg>
This is somewhat contrary to what many people would expect, namely, that each pāda should correspond to a single <l> element, as follows:
<lg n="3" met="vasantatilaka">
<l>vvantən mañumbana puḍak ginuritnya pārtha</l>
<l>ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista</l>
<l>heman kitābapa niragraha māsku liṅnya</l>
</lg>
My arguments for the use of <caesura/> involved (a) the practical necessity of encoding texts from printed editions, where the pādas are not separated typographically in all cases, especially in shorter verse forms, and thus (b) the requirement that <l> should mean the same thing for an anuṣṭubh verse as for (e.g.) a śakvarī verse, i.e., it should not refer to a pādayuga in the first case, and a pāda in the second case; and (c) the frequent occurrence, in Sanskrit, of words that span the boundaries between odd and even pādas, and the undesirability of having structural elements like <l> overlap with the grammatical structure of the text (at least at the level of the word). The use of <caesura/> would be optional: it's not required (and often isn't marked typographically in shorter verse forms), but if it is present, the stylesheets will insert a space.
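To show that the pādas remain recoverable under the <caesura/> encoding, here is a minimal sketch in Python. It is not the SARIT stylesheet: the function name padas_of is invented for the example, the elements are assumed to be un-namespaced, and only a <caesura/> that is a direct child of <l> is treated as a pāda boundary.

```python
import xml.etree.ElementTree as ET

VERSE = """<lg n="3" met="vasantatilaka">
<l>vvantən mañumbana puḍak ginuritnya pārtha<caesura/>
ndān susvasusvani kinolnya hanan liniṅliṅ</l>
<l>rakryan vədinta tan akun ləvu paṅhavista<caesura/>
heman kitābapa niragraha māsku liṅnya</l>
</lg>"""

def padas_of(l):
    """Split one <l> into pādas at each direct-child <caesura/>."""
    padas = [l.text or ""]
    for child in l:
        if child.tag == "caesura":
            # Text after the caesura starts a new pāda.
            padas.append(child.tail or "")
        else:
            # Any other inline element stays inside the current pāda.
            padas[-1] += "".join(child.itertext()) + (child.tail or "")
    return [" ".join(p.split()) for p in padas]   # normalise whitespace

verse = ET.fromstring(VERSE)
all_padas = [p for l in verse.findall("l") for p in padas_of(l)]
for pada in all_padas:
    print(pada)

# What a stylesheet would do for display: a single space at the caesura.
display_lines = [" ".join(padas_of(l)) for l in verse.findall("l")]
```

Where <caesura/> is absent, padas_of simply returns the whole <l> as one item, which matches the element being optional.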
But I can now think of counterarguments for all of these points, and in some ways, it might be easier if <l> always meant a "pāda." (<caesura/> also doesn't have a @type attribute in standard TEI, so it might be more difficult than I expected to differentiate this "pāda-boundary caesura" from the pāda-internal yati.) So I am asking everyone whether there are compelling reasons you've discovered for preferring one encoding solution over another. (Or if you have other suggestions altogether, including the use of <seg> or other such elements.) I know that there are some features that vary across Indic languages, such as the coincidence of metrical and grammatical (esp. lexical) boundaries: these structures always coincide in Old Javanese, and almost never in Kannada, so I am hoping to avoid the problem of overlapping hierarchies completely.
Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts
--
Patrick McAllister
Email: patrick.mcallister@oeaw.ac.at
Phone: + 43 1 51581 6423
Institute for the Cultural and Intellectual History of Asia (IKGA)
Austrian Academy of Sciences
Hollandstraße 11-13, 2nd floor
1020 Vienna, Austria
http://www.ikga.oeaw.ac.at/