I think the simplest and most appropriate solution is to replace macro.xtext by <textNode>, for three reasons:

1. This corresponds more closely to the published ISO scheme, which knows nothing of <g> elements.
2. There is no reason to suppose that a text string describing a feature value would ever need to use a <g> element: these are not transcriptions from source texts but created fragments of documentation.
3. Even if there were such a reason, the way to do it would be to wrap the proposed string in the <string> element as content of the <f>.

The possibility of a text node within <f> was agreed to only as a simplification of this more general approach. We don't need to agonise over making it work for every case. Indeed, in my view, we'd do better to consider removing it again.

On 18/05/16 22:44, Syd Bauman wrote:
Whoa! I had not noticed yesterday that the Jenkins build is busted. At first glance, it looks like the problem is the updates I made to the content model of <f>. The error occurs when trying to validate this bit of P5/Test/testbasic.xml:

---------
| <!-- extra tests for feature values -->
| <fLib xmlns="http://www.tei-c.org/ns/1.0">
|   <f name="xxx">A feature may have untyped content</f>
|   <f name="yyy">
|     <string>or typed</string>
|   </f>
|   <f name="notgood">
|     <string>multiple types</string>
|     <symbol value="doubleplusungood"/>
|   </f>
|   <f name="alsonotgood">mixed content
|     <symbol value="doubleplusungood"/>
|   </f>
| </fLib>
---------

Looks to me like the 3rd and 4th <f> elements are *supposed* to fail. But I'm under the impression that tests in testbasic.xml, unlike tests in detest.xml, are supposed to always succeed. Any thoughts on this?
But now, matters are worse. The new content model for <f> is:

| <content>
|   <alternate minOccurs="1" maxOccurs="1">
|     <macroRef key="macro.xtext"/>
|     <classRef key="model.featureVal"/>
|   </alternate>
| </content>
In RELAX NG compact syntax this would be expressed as

| ( macro.xtext | model.featureVal )

which boils down to

| ( ( text | g )* | model.featureVal.complex | model.featureVal.single )

which then boils down to

| ( ( text | g )* | fs | vColl | vMerge | vNot | binary | default | numeric | string | symbol | vAlt | vLabel )
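The "boils down to" steps are a mechanical expansion of macro and class references. A minimal Python sketch of that expansion follows; the `DEFINITIONS` table, the `expand` and `content_model` helpers, and the exact class memberships are all hypothetical, chosen only so that the result reproduces the final expansion quoted above rather than read from the TEI source.

```python
from typing import Dict, List, Tuple

# Hypothetical definitions, hand-copied so that the result reproduces the
# expansion quoted above (not read from the TEI source). Each entry is
# (kind, members): "zeroOrMore" wraps its members in ( ... )*, while a
# "choice" class simply contributes its members to the enclosing alternation.
DEFINITIONS: Dict[str, Tuple[str, List[str]]] = {
    "macro.xtext": ("zeroOrMore", ["text", "g"]),
    "model.featureVal": ("choice", ["model.featureVal.complex",
                                    "model.featureVal.single", "vAlt", "vLabel"]),
    "model.featureVal.complex": ("choice", ["fs", "vColl", "vMerge", "vNot"]),
    "model.featureVal.single": ("choice", ["binary", "default", "numeric",
                                           "string", "symbol"]),
}


def expand(name: str) -> str:
    """Return the compact-syntax pattern that `name` boils down to."""
    if name not in DEFINITIONS:
        return name  # an element name or 'text': nothing left to expand
    kind, members = DEFINITIONS[name]
    body = " | ".join(expand(m) for m in members)
    return f"( {body} )*" if kind == "zeroOrMore" else body


def content_model(alternatives: List[str]) -> str:
    """Expand a top-level alternation such as ( macro.xtext | model.featureVal )."""
    return "( " + " | ".join(expand(a) for a in alternatives) + " )"


# prints: ( ( text | g )* | fs | vColl | vMerge | vNot | binary | default | numeric | string | symbol | vAlt | vLabel )
print(content_model(["macro.xtext", "model.featureVal"]))
```

Note that the `( text | g )*` group survives as a nested group inside the outer alternation; that nesting is harmless in RELAX NG.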
In RELAX NG this content model is fine. It represents exactly what we want, and works perfectly well. But in DTD-land, where the 'text' gets converted to #PCDATA, the parens around the "#PCDATA | g" cause an error. (DTDs require mixed content to be expressed as a single paren-group with "#PCDATA" first, nothing but OR bars between the names, and an asterisk after the closing paren unless #PCDATA stands alone. See production [51] in section 3.2.2 of the XML 1.0 specification, 2nd edition.)
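Production [51] is small enough to check mechanically. The following is a minimal Python sketch, assuming a simplified ASCII-only stand-in for the XML Name production (the real production admits many more characters); `is_valid_mixed` is a hypothetical helper, and the second candidate string is an abbreviated illustration of what a naive ODD->DTD translation of <f> would emit.

```python
import re

# S? from the XML spec: optional whitespace (space, tab, CR, LF).
S = r"[ \t\r\n]*"
# Simplified, ASCII-only stand-in for the XML Name production (an
# assumption: the real production admits many more characters).
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

# Production [51], Mixed:
#   '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
#   | '(' S? '#PCDATA' S? ')'
MIXED = re.compile(
    rf"\({S}#PCDATA(?:{S}\|{S}{NAME})*{S}\)\*"
    rf"|\({S}#PCDATA{S}\)"
)


def is_valid_mixed(model: str) -> bool:
    """True if `model` is a well-formed DTD mixed-content model."""
    return MIXED.fullmatch(model.strip()) is not None


for candidate in ["(#PCDATA | g)*",                  # the only legal shape with names
                  "((#PCDATA | g)* | fs | string)",  # nested #PCDATA group: rejected
                  "(#PCDATA | g)"]:                  # names but no trailing '*': rejected
    print(candidate, is_valid_mixed(candidate))
```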
I could have sworn that at one point Sebastian updated the ODD->DTD processing so that if #PCDATA appeared anywhere in the content, the content model was altered to meet DTDs' (ridiculous) constraint on content models that represent mixed content.
But apparently I'm fantasizing, or he reverted that change. Our ODD processor does not magically fix this either for PureODD or for RELAX NG content in <content>.
Is anybody up to making this change to our ODD->DTD processor? I'm not at all sure I am.
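For whoever takes this on, here is a minimal Python sketch of the rewrite described above: if text appears anywhere in a content model, discard the structure and emit a single (#PCDATA | a | b)* group listing every element mentioned. The tuple-based `Model` representation and the `leaves` and `flatten_mixed` helpers are hypothetical illustrations, not the ODD processor's actual data structures or code.

```python
from typing import Iterator, List, Optional, Tuple, Union

# Hypothetical in-memory content model (not the ODD processor's real data
# structure): a leaf is an element name or "text"; a branch is
# (operator, children), e.g. ("choice", [...]) or ("zeroOrMore", [...]).
Model = Union[str, Tuple[str, List["Model"]]]


def leaves(model: Model) -> Iterator[str]:
    """Yield every leaf of the model, depth first, ignoring the operators."""
    if isinstance(model, str):
        yield model
    else:
        _operator, children = model
        for child in children:
            yield from leaves(child)


def flatten_mixed(model: Model) -> Optional[str]:
    """If text occurs anywhere in `model`, return the DTD mixed-content
    model over every element it mentions; otherwise return None."""
    found = list(leaves(model))
    if "text" not in found:
        return None  # not mixed content: translate it the usual way
    # dict.fromkeys drops duplicates while keeping first-seen order.
    names = list(dict.fromkeys(n for n in found if n != "text"))
    if not names:
        return "(#PCDATA)"
    return "(#PCDATA | " + " | ".join(names) + ")*"


# The expanded content model of <f> quoted earlier in this thread.
F_MODEL: Model = ("choice", [("zeroOrMore", ["text", "g"]),
                             "fs", "vColl", "vMerge", "vNot", "binary", "default",
                             "numeric", "string", "symbol", "vAlt", "vLabel"])
# prints: (#PCDATA | g | fs | vColl | vMerge | vNot | binary | default | numeric | string | symbol | vAlt | vLabel)*
print(flatten_mixed(F_MODEL))
```

The price of this approach is that the DTD becomes looser than the RELAX NG schema: under the flattened model, both the "notgood" and "alsonotgood" <f> elements in testbasic.xml would validate, since DTD mixed content cannot forbid text alongside <symbol> or multiple typed values.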