Re: [tei-council] macro.anyXML question

15 Oct 2015

Thanks for the explication, Syd. What worries me though is why we're
having this problem now. We've had a macro.anyXML in the Guidelines 
forever, or at least since P5 1.0, so why is this "conflicting ID-types" 
problem only surfacing now?

As to the absence of IDREF(S) -- the reason is simple enough: ages ago 
we decided to turn everything like that into a URI. During the build
process, XSLT checks that local URI references are satisfied (i.e.
that if you say somewhere target="#foo", there is somewhere something
with @xml:id="foo").
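
A minimal sketch of that kind of check, in Python rather than the XSLT the build actually uses. The function name and the choice of "target" as the only pointer attribute are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def unresolved_local_pointers(xml_text, pointer_attrs=("target",)):
    """Return sorted local pointers (e.g. '#foo') with no matching xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    missing = set()
    for el in root.iter():
        for name in pointer_attrs:
            # A pointer attribute may hold several whitespace-separated URIs.
            for token in (el.get(name) or "").split():
                # Only same-document references of the form "#foo" are checked.
                if token.startswith("#") and token[1:] not in ids:
                    missing.add(token)
    return sorted(missing)
```

Anything that is not a bare "#fragment" reference (a full URL, say) is ignored here, since it cannot be checked against the local document.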

I suspect that if you set checkid=false, we will no longer detect the 
presence of two things with @xml:id="foo", which seems to me a 
distinctly retrograde step. (I speak as one who has spent hours
tweaking xml:id values in examples in the Guidelines.)

On 15/10/15 03:49, Syd Bauman wrote:
...
For the record, I think the right thing to do is probably to use the
-i switch (aka checkid=false) on calls to `jing`, and use some other
mechanism to check ID/IDREF. (As mentioned in [1].)
I think Eric van der Vlist says this well: "Basically, what's
happening here is that DTD compatibility mode emulates even the
restrictions of a DTD." (Paraphrased from [2].)
Again, paraphrasing Eric van der Vlist:
    The requirement is that if an element <foo> is defined with a @bar
    attribute of type ID, all the other definitions of foo/@bar
    must also be of type ID. But hidden in the definition of
    macro.anyXML there can be a <foo> having an attribute @bar of type
    text.[3]
So the normal definition of foo/@bar as ID and the macro.anyXML
definition of foo/@bar as text (or IDREF or whatever) conflict.
If I understand correctly[4], there are 3 tests that we would in
theory need to perform to duplicate the ID/IDREF tests. They are:
  1) After performing normalize-space() on it,
     a. the value of each attribute of type ID is one NCName
     b. the value of each attribute of type IDREF is one token
     c. the value of each attribute of type IDREFS is one or more tokens
  2) No two attributes of type ID have the same value.
  3) For each token in an IDREF or IDREFS attribute, there is a
     token in an ID attribute with the same value.
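
The three tests can be sketched in Python over plain lists of attribute values. This is only an illustration under stated assumptions: the function name is hypothetical, the NCName pattern is an ASCII-only approximation of the real production, and str.split() is used as a rough stand-in for normalize-space().

```python
import re

# ASCII-only approximation of an NCName; the real production allows more characters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")

def check_id_idref(ids, idrefs, idrefs_lists):
    """Apply tests (1)-(3) to raw attribute values; return a list of problems."""
    problems = []
    # split() with no argument trims and collapses whitespace into tokens.
    id_tokens = [value.split() for value in ids]
    for raw, toks in zip(ids, id_tokens):
        if len(toks) != 1 or not NCNAME.match(toks[0]):
            problems.append(f"1a: ID {raw!r} is not a single NCName")
    for raw in idrefs:
        if len(raw.split()) != 1:
            problems.append(f"1b: IDREF {raw!r} is not a single token")
    for raw in idrefs_lists:
        if len(raw.split()) < 1:
            problems.append(f"1c: IDREFS {raw!r} has no tokens")
    seen = set()
    for toks in id_tokens:
        for tok in toks:
            if tok in seen:
                problems.append(f"2: duplicate ID {tok!r}")
            seen.add(tok)
    for raw in list(idrefs) + list(idrefs_lists):
        for tok in raw.split():
            if tok not in seen:
                problems.append(f"3: no ID matches {tok!r}")
    return problems
```

An empty result means all three tests passed for the values supplied.
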
Note that testing (1a) is done by RELAX NG whether ID/IDREF checking
is on or not. So we don't have to worry about that.
Note also that, because we only have one attribute name in the entire
TEI scheme that is of type ID (namely @xml:id), testing (1a) (if it
were needed) and (2) is pretty easy. See [5] for methods of testing
for (2) with ISO Schematron.
BUT the IMPORTANT bit is that (afaik) we don't use any IDREF or
IDREFS attributes in the TEI schema *at all*. (The string "IDREF"
does not occur in .../TEI/P5/Exemplars/tei_all.rnc, nor in any of
.../TEI/P5/Source/Specs/*.xml. However, it does occur nearly a dozen
times in .../TEI/P5/Source/Guidelines/*/*.xml; I haven't looked at
what the prose says yet.)
So I am reasonably confident that all we need to do to work around
this problem is disable ID/IDREF checking, and then use one of the
snippets of Schematron at [5] to test for (2).
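
For illustration only, test (2) can also be sketched outside Schematron. The following is not one of the snippets at [5]; it is a small Python equivalent with a hypothetical function name, relying on @xml:id being the only ID-typed attribute:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def duplicate_xml_ids(xml_text):
    """Return the sorted xml:id values that occur on more than one element."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID).strip()  # trim stray whitespace before comparing
        for el in root.iter()
        if el.get(XML_ID) is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty list means no two elements share an @xml:id value.
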
Adding a <constraintSpec> for (2) to the Guidelines would also mean
users could turn off ID/IDREF checking in oXygen without losing any
validation for TEI files. (Although they would have to use the
Schematron, and they might want ID/IDREF checking for their non-TEI
files, of course.)
Notes
-----
[1] http://lists.lists.tei-c.org/pipermail/tei-council/2015/021830.html
[2] http://www.relaxng.org/pipermail/relaxng-user/2003-September/000029.html
[3] http://books.xmlschemata.org/relaxng/relax-CHP-11-SECT-4.html
[4] And there's a good chance I may not. Take a look at
     http://relaxng.org/compatibility.html#id to understand why.
[5] http://wiki.tei-c.org/index.php/Xmlid_uniqueness.sch
Lou Burnard writes:
...
On 13/10/15 16:20, Hugh Cayless wrote:
...
Question for Syd in particular, but anyone else who understands this,
please chip in.
https://github.com/TEIC/TEI/commit/a55fa633ddb8a4a1a749222b6120bce90fa316cd
causes a Stylesheets test (test30.odd) to fail. Syd’s commit message led
me to the (well, a) solution, which is to add a flag to Jing validation.
But is this in fact leading to a risk of generating invalid RELAX NG
schemas? Can that be helped?
On the face of it, globally removing the check for ID/IDREF
compatibility seems like a bad idea. Maybe Syd could expand a little on
what is causing 'the dreaded "conflicting ID-types" error'?