pattern for ANY in RELAX NG

27 Mar 2017

Kudos to George Bina, who simply asked James Clark and Makoto Murata
about the "Conflicting ID-types" problem.

All we need to do is change the definition of the anyXML pattern so
that, whenever it matches an element whose @xml:id is already defined
as ID, it declares that @xml:id as ID too, not as text! Gads, why
didn't I think of that?

The problem is that this is not all that easy to do. The idea isn't
so hard to grok, but implementation may be a tall order. Pretend we
have a schema that defines only 3 elements: the root <test>, which
has one or more <para>s, each of which can have text or <name>s. Of
course, any element in our system can have an @xml:id:

 start = element test {
   attribute xml:id { xsd:ID }?,
   para+
   }
 para = element para {
   attribute xml:id { xsd:ID }?,
   ( text | name )*
   }
 name = element name {
   attribute xml:id { xsd:ID }?,
   ("Horslips" | "Heart" | "Berlin" | "Blondie" | "Quarterflash" | "Renaissance")
   }

That's a lovely working schema.[1] Let's say I also want to allow a
single element <otherStuff> to precede the paragraphs, and allow it
to have ANY content. Easy, with this method:

 start = element test {
   attribute xml:id { xsd:ID }?,
   otherStuff?,
   para+
   }
 para = element para {
   attribute xml:id { xsd:ID }?,
   ( text | name )*
   }
 name = element name {
   attribute xml:id { xsd:ID }?,
   ("Horslips" | "Heart" | "Berlin" | "Blondie" | "Quarterflash" | "Renaissance")
   }
 otherStuff = element otherStuff {
   attribute xml:id { xsd:ID }?,
   ( text | anyXMLelement )*
   }
 anyXMLelement =
   # for elements that already have @xml:id defined as ID, define it
   # as ID here, too
   element ( test | otherStuff | name | para ) {
     attribute * - xml:id { text }*,
     attribute xml:id { xsd:ID }?,
     ( text | anyXMLelement )*
     }
   |
   # for all other elements, define @xml:id (and all other attrs) as
   # just text
   element * - ( test | otherStuff | name | para ) {
     attribute * { text }*,
     ( text | anyXMLelement )*
     }

This is also a lovely working schema. It has the HUGE advantage that
it does not trip over the "Conflicting ID-types" error. It has the
(minor) disadvantage that for some elements inside <otherStuff>
(namely, those that are NOT the enumerated <test>, <otherStuff>,
<name>, <para>) @xml:id values are not required to be NCNames, and
uniqueness of @xml:id values is not enforced. BUT those same problems
occur with our current method (and our previous method) of doing
this, and even more so, because with those methods they apply to
*all* descendant elements.

The main problem, as I see it, is that it is hard to create the
schema. That list of names (here "test | otherStuff | name | para")
cannot be handled with indirection, because (AFAIK) an <rng:name>
cannot be inside an <rng:define>. It has to be built at schema-build
time, and then tucked into both places. And while building it you
have to remember to include namespaces where necessary (for us that
is only for "teix:egXML").[2]
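
As a rough illustration of "build the list once, tuck it into both
places", here is a minimal Python sketch that emits the anyXMLelement
pattern in the compact syntax from a list of element names. It is
only a sketch under assumptions: the function names rnc_name and
any_xml_element are hypothetical, prefixed names such as teix:egXML
are passed through as-is on the assumption that the prefix is
declared elsewhere in the schema, and the real build would presumably
produce the XML syntax rather than the compact one.

```python
# RELAX NG compact-syntax keywords; an element with one of these names
# must be written with a leading backslash (e.g. \list).
RNC_KEYWORDS = {
    "attribute", "default", "datatypes", "div", "element", "empty",
    "external", "grammar", "include", "inherit", "list", "mixed",
    "namespace", "notAllowed", "parent", "start", "string", "text",
    "token",
}


def rnc_name(name):
    """Return `name` as it should appear in an RNC name class."""
    # Prefixed names (teix:egXML) are left untouched; the prefix is
    # assumed to be declared elsewhere in the schema.
    if ":" not in name and name in RNC_KEYWORDS:
        return "\\" + name
    return name


def any_xml_element(names):
    """Build the anyXMLelement pattern for the given element names."""
    # The same name list is used in both branches of the choice.
    name_list = " | ".join(rnc_name(n) for n in names)
    return "\n".join([
        "anyXMLelement =",
        "  # elements whose @xml:id is already declared as ID",
        f"  element ( {name_list} ) {{",
        "    attribute * - xml:id { text }*,",
        "    attribute xml:id { xsd:ID }?,",
        "    ( text | anyXMLelement )*",
        "    }",
        "  |",
        "  # all other elements: every attribute, @xml:id included, is text",
        f"  element * - ( {name_list} ) {{",
        "    attribute * { text }*,",
        "    ( text | anyXMLelement )*",
        "    }",
    ])


print(any_xml_element(["test", "otherStuff", "name", "para"]))
```

Generating both branches from a single joined string is what keeps
the two copies of the list from drifting apart.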

Another monkey wrench is that to build the list you not only have to
go through all of the <tei:elementSpec>s in your flattened ODD, but
also all the <rng:element>s in any schemas included by a
  <tei:moduleRef url="[MathML, SVG, whatever]"/>.
(And no, I have not thought through whether you want *all* element
names, or only those that actually occur (perhaps indirectly via a
class) in a content model, or if it matters.)
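
The name-gathering step might be sketched in Python along these
lines. This is a guess at the shape of the job, not a worked-out
implementation: collect_element_names is a hypothetical name, the
sample input is a made-up fragment rather than valid ODD, an
<elementSpec> without @ns is assumed to be in the TEI namespace, and
schemas pulled in by a <tei:moduleRef url="..."/> would still have to
be loaded and scanned separately.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
RNG_NS = "http://relaxng.org/ns/structure/1.0"


def collect_element_names(odd_xml):
    """Return a sorted list of (namespace, local name) pairs."""
    root = ET.fromstring(odd_xml)
    found = set()
    # TEI-defined elements: <elementSpec ident="..."> with optional @ns.
    for spec in root.iter(f"{{{TEI_NS}}}elementSpec"):
        ident = spec.get("ident")
        if ident:
            found.add((spec.get("ns", TEI_NS), ident))
    # Elements declared in RELAX NG: <rng:element name="...">.
    # Assumption: the namespace is given by @ns on the <rng:element>
    # itself; @ns values inherited from ancestors are not resolved.
    for el in root.iter(f"{{{RNG_NS}}}element"):
        name = el.get("name")
        if name:
            found.add((el.get("ns", ""), name))
    return sorted(found)


# Made-up input for illustration only.
SAMPLE = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
    xmlns:rng="http://relaxng.org/ns/structure/1.0" ident="demo">
  <elementSpec ident="para"/>
  <elementSpec ident="egXML" ns="http://www.tei-c.org/ns/Examples"/>
  <rng:element name="svg" ns="http://www.w3.org/2000/svg">
    <rng:text/>
  </rng:element>
</schemaSpec>"""

for ns, name in collect_element_names(SAMPLE):
    print(ns, name)
```

Keeping the namespace alongside each name is what would later let the
schema builder decide which names need a prefix.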

I don't know how to do any of that off the top of my head, but my
thought is that it is probably not too hard, albeit a significant
amount of work.

To see what a resulting tei_all.rnc would look like, feel free to
take a look at [3], particularly the last dozen lines.

Notes
-----
 [1] Working because the prefixes "xml:" and "xsd:" are magically
     defined (the compact syntax predeclares both).

 [2] And, if you're writing in the compact syntax (which we wouldn't
     be), you have to put a backslash in front of those names that
     RELAX NG already uses as keywords (for us that's \list,
     \namespace, \default, \text, and \div).

 [3] http://paramedic.wwp.northeastern.edu/~syd/temp/TEI_Council/tei_all_new_ANY_...

Syd Bauman
