There are 3 different processes (that I am immediately aware of) for
getting the ISO Schematron constraints from an ODD:
1) teitoschematron
2) generate a RELAX NG schema
3) extract the constraints
My homework assignment, i.e. this post, is supposed to be about
(3). However, it is worth noting that, through a process similar to
the first few steps Martin outlined in homework #1, (1) just calls the
same routine as (3). (However, it does not merge a customization ODD
with P5 source ODD before processing. It probably should.)
So here goes. The big-picture extraction process is *really* simple. A
single stylesheet, Stylesheets/odds/extract-isosch.xsl, is run with an
ODD file as input; it produces ISO Schematron as output. To be most
useful, the input should be the "merged" ODD file created when a
customization is applied to the P5 source. The output is a (mostly)
usable file with the (hopefully) correct ISO Schematron code.
But the devil is in the details ... this is pretty complicated stuff.
"Why is it complicated?" you ask; "wouldn't just yanking everything
that is in the ISO Schematron namespace out and copying it over do the
trick?" Turns out that won't work, for several reasons, two of which
I'll mention here. First, because the rules of <constraintSpec> say
that if it is in (e.g.) an <elementSpec>, you don't have to specify a
@context on an <sch:rule>, rather the context is assumed to be the
element being defined; so this extraction process has to build an
<sch:rule> with the right @context. Second, because some of the
constraints are not expressed in <constraintSpec>, but rather are
expressed by @validUntil.
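To make the first of those reasons concrete, here is a rough sketch in Python (not the actual XSLT in extract-isosch.xsl) of wrapping a context-less <sch:assert> in a generated <sch:rule>. The sample ODD fragment, the function name wrap_constraint, the "tei" prefix, and the format of the generated pattern @id are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
SCH = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH)

# A made-up <elementSpec>; note that the <sch:assert> has no enclosing <sch:rule>.
ODD = f"""
<elementSpec xmlns="{TEI}" xmlns:sch="{SCH}" ident="div" mode="change">
  <constraintSpec ident="needsHead" scheme="isoschematron">
    <constraint>
      <sch:assert test="tei:head">A div should have a heading.</sch:assert>
    </constraint>
  </constraintSpec>
</elementSpec>
"""

def wrap_constraint(element_spec, prefix="tei"):
    """Build sch:pattern/sch:rule around the bare asserts and reports.

    The rule's @context is the element being defined by the <elementSpec>.
    """
    ident = element_spec.get("ident")
    spec = element_spec.find(f"{{{TEI}}}constraintSpec")
    # Hypothetical id scheme: element ident plus constraintSpec ident.
    pattern = ET.Element(f"{{{SCH}}}pattern",
                         {"id": f"{ident}-{spec.get('ident')}"})
    rule = ET.SubElement(pattern, f"{{{SCH}}}rule",
                         {"context": f"{prefix}:{ident}"})
    for child in spec.find(f"{{{TEI}}}constraint"):
        if child.tag in (f"{{{SCH}}}assert", f"{{{SCH}}}report"):
            rule.append(child)
    return pattern

pattern = wrap_constraint(ET.fromstring(ODD))
print(ET.tostring(pattern, encoding="unicode"))
```

The point is only that the @context has to come from the surrounding <elementSpec>, which is why a plain copy of everything in the Schematron namespace is not enough.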
But that said, the basic underlying process is, indeed, to find all
the stuff in the ISO Schematron namespace and copy it to the
output. And therein lies the first problem.
In a TEI ODD, the <schemaSpec> element is (perhaps unwisely)
repeatable.[1] The `roma` commandline tool only processes the first
<schemaSpec> that is encountered (in document order). I think
odd2odd.xsl does the same thing (unless a different <schemaSpec> is
specified via the 'selectedSchema' parameter).[2] This leads to a
potential mis-match: if a single ODD file defines two schemas, the
Schematron from both of them will be extracted by extract-isosch.xsl,
whereas the rest of the schema and custom documentation will be built
only from the first schema defined.
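The mismatch can be illustrated with a small Python sketch. This is not how either tool is implemented; it just contrasts "everything in the Schematron namespace" with "only what is inside the first <schemaSpec>". The sample ODD and the helper name schematron_texts are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
SCH = "http://purl.oclc.org/dsdl/schematron"

# A made-up ODD fragment with two <schemaSpec>s, each carrying one constraint.
ODD = f"""
<body xmlns="{TEI}" xmlns:sch="{SCH}">
  <schemaSpec ident="first">
    <elementSpec ident="p" mode="change">
      <constraintSpec ident="c1" scheme="isoschematron">
        <constraint><sch:report test="true()">from first</sch:report></constraint>
      </constraintSpec>
    </elementSpec>
  </schemaSpec>
  <schemaSpec ident="second">
    <elementSpec ident="p" mode="change">
      <constraintSpec ident="c2" scheme="isoschematron">
        <constraint><sch:report test="true()">from second</sch:report></constraint>
      </constraintSpec>
    </elementSpec>
  </schemaSpec>
</body>
"""

def schematron_texts(node):
    """Text of every element in the Schematron namespace at or below node."""
    return [el.text for el in node.iter() if el.tag.startswith(f"{{{SCH}}}")]

root = ET.fromstring(ODD)
first_only = schematron_texts(root.find(f"{{{TEI}}}schemaSpec"))
whole_file = schematron_texts(root)
print(first_only)   # ['from first']
print(whole_file)   # ['from first', 'from second']
```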
This is not as big a problem as it sounds. First, nobody (that I
know of) actually puts two <schemaSpec>s in a single file. Second, the
extract-isosch.xsl program is typically run on an ODD that has already
been "merged". So even if the input to the merge process had two or
more <schemaSpec>s, the output has only one.
So what does extract-isosch.xsl do? A lot, actually. Here's an
overview with some details thrown in.
1) On matching the root of the input ODD, the entire input tree is
processed in two passes. This template is found circa line 129.
2) Pass #1 "decorates" the input tree with extra attributes about the
namespaces. That is, pass #1 (which is mode "NSdecoration") is an
identity transform EXCEPT that each <attDef>, <elementSpec>, and
<schemaSpec> gets two new attributes:
@nsu (namespace URI) = the URI of the namespace of the construct
being defined; the default is the TEI namespace for elements
and no namespace for attributes.
@nsp (namespace prefix) = a prefix for use with the @nsu. An
intelligent one is chosen heuristically if possible; if not, one
is just created out of thin air.
3) The results of pass #1 are then processed in pass #2, which starts
out in mode "schematron-extraction".[3] A skeleton of the output
schema is spit out with 6 major sections:
* namespaces, declared
* namespaces, implicit
* keys (including problematic ones)
* the constraints themselves
* deprecations
* paramLists
4) Namespaces, declared. All <sch:ns> elements (except those inside
an <egXML>) that are in the right language and do not have a
prefix of "xsl" are copied over to the output. (At the moment, I
cannot think of why the XSLT namespace is exempted, or moreover,
why it is found by testing the prefix, not the URI.)
5) Namespaces, implicit. All distinct namespaces that we calculated
in pass 1 are converted to <sch:ns> elements and copied into the
output. Again, the XSLT namespace is exempted, this time by URI.[4]
6) Keys. All <xsl:key> elements (except those inside an <egXML>) that
are in the right language are processed. (Said processing is that
they are copied over to the output.)
7) A warning message is generated if any <sch:key> elements are
encountered. (Because there is no <sch:key> element. :-)
8) Constraints. Every <constraint> (except those inside an <egXML>)
that is in the right language, and is in a <constraintSpec
scheme="isoschematron">[5] is processed as follows.
a) if there is a child <sch:pattern>, the children (of the
<constraint>) are processed
b) if there is a child <sch:rule>, a wrapper <sch:pattern> (with a
generated @id) is output and children (of the constraint) are
processed and the output of that is put within the
<sch:pattern>.
c) if there is a child <sch:assert> or <sch:report>, both a wrapper
<sch:pattern> (with a generated @id) and its child <sch:rule>
(with a generated @context) are generated, and children are
processed within.
9) Deprecations. The code is only set up (at the moment) to handle
cases of @validUntil that actually occur in the _Guidelines_ at
the time said code was written. (E.g., there is code to handle
elementSpec//attDef/@validUntil, but not constraintSpec/
@validUntil, because even though it is valid, there are no
<constraintSpec>s with @validUntil in the GLs.) The list is as
follows; note that @validUntil inside an <egXML> is always
ignored.
* elementSpec//attDef/@validUntil
* classSpec//attDef/@validUntil
* elementSpec/@validUntil
* elementSpec//valItem/@validUntil[6]
10) Deprecation, elementSpec//attDef. An <sch:pattern> with a child
<sch:rule> whose @context is set to the element being defined by
the <elementSpec> is output. (Luckily, we put an attribute on
that <elementSpec> back in pass 1 that gives us its namespace
prefix.) That <sch:rule> has a child <sch:report> that fires
whenever the attribute being specified is found (again, we can
use the namespace prefix we ascertained in pass 1).[7]
11) Deprecation, classSpec//attDef. Same idea as above, except that
developing the @context is harder because it might be any element
that is a member of the class. Note that this code only searches
for elements that are members of the class being defined -- it
does NOT look for elements that are members of a class that is a
member of the class being defined, or members of a class that is
a member of a class that is a member of the class being
defined.[8]
12) Deprecation, elementSpec. An sch:pattern/sch:rule/sch:report is
output. The <sch:rule> has a @context set to the element being
defined. (Again, using the namespace prefix inserted in pass 1.)
The report has a @test set to the string (not the function)
"true()". (Thus, when interpreted by Schematron, it will be the
function "true()".) So the <sch:report> fires whenever the
condition on its parent sch:rule/@context is met.
13) Deprecation, elementSpec//valItem. Similar to the above, but
instead of a @test of "true()", the test is for the specific
value being defined.
14) Parameter lists. I really don't understand this well, in large
part because I've never seen any output generated from it. (Even
building tei_simple does not fire this code.) But it looks to me
like we will get an <sch:pattern> (with a nice @id) that has a
child <sch:rule> that fires on <param> elements that are inside a
<model> whose @behaviour is the one being currently defined. That
<sch:rule> will have a child <sch:assert> that tests that the
@name (of the <param>) matches one of the names being defined in
this <paramList>.
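As a rough illustration of how steps 2 and 12 fit together, here is a Python sketch (again, not the real XSLT) of a two-pass run: pass 1 decorates each <elementSpec> with nsu and nsp, and pass 2 uses nsp to build the deprecation pattern. The sample input, the function names, the "ns1"-style generated prefixes, the pattern @id, and the message wording are all assumptions. The real stylesheet also decorates <attDef> and <schemaSpec> and picks prefixes more cleverly.

```python
import copy
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
SCH = "http://purl.oclc.org/dsdl/schematron"
ET.register_namespace("sch", SCH)

# A made-up schema: one deprecated TEI element, one element in another namespace.
ODD = f"""
<schemaSpec xmlns="{TEI}" ident="demo">
  <elementSpec ident="oldThing" mode="change" validUntil="2030-01-01"/>
  <elementSpec ident="newThing" mode="add" ns="http://example.org/ns"/>
</schemaSpec>
"""

def decorate(root):
    """Pass 1: identity copy, except each elementSpec gains nsu and nsp."""
    tree = copy.deepcopy(root)
    prefixes = {TEI: "tei"}
    for spec in tree.iter(f"{{{TEI}}}elementSpec"):
        nsu = spec.get("ns", TEI)  # default is the TEI namespace
        # Invent a prefix (ns1, ns2, ...) for namespaces not seen before.
        nsp = prefixes.setdefault(nsu, f"ns{len(prefixes)}")
        spec.set("nsu", nsu)
        spec.set("nsp", nsp)
    return tree

def deprecations(tree):
    """Pass 2 (one part): pattern/rule/report per elementSpec/@validUntil."""
    patterns = []
    for spec in tree.iter(f"{{{TEI}}}elementSpec"):
        until = spec.get("validUntil")
        if until is None:
            continue
        ident = spec.get("ident")
        pattern = ET.Element(f"{{{SCH}}}pattern",
                             {"id": f"{ident}-deprecation"})
        rule = ET.SubElement(pattern, f"{{{SCH}}}rule",
                             {"context": f"{spec.get('nsp')}:{ident}"})
        # @test is the string "true()", so the report fires on every match.
        report = ET.SubElement(rule, f"{{{SCH}}}report", {"test": "true()"})
        report.text = f"The {ident} element is deprecated; valid until {until}."
        patterns.append(pattern)
    return patterns

decorated = decorate(ET.fromstring(ODD))
patterns = deprecations(decorated)
for p in patterns:
    print(ET.tostring(p, encoding="unicode"))
```

Only the element carrying @validUntil yields a pattern; the other <elementSpec> is decorated but otherwise ignored by this part of pass 2.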
Worth noting that in cases where the <constraint> is inside an
<attDef>, the value of the resulting sch:rule/@context is an
attribute. I do not see why this should be a problem, nor have I found
anything in ISO 19757-3:2006 that suggests it should be a
problem. However, I know at least 1 Schematron processor fails to work
correctly in this case.
Notes
-----
[1] Bizarrely, it is model.divPart; I would have thought that, if
anything, it would be in model.divLike.
[2] This means, I think, that the work of extracting the @ident of
the first <schemaSpec> is performed twice when you run `roma`.
[3] I have to admit, I don't fully understand why that template,
matching "/" in mode "schematron-extraction", fires. The content
of the variable that is selected by the <apply-templates> is all
of the nodes that are children of the document node, not the
document node itself. I also am not quite sure why we pass the
<TEI> element in as a parameter, rather than just set a variable
from within the template.
[4] The result of the two steps to generate <sch:ns> namespace
declarations for the Schematron may well be that a single
namespace URI may be bound to two occurrences of the same prefix,
or to two or more different prefixes. I do not see why this
should be a problem, nor have I found anything in ISO
19757-3:2006 that suggests it should be a problem. However, I am
aware of at least 1 Schematron processor that gets very upset by
occurrences of <sch:ns> that have either the same @prefix or the
same @uri as another occurrence of <sch:ns>.
[5] Uh-oh. That should now be changed to process those with
scheme=schematron, too.
[6] The complete set of cases that exist in the _Guidelines_ is:
1 classSpec/attList/attDef/defaultVal/@validUntil
2 elementSpec/@validUntil
4 elementSpec/attList/attDef/@validUntil
4 elementSpec/attList/attDef/defaultVal/@validUntil
28 macroSpec/@validUntil
Martin & I don't think there is anything to be done for the
<defaultVal> cases (after all, a processor would not know if the
value was specified or defaulted, which is why we don't like them
in the first place! :-) So the only case that is not handled but
perhaps should be is the most common: <macroSpec>.
[7] I just noticed that although we test for the attribute correctly
using the namespace prefix, the message includes just the
attribute name -- it does not include the namespace prefix. Since
in 99% of all cases, and in 100% of actual current cases, there
is no namespace prefix, this doesn't matter much. Nonetheless, it
should probably be fixed.
[8] I just noticed that both the definition of and reference to
$fqgis (which stands for "fully qualified generic identifiers", I
believe) are separated with union operators (aka "or bars",
'|'). That is probably an error that won't cause a problem,
because (I am guessing) by the time it is referenced it is
already a single string, so the @separator has no effect. Also
the @test of this report does not use the namespace prefix;
probably should.
--
Syd Bauman, EMT-Paramedic
Senior XML Programmer/Analyst
Northeastern University Women Writers Project
s.bauman(a)northeastern.edu or
Syd_Bauman(a)alumni.Brown.edu