
Hello,

I think this is mostly a question for Lou, and anyone else who was involved in getting PureODD together. I've started converting the MEI ODD to PureODD, so I expect to find a few issues, most of which will likely be caused by my ineptitude, so apologies in advance.

Here's the first obstacle I found: in MEI, the datatypes of some attributes are defined within <datatype> itself. There are a few cases of this:

1. The datatype is very specific to the element and doesn't need to be re-used. Example:

   <attDef ident="tab.strings" usage="opt">
     <desc>Provides a *written* pitch and octave for each open string or course of strings.</desc>
     <datatype>
       <rng:list>
         <rng:oneOrMore>
           <rng:data type="token">
             <rng:param name="pattern">[a-g][0-9](s|f|ss|x|ff|xs|sx|ts|tf|n|nf|ns|su|sd|fu|fd|nu|nd|1qf|3qf|1qs|3qs)?([a-g][0-9](s|f|ss|x|ff|xs|sx|ts|tf|n|nf|ns|su|sd|fu|fd|nu|nd|1qf|3qf|1qs|3qs)?)*</rng:param>
           </rng:data>
         </rng:oneOrMore>
       </rng:list>
     </datatype>
   </attDef>

   Solution: bite the bullet and move it to a dedicated <dataSpec>.

2. The datatype combines a number of pre-defined datatypes. Example:

   <datatype>
     <rng:list>
       <rng:oneOrMore>
         <rng:data type="decimal"/>
         <rng:data type="decimal"/>
       </rng:oneOrMore>
     </rng:list>
   </datatype>

   Solution: again, create a dedicated <dataSpec>, but it's a bit annoying, since it's just a matter of combining already defined datatypes in a specific way.

3. The datatype involves a choice between a number of pre-defined datatypes. Example:

   <datatype>
     <rng:choice>
       <rng:data type="decimal">
         <rng:param name="minInclusive">1</rng:param>
       </rng:data>
       <rng:data type="time"/>
     </rng:choice>
   </datatype>

   Solution: like 2, but there is also another question: we're relying on the rng datatype for time. Will we have to define our own? Can we use rng's or xsd's definition without breaking the Durand Conundrum?

General question: what's the reason for not allowing <content> in <datatype>? It also seems a bit strange to allow either dataRef or textNode: it seems to me that it's mixing a reference and a definition, but only if it's a text node. It would seem more logical to just replace <datatype> with <dataRef> and use it to refer to a text datatype. Or, better, just allow <datatype> to define the datatype there and then like it used to :)

Thanks,
Raff
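To make the target of the conversion concrete, here is a minimal Python sketch of what the inline RELAX NG patterns in cases 2 and 3 accept. It is only an illustration: the function names are made up for this example, and the regular expressions for xsd:decimal and xsd:time are simplified approximations of the XSD lexical forms, not the full definitions.

```python
import re
from decimal import Decimal

# Simplified lexical forms (assumptions, not the complete XSD definitions).
DECIMAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
TIME = re.compile(r"[0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?(Z|[+-][0-9]{2}:[0-9]{2})?")


def is_decimal_pairs(value: str) -> bool:
    """Case 2: an rng:list of one or more (decimal, decimal) pairs."""
    tokens = value.split()  # rng:list splits the value on whitespace
    if not tokens or len(tokens) % 2 != 0:
        return False
    return all(DECIMAL.fullmatch(t) for t in tokens)


def is_decimal_or_time(value: str) -> bool:
    """Case 3: an rng:choice between a decimal >= 1 and a time."""
    token = value.strip()
    if DECIMAL.fullmatch(token):
        return Decimal(token) >= 1  # emulates minInclusive = 1
    return TIME.fullmatch(token) is not None
```

Whatever <dataSpec> replaces the inline patterns should, ideally, accept and reject the same values as these two checks.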

Hi Raff

Thanks for your questions about datatype. Reviewing the discussion on the pure ODD list (starting in April 2015) is probably the best way of grappling with your difficult "why" questions, so I'm going to mostly just tell you how I think you should be using pure ODD to do what you are currently doing.

Firstly, you can use the @restriction attribute to supply a regexp which limits the range of legal values beyond what the indicated datatype would otherwise permit. This deals with your first example (@tab.strings), which should become simply

   <datatype>
     <dataRef key="teidata.word" restriction="[a-g][0-9](s|f|ss|x|ff|xs|sx|ts|tf|n|nf|ns|su|sd|fu|fd|nu|nd|1qf|3qf|1qs|3qs)?([a-g][0-9](s|f|ss|x|ff|xs|sx|ts|tf|n|nf|ns|su|sd|fu|fd|nu|nd|1qf|3qf|1qs|3qs)?)*"/>
   </datatype>

(though myself I think I'd rather supply a valList)

Secondly, whenever the datatype required is "complex" (i.e. permits an alternation or sequence of different datatypes) you *must* predefine the combination required using a <dataSpec>. This is because <dataRef> is always intended to deliver an atomic, primitive, single value. If you want to allow for multiple instances of the same datatype, you can use the occurrence indicators on the parent <datatype>, of course. That is also why we retain <datatype> as a wrapper, rather than allowing the occurrence attributes directly on <dataRef> itself and allowing <dataRef> as an alternative to -- rather than child of -- <datatype>, which is what was at one stage proposed.

So your second and third examples both require an additional <dataSpec> to be defined, the <content> of which will have an <alternate> (or something) wrapping the <dataRef>s you want.
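Returning to the first point, here is a rough Python illustration of how @restriction and the occurrence indicators on <datatype> interact. It treats an attribute value as a whitespace-separated list and checks each token against the restriction pattern. The function name and its parameters are hypothetical, and Python's `re` is only a stand-in for XSD regular expressions (the two agree on the simple constructs used here); this is not how the TEI Stylesheets are implemented.

```python
import re

# Pitch-and-octave pattern from the tab.strings example, split for readability.
ACCID = "(s|f|ss|x|ff|xs|sx|ts|tf|n|nf|ns|su|sd|fu|fd|nu|nd|1qf|3qf|1qs|3qs)"
PITCH = f"[a-g][0-9]{ACCID}?"
RESTRICTION = f"{PITCH}({PITCH})*"


def check_value(value, restriction, min_occurs=1, max_occurs=1):
    """Check a whitespace-separated attribute value against a restriction.

    max_occurs=None stands in for maxOccurs="unbounded".
    """
    tokens = value.split()
    if len(tokens) < min_occurs:
        return False
    if max_occurs is not None and len(tokens) > max_occurs:
        return False
    # XSD patterns are implicitly anchored, hence fullmatch.
    return all(re.fullmatch(restriction, t) for t in tokens)
```

With the default of exactly one occurrence, `check_value("e2 a2", RESTRICTION)` is false while `check_value("e2a2", RESTRICTION)` is true, so reproducing the original rng:list of tokens would presumably also need maxOccurs="unbounded" on <datatype>.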
For the second case, something like this should work:

   <dataSpec ident="meidata.twoDecimals">
     <content>
       <sequence>
         <dataRef name="decimal"/>
         <dataRef name="decimal"/>
       </sequence>
     </content>
   </dataSpec>

and then in the attribute:

   <datatype maxOccurs="unbounded">
     <dataRef key="meidata.twoDecimals"/>
   </datatype>

And similarly for the third case:

   <dataSpec ident="meidata.weird">
     <content>
       <alternate>
         <dataRef name="decimal" restriction="[1-9]\.[\d+]"/> <!-- vel sim -->
         <dataRef name="time"/>
       </alternate>
     </content>
   </dataSpec>

Why do we not permit <content> inside <datatype>? Because that would allow all sorts of nonsense, e.g. <elementRef>, as well. I cannot now remember why we allow <textNode>, but I would certainly not recommend it.

Your question also prompted me to review the current specifications for <dataRef> and friends, and thus notice that these don't seem to have any pure ODD examples, which needs to be rectified.

On 13/05/16 23:54, Raffaele Viglianti wrote:

Hi Lou,

Thank you for your answers! I moved those few problematic declarations to their own datatypes, but I'm still running into a couple of problems:

1. The ODD that I created is currently not valid because an empty <content/> is not allowed, but it should be: <http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-content.html#index-egXML-d51e145914>. See the current definition:

   <content>
     <alternate>
       <macroRef key="macro.anyXML"/>
       <classRef minOccurs="0" maxOccurs="unbounded" key="model.contentPart"/>
     </alternate>
   </content>

   I suspect that <alternate> should have minOccurs="0". Right? Happy to create a bug report and commit the change if that's the case.

2. I've noticed that the latest tei_odds RNG <http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng> doesn't validate PureODD. E.g. it says element "dataSpec" not allowed anywhere. Does the ODD need to be updated? Or maybe the correct RNG just wasn't generated at release time?

3. This is the biggest issue and concerns one of the examples we discussed earlier. The model that you suggested:

   <dataSpec ident="meidata.twoDecimals">
     <content>
       <sequence>
         <dataRef name="decimal"/>
         <dataRef name="decimal"/>
       </sequence>
     </content>
   </dataSpec>

   results in the RNG:

   <group>
     <data type="decimal"/>
     <data type="decimal"/>
   </group>

   which Jing doesn't like: 'Error: group of "string" or "data" element'. The rng element that should be used in this case is <list> (http://relaxng.org/tutorial-20011203.html#IDAK0YR) instead of <group>.

   Does PureODD need a way to distinguish between sequences of tokens and other sequences? This could be determined by the XSLTs, but I worry it might be complicated: the script would need to resolve the chain of references to determine whether a <sequence> will end up containing only a number of dataRef[@name], in which case the right rng element to use is list instead of group. Am I making any sense?
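The reference-resolving check described above could be sketched roughly as follows. This is a Python illustration rather than XSLT, it is not how the TEI Stylesheets work, and the function names and the sample ODD fragment (including meidata.decimalOrTime and meidata.mixed) are invented for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def only_data(node, specs, seen=frozenset()):
    """True if node can only ever expand to primitive data values."""
    tag = node.tag
    if tag == TEI + "dataRef":
        if node.get("name") is not None:  # primitive datatype, e.g. decimal
            return True
        key = node.get("key")
        if key in seen or key not in specs:  # circular or unresolved reference
            return False
        content = specs[key].find(TEI + "content")
        return content is not None and only_data(content, specs, seen | {key})
    if tag in (TEI + "content", TEI + "sequence", TEI + "alternate"):
        children = list(node)
        return bool(children) and all(only_data(c, specs, seen) for c in children)
    return False  # elementRef, classRef, textNode, ...


def rng_name_for_sequence(sequence, specs):
    """Pick the RELAX NG wrapper for a <sequence>."""
    return "list" if only_data(sequence, specs) else "group"


# Hypothetical ODD fragment used only to exercise the functions.
ODD = """<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="demo">
  <dataSpec ident="meidata.decimalOrTime">
    <content><alternate><dataRef name="decimal"/><dataRef name="time"/></alternate></content>
  </dataSpec>
  <dataSpec ident="meidata.mixed">
    <content><sequence><dataRef key="meidata.decimalOrTime"/><dataRef name="decimal"/></sequence></content>
  </dataSpec>
  <elementSpec ident="demo">
    <content><sequence><elementRef key="p"/><dataRef name="decimal"/></sequence></content>
  </elementSpec>
</schemaSpec>"""

root = ET.fromstring(ODD)
specs = {d.get("ident"): d for d in root.iter(TEI + "dataSpec")}
data_seq = specs["meidata.mixed"].find(f"{TEI}content/{TEI}sequence")
elem_seq = root.find(f"{TEI}elementSpec/{TEI}content/{TEI}sequence")
data_wrapper = rng_name_for_sequence(data_seq, specs)
elem_wrapper = rng_name_for_sequence(elem_seq, specs)
```

A real implementation would need more than this: for instance, a <dataSpec> defined with a valList is simply treated as non-data here, and RELAX NG does not allow a list inside another list, so a referenced sequence could not itself be emitted as <list>.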
In the meantime, I'm using rng:list directly in the MEI ODD, but it would be nice to have a PureODD solution.

Thanks,
Raff

On Sat, May 14, 2016 at 9:04 AM, Lou Burnard <lou.burnard@retired.ox.ac.uk> wrote:
-- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council
PLEASE NOTE: postings to this list are publicly archived

On 16/05/16 18:35, Raffaele Viglianti wrote:

> 1. The ODD that I created is currently not valid because empty <content/> is not allowed, but should be

As noted, this is fixed in the current source, but hasn't resulted in a new bug-fix release. Should it?

> 2. I've noticed that the latest tei_odds RNG <http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_odds.rng> doesn't validate PureODD. E.g. it says element "dataSpec" not allowed anywhere. Does the ODD need to be updated? Or maybe the correct RNG just wasn't generated at release time?

I don't think that's ever been included in a release. But if you open Exemplars/tei_odds.odd and generate a schema from it, you'll get one that does allow dataSpec. At least, that's what I just did.

> <dataSpec ident="meidata.twoDecimals"> <content> <sequence> <dataRef name="decimal"/> <dataRef name="decimal"/> </sequence> </content> </dataSpec>
>
> results in the RNG:
>
> <group> <data type="decimal"/> <data type="decimal"/> </group>
>
> Which Jing doesn't like: 'Error: group of "string" or "data" element'. The rng element that should be used in this case is <list> (http://relaxng.org/tutorial-20011203.html#IDAK0YR) instead of <group>.

Rats. Probably a stylesheet issue then. Hugh?

BTW, after sending that response, I remembered that there is an already existing TEI datatype (teidata.point) which matches two decimal numbers. But it requires them to be separated by a comma. Would that not work for you?
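To see why teidata.point may not be a drop-in replacement, the two lexical shapes can be compared in a small Python sketch. The regular expressions below are illustrative assumptions written for this example; they are not copied from the TEI or MEI sources.

```python
import re

# A simplified decimal number (assumption, not the full xsd:decimal form).
NUM = r"-?[0-9]+(\.[0-9]+)?"

# teidata.point style: two numbers joined by a comma, e.g. "1.5,2".
COMMA_PAIR = re.compile(rf"{NUM},{NUM}")

# MEI list style: whitespace-separated numbers, always an even count.
SPACE_PAIRS = re.compile(rf"{NUM}\s+{NUM}(\s+{NUM}\s+{NUM})*")


def is_comma_pair(value: str) -> bool:
    return COMMA_PAIR.fullmatch(value) is not None


def is_space_pairs(value: str) -> bool:
    return SPACE_PAIRS.fullmatch(value.strip()) is not None
```

Under these assumptions "1.5 2" satisfies only the whitespace form and "1.5,2" only the comma form, so switching to a comma-separated datatype would presumably mean changing existing MEI attribute values as well as the schema.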