You may be wondering how Pure ODD is progressing...
Well, all over the place, there are bits of content model like this (expressed in pure ODD)
<sequence minOccurs="0" maxOccurs="unbounded">
  <classRef key="model.divBottomPart" />
  <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
i.e. there may be nothing here at all, or there may be at least one member of the model.divBottomPart class possibly followed by some model.globals.
This makes perfect sense when transformed to RelaxNG or even in XSD, but the DTD fragment we get for it is:
(%model.divBottomPart;,(%model.global;)*)*
which is not syntactically correct in DTD language. It needs to be
((%model.divBottomPart;),(%model.global;)*)*
Alas, using the current oddtodtd, those extra parens only appear if the bare class reference is inside either a sequence or an alternation. But we have a schematron rule that says (reasonably enough) that a <sequence> must have two or more children. (We don't see this problem in the impure ODD, of course, because RelaxNG doesn't mind <group>s containing just a single pattern)
This is not an isolated problem, either.... it's a favourite trick in our content models.
Having spent most of today on this, I am reluctantly coming to one of the following conclusions:
a) we should just abandon DTDs
b) I could write a dirty hack to detect these things and post process the DTDs
c) we should quietly drop the schematron rule
d) someone (not me) should rewrite odd to dtd either to deal with these things properly or (more radically) to expand all parameter entities
e) there is another silver bullet somewhere but i am too stupid to see it
Any opinions? Recommendations?
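To make the syntax problem concrete, here is a minimal Python sketch (not part of the TEI toolchain) that expands the parameter entities textually and checks the result against the element-content grammar of XML 1.0 (productions 47 to 50). The entity replacement texts in ENTITIES are hypothetical stand-ins for the real class memberships, and mixed content (#PCDATA) is deliberately out of scope.

```python
import re

# Hypothetical expansions; the real class memberships live in the TEI specs.
ENTITIES = {
    "model.divBottomPart": "closer | postscript | signed | trailer",
    "model.global": "note | pb | lb",
}

TOKEN = re.compile(r"\s*([A-Za-z_:][\w.:-]*|[()|,?*+])")
OCCURRENCE = ("?", "*", "+")
PUNCTUATION = ("(", ")", "|", ",") + OCCURRENCE


def expand(model, entities):
    """Textually replace each %name; with its space-padded replacement text."""
    return re.sub(r"%([A-Za-z][\w.-]*);",
                  lambda m: " " + entities[m.group(1)] + " ", model)


def tokenize(text):
    """Split an expanded content model into names and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_cp(tokens, i):
    """cp ::= (Name | choice | seq) ('?' | '*' | '+')?"""
    if i >= len(tokens):
        raise ValueError("unexpected end of content model")
    if tokens[i] == "(":
        i = parse_group(tokens, i)
    elif tokens[i] not in PUNCTUATION:
        i += 1  # a plain element name
    else:
        raise ValueError(f"unexpected token {tokens[i]!r}")
    if i < len(tokens) and tokens[i] in OCCURRENCE:
        i += 1
    return i


def parse_group(tokens, i):
    """A choice or a seq: only one kind of connector is allowed per group."""
    if i >= len(tokens) or tokens[i] != "(":
        raise ValueError("expected '('")
    i = parse_cp(tokens, i + 1)
    connector = None
    while i < len(tokens) and tokens[i] in ("|", ","):
        if connector is None:
            connector = tokens[i]
        elif tokens[i] != connector:
            raise ValueError("group mixes ',' and '|'")
        i = parse_cp(tokens, i + 1)
    if i >= len(tokens) or tokens[i] != ")":
        raise ValueError("expected ')'")
    return i + 1


def is_valid_children(model, entities=ENTITIES):
    """True if the expanded model matches the 'children' production."""
    try:
        tokens = tokenize(expand(model, entities))
        i = parse_group(tokens, 0)
        if i < len(tokens) and tokens[i] in OCCURRENCE:
            i += 1
        return i == len(tokens)
    except ValueError:
        return False


print(is_valid_children("(%model.divBottomPart;,(%model.global;)*)*"))    # False
print(is_valid_children("((%model.divBottomPart;),(%model.global;)*)*"))  # True
```

The first model fails because, once model.divBottomPart expands to an alternation, a single group ends up containing both "|" and ","; the extra parens keep the alternation in a group of its own.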
If you can give me an example ODD that produces a broken DTD, I could have
a go at option d. Seems like something it should cope with...
On Fri, Aug 14, 2015 at 1:02 PM, Lou Burnard
You may be wondering how Pure ODD is progressing...
Well, all over the place, there are bits of content model like this (expressed in pure ODD)
<sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottomPart" /> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence>
i.e. there may be nothing here at all, or there may be at least one member of the model.divBottomPart class possibly followed by some model.globals.
This makes perfect sense when transformed to RelaxNG or even in XSD, but the DTD fragment we get for it is:
(%model.divBottomPart;,(%model.global;)*)*
which is not syntactically correct in DTD language. It needs to be
((%model.divBottomPart;),(%model.global;)*)*
Alas, using the current oddtodtd, those extra parens only appear if the bare class reference is inside either a sequence or an alternation. But we have a schematron rule that says (reasonably enough) that a <sequence> must have two or more children. (We don't see this problem in the impure ODD, of course, because RelaxNG doesn't mind <group>s containing just a single pattern)
This is not an isolated problem, either.... it's a favourite trick in our content models.
Having spent most of today on this, I am reluctantly coming to one of the following conclusions: a) we should just abandon DTDs b) I could write a dirty hack to detect these things and post process the DTDs c) we should quietly drop the schematron rule d) someone (not me) should rewrite odd to dtd either to deal with these things properly or (more radically) to expand all parameter entities e) there is another silver bullet somewhere but i am too stupid to see it
Any opinions? Recommendations?
-- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council
PLEASE NOTE: postings to this list are publicly archived
Thanks for the offer Hugh: I can't do so immediately, because obviously you won't see anything wrong unless you've installed the "purified" specs, or a local p5.xml derived from them. Which you can't yet, because I haven't checked them into the "P5-Pure" branch (because they break the build).
But thinking about how to demonstrate the problem more simply led me a bit further into the maze, so I may be back anon.
yours, currently trundling across Galicia and therefore only intermittently online,
Lou
On 17/08/15 16:25, Hugh Cayless wrote:
If you can give me an example ODD that produces a broken DTD, I could have a go at option d. Seems like something it should cope with...
On Fri, Aug 14, 2015 at 1:02 PM, Lou Burnard
wrote: You may be wondering how Pure ODD is progressing...
Well, all over the place, there are bits of content model like this (expressed in pure ODD)
<sequence minOccurs="0" maxOccurs="unbounded"> <classRef key="model.divBottomPart" /> <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/> </sequence>
i.e. there may be nothing here at all, or there may be at least one member of the model.divBottomPart class possibly followed by some model.globals.
This makes perfect sense when transformed to RelaxNG or even in XSD, but the DTD fragment we get for it is:
(%model.divBottomPart;,(%model.global;)*)*
which is not syntactically correct in DTD language. It needs to be
((%model.divBottomPart;),(%model.global;)*)*
Alas, using the current oddtodtd, those extra parens only appear if the bare class reference is inside either a sequence or an alternation. But we have a schematron rule that says (reasonably enough) that a <sequence> must have two or more children. (We don't see this problem in the impure ODD, of course, because RelaxNG doesn't mind <group>s containing just a single pattern)
This is not an isolated problem, either.... it's a favourite trick in our content models.
Having spent most of today on this, I am reluctantly coming to one of the following conclusions: a) we should just abandon DTDs b) I could write a dirty hack to detect these things and post process the DTDs c) we should quietly drop the schematron rule d) someone (not me) should rewrite odd to dtd either to deal with these things properly or (more radically) to expand all parameter entities e) there is another silver bullet somewhere but i am too stupid to see it
Any opinions? Recommendations?
I'm thinking that since ((%model.global;))* is the same (for validation) as (%model.global;)* that the brain-dead change for (d) is to always emit a set of parens, no? Thus we'd end up with ((%model.divBottomPart;),((%model.global;))*)* which, while ugly, should work. And I'd much prefer to generate ugly output DTDs than have ugly input Pure ODD content models.
(I am discounting the idea of expanding parameter entity references as undesirable ... although we do have the source code for Carthage.)
But all that said [insert whining about needing to support DTDs here].
And I don't see any silver bullets, Lou,
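The "always emit a set of parens" idea can be sketched as follows. This is a toy serializer in Python, not the actual oddtodtd stylesheet: the function names are made up for illustration, the fragment is parsed without the TEI namespace, and only occurrence values of 0, 1 and "unbounded" are handled.

```python
import xml.etree.ElementTree as ET

# The content model from the original post, without a namespace for brevity.
FRAGMENT = """
<sequence minOccurs="0" maxOccurs="unbounded">
  <classRef key="model.divBottomPart"/>
  <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
"""


def occurrence(el):
    """Map minOccurs/maxOccurs onto a DTD occurrence indicator (simplified)."""
    low = el.get("minOccurs", "1")
    high = el.get("maxOccurs", "1")
    if high == "unbounded":
        return "*" if low == "0" else "+"
    return "?" if low == "0" else ""


def to_dtd(el):
    """Serialize a content-model element, always parenthesising class references."""
    occ = occurrence(el)
    if el.tag == "classRef":
        body = f"(%{el.get('key')};)"  # the unconditional extra parens
        return f"({body}){occ}" if occ else body
    if el.tag in ("sequence", "alternate"):
        separator = "," if el.tag == "sequence" else "|"
        return "(" + separator.join(to_dtd(child) for child in el) + ")" + occ
    raise ValueError(f"unsupported element: {el.tag}")


print(to_dtd(ET.fromstring(FRAGMENT)))
# ((%model.divBottomPart;),((%model.global;))*)*
```

The output has one redundant pair of parens around model.global, which is the ugliness being traded for leaving the Pure ODD input untouched.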
I agree with Syd: the last thing we want to do is hobble Pure ODD for the sake of supporting DTDs.
How about a pre-processor step that detects this Pure ODD context and wraps it in a sequence?
Cheers,
Martin
On 15-08-24 07:38 AM, Syd Bauman wrote:
I'm thinking that since ((%model.global;))* is the same (for validation) as (%model.global;)* that the brain-dead change for (d) is to always emit a set of parens, no? Thus we'd end up with ((%model.divBottomPart;),((%model.global;))*)* which, while ugly, should work. And I'd much prefer to generate ugly output DTDs than have ugly input Pure ODD content models.
(I am discounting the idea of expanding parameter entity references as undesirable ... although we do have the source code for Carthage.)
But all that said [insert whining about needing to support DTDs here].
And I don't see any silver bullets, Lou,
If it comes to a separate processing step, I'd prefer a post-processor that detects this and wraps the offending reference in parens.
perl -pe 's/(%[A-Za-z][A-Za-z0-9.-]+;),/($1),/g;' < in.dtd > out.dtd
Or, as I said, *all* of 'em.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Of course, these make the mistake of hitting those that are in comments, too. But since the DTD file is generated output that we generate, this doesn't worry me too much.
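For comparison, here is a rough Python rendering of the two substitutions above that also leaves DTD comments alone. It is only a sketch: the function names are invented for illustration, and like the Perl it works on raw text, so wrap_all touches every parameter entity reference it finds outside comments, not just those in content models.

```python
import re

# Same parameter-entity pattern as in the Perl one-liners.
PE_REF = r"%[A-Za-z][A-Za-z0-9.-]+;"
COMMENT = re.compile(r"(<!--.*?-->)", re.DOTALL)


def _outside_comments(text, transform):
    """Apply transform to everything except <!-- ... --> comments."""
    parts = COMMENT.split(text)
    # With one capturing group, odd-numbered parts are the comments themselves.
    return "".join(part if index % 2 else transform(part)
                   for index, part in enumerate(parts))


def wrap_before_comma(text):
    """Wrap a PE reference in parens only when a comma follows it."""
    return _outside_comments(
        text, lambda s: re.sub(f"({PE_REF}),", r"(\1),", s))


def wrap_all(text):
    """Wrap every PE reference in parens."""
    return _outside_comments(
        text, lambda s: re.sub(PE_REF, r"(\g<0>)", s))


broken = "(%model.divBottomPart;,(%model.global;)*)*"
print(wrap_before_comma(broken))  # ((%model.divBottomPart;),(%model.global;)*)*
print(wrap_all(broken))           # ((%model.divBottomPart;),((%model.global;))*)*
```

Running wrap_before_comma a second time changes nothing, because an already wrapped reference is followed by ")" rather than ",".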
I agree with Syd: the last thing we want to do is hobble Pure ODD for the sake of supporting DTDs.
How about a pre-processor step that detects this Pure ODD context and wraps it in a sequence?
I instinctively balk at any more mysterious PERL finding its way into our ODD processing. This can be done with XSLT -- not very elegant, to be sure, but it would keep the number of technologies in our processing chain to a minimum. I keep hearing that people want to be able to do this stuff on Windows, so an ant project with no shell callouts would be favourite.
Cheers,
Martin
On 15-08-24 08:14 AM, Syd Bauman wrote:
If it comes to a separate processing step, I'd prefer a post-processor that detects this and wraps the offending reference in parens.
perl -pe 's/(%[A-Za-z][A-Za-z0-9.-]+;),/($1),/g;' < in.dtd > out.dtd
Or, as I said, *all* of 'em.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Of course, these make the mistake of hitting those that are in comments, too. But since the DTD file is generated output that we generate, this doesn't worry me too much.
I agree with Syd: the last thing we want to do is hobble Pure ODD for the sake of supporting DTDs.
How about a pre-processor step that detects this Pure ODD context and wraps it in a sequence?
I tend to agree with Martin here, but on the other hand it may not be necessary to do this unclean thing. I now think the problem is less widespread, and may simply entail a tweak to the way model.common is defined. More later (I am still on the road, but having reached Santiago de Compostela, will soon be returning to Oxford and normal internet connectivity).
If anyone tells you that the rain in Spain falls mainly on the plain, believe me, they are mistaken.
On 24/08/15 19:31, Martin Holmes wrote:
I instinctively balk at any more mysterious PERL finding its way into our ODD processing. This can be done with XSLT -- not very elegant, to be sure, but it would keep the number of technologies in our processing chain to a minimum. I keep hearing that people want to be able to do this stuff on Windows, so an ant project with no shell callouts would be favourite.
Cheers, Martin
On 15-08-24 08:14 AM, Syd Bauman wrote:
If it comes to a separate processing step, I'd prefer a post-processor that detects this and wraps the offending reference in parens.
perl -pe 's/(%[A-Za-z][A-Za-z0-9.-]+;),/($1),/g;' < in.dtd > out.dtd
Or, as I said, *all* of 'em.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Of course, these make the mistake of hitting those that are in comments, too. But since the DTD file is generated output that we generate, this doesn't worry me too much.
I agree with Syd: the last thing we want to do is hobble Pure ODD for the sake of supporting DTDs.
How about a pre-processor step that detects this Pure ODD context and wraps it in a sequence?
While I'm thrilled that a hack may not be needed, I do not at all agree with Martin's characterization of Perl as "mysterious". Yes, it's string processing (not node processing), but (IMHO) if you're going to do string processing Perl is one of the, if not the, least mysterious ways to do it. It is *very* widely known, built-in on most all reasonable systems, and is readily available even on Windows.[1] It is stable enough that it's extraordinarily unlikely that such a command would fail to work with a future version of Perl. And the point of putting the search-and-replace in my previous post was to point out how simple it is.
And now that I think about it, even if we can re-define model.common, and thus duck this bullet ourselves, shouldn't ODD processing do the right thing for someone else who stumbles into this problem?
Notes
-----
[1] Although Martin's right, we can't just call it willy-nilly on Windows. But I don't have any desire to try to support building on Windows, myself.
I tend to agree with Martin here, but on the other hand it may not be necessary to do this unclean thing. I now think the problem is less widespread, and may simply entail a tweak to the way model.common is defined. More later (I am still on the road, but having reached Santiago de Compostela, will soon be returning to Oxford and normal internet connectivity).
If anyone tells you that the rain in Spain falls mainly on the plain, believe me, they are mistaken.
I instinctively balk at any more mysterious PERL finding its way into our ODD processing. This can be done with XSLT -- not very elegant, to be sure, but it would keep the number of technologies in our processing chain to a minimum. I keep hearing that people want to be able to do this stuff on Windows, so an ant project with no shell callouts would be favourite.
Hi Syd,
I'm not impugning PERL per se, although I don't love it myself; I'm just suggesting that we keep the number of requirements in the toolchain to a minimum. We _must_ have Java and Saxon and Ant; we don't have to have PERL just to do a string-replace. I really would like to get away from any dependence on CLI stuff in favour of Ant, so that we really could have a platform-neutral build process.
Cheers,
Martin
On 15-08-25 06:30 AM, Syd Bauman wrote:
While I'm thrilled that a hack may not be needed, I do not at all agree with Martin's characterization of Perl as "mysterious". Yes, it's string processing (not node processing), but (IMHO) if you're going to do string processing Perl is one of the, if not the, least mysterious ways to do it. It is *very* widely known, built-in on most all reasonable systems, and is readily available even on Windows.[1] It is stable enough that it's extraordinarily unlikely that such a command would fail to work with a future version of Perl. And the point of putting the search-and-replace in my previous post was to point out how simple it is.
And now that I think about it, even if we can re-define model.common, and thus duck this bullet ourselves, shouldn't ODD processing do the right thing for someone else who stumbles into this problem?
Notes
-----
[1] Although Martin's right, we can't just call it willy-nilly on Windows. But I don't have any desire to try to support building on Windows, myself.
I tend to agree with Martin here, but on the other hand it may not be necessary to do this unclean thing. I now think the problem is less widespread, and may simply entail a tweak to the way model.common is defined. More later (I am still on the road, but having reached Santiago de Compostela, will soon be returning to Oxford and normal internet connectivity).
If anyone tells you that the rain in Spain falls mainly on the plain, believe me, they are mistaken.
I instinctively balk at any more mysterious PERL finding its way into our ODD processing. This can be done with XSLT -- not very elegant, to be sure, but it would keep the number of technologies in our processing chain to a minimum. I keep hearing that people want to be able to do this stuff on Windows, so an ant project with no shell callouts would be favourite.
Well, I'm not pushing hard for Perl per se, but my point is that as requirements go, it's not a problem. Pretty much any system that has `make` has `perl`, too. Same can't be said for Saxon or Ant. (Yes, I realize that our processing chain already requires these things, and that it would not be a good idea to try to get rid of them.) My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
I'm not impugning PERL per se, although I don't love it myself; I'm just suggesting that we keep the number of requirements in the toolchain to a minimum. We _must_ have Java and Saxon and Ant; we don't have to have PERL just to do a string-replace. I really would like to get away from any dependence on CLI stuff in favour of Ant, so that we really could have a platform-neutral build process.
Actually the Makefile *already* uses perl for some dirty mungeing at a couple of places, I think, so one more or less won't make a lot of difference.
After several days banging my head on this particular brick wall, I have come round to the view that we're all doomed anyway. The perl hack has to be more sophisticated than at first supposed; I can't figure out how to include it in the general post-processing pipeline (as opposed to simply mungeing the Exemplars). And none of the cunning ways I've tried to add the necessary intelligence to the pure ODD input or its processing has so far worked. In Ringo's immortal words, I've got blisters on my fingers.
At least I've managed to check the purified ODDs into the P5-Pure branch though, so someone else can take a look...
On 30/08/15 18:02, Syd Bauman wrote:
Well, I'm not pushing hard for Perl per se, but my point is that as requirements go, it's not a problem. Pretty much any system that has `make` has `perl`, too. Same can't be said for Saxon or Ant. (Yes, I realize that our processing chain already requires these things, and that it would not be a good idea to try to get rid of them.)
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
I'm not impugning PERL per se, although I don't love it myself; I'm just suggesting that we keep the number of requirements in the toolchain to a minimum. We _must_ have Java and Saxon and Ant; we don't have to have PERL just to do a string-replace. I really would like to get away from any dependence on CLI stuff in favour of Ant, so that we really could have a platform-neutral build process.
Hi Syd,
On 15-08-30 10:02 AM, Syd Bauman wrote:
Well, I'm not pushing hard for Perl per se, but my point is that as requirements go, it's not a problem. Pretty much any system that has `make` has `perl`, too.
To be honest I feel the same way about make. I'd like to move everything to ant. It's easier for newbies to learn and understand, and it has a lot of payoffs in terms of the number of JVMs that have to get instantiated during a build process. For instance, if you call Saxon 20 times from a Makefile, you instantiate 20 JVMs; if you do it from an ant process, the same JVM is used every time. Sebastian already managed to cut a lot of time out of the build process by doing that.
Same can't be said for Saxon or Ant. (Yes, I realize that our processing chain already requires these things, and that it would not be a good idea to try to get rid of them.)
Please no. If we get rid of ant at this point, we'll be back to builds that take an hour and a half. If there were one thing guaranteed to stop me contributing to P5, it would be a move to use make and PERL more, and ant less. I think it would be madness.
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
The resulting file would be transient and deleted at the end of the process; it would just be part of a processing chain. Cheers, Martin
I'm not impugning PERL per se, although I don't love it myself; I'm just suggesting that we keep the number of requirements in the toolchain to a minimum. We _must_ have Java and Saxon and Ant; we don't have to have PERL just to do a string-replace. I really would like to get away from any dependence on CLI stuff in favour of Ant, so that we really could have a platform-neutral build process.
On 30/08/15 18:30, Martin Holmes wrote:
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
The resulting file would be transient and deleted at the end of the process; it would just be part of a processing chain.
I missed this suggestion earlier. Can you elaborate on what you mean? Are you suggesting that we post-process p5.xml to introduce <sequence> elements round every <classRef> and then use that version to drive dtd generation? and then throw it away? I believe that sounds like what Baldrick would call a very Cunning Plan, but the devil is in the details...
On 15-08-30 10:42 AM, Lou Burnard wrote:
On 30/08/15 18:30, Martin Holmes wrote:
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
The resulting file would be transient and deleted at the end of the process; it would just be part of a processing chain.
I missed this suggestion earlier. Can you elaborate on what you mean?
Are you suggesting that we post-process p5.xml to introduce <sequence> elements round every <classRef> and then use that version to drive dtd generation? and then throw it away? I believe that sounds like what Baldrick would call a very Cunning Plan, but the devil is in the details...
That's exactly what I mean. If we know what needs to be inserted where (which I'm not sure we do, exactly, yet, but it could be completely figured out presumably), then we can just make a bastardized p5.xml, use it, and throw it away without anyone ever knowing. It's a bit hacky and ugly, but that's all that DTD support deserves, at this point. It should consider itself lucky to be supported at all. :-) Cheers, Martin
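A minimal sketch of that pre-processing step, in Python rather than the XSLT the build would presumably use. It assumes, following Lou's paraphrase, that every classRef gets its own wrapping sequence; the thread has not yet settled exactly which contexts need it, and the function name is hypothetical. The wrapped copy is meant to be transient and is never validated against the schematron rule.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
CLASS_REF = f"{{{TEI_NS}}}classRef"
SEQUENCE = f"{{{TEI_NS}}}sequence"


def wrap_class_refs(root):
    """Wrap each bare <classRef> in its own <sequence>; return how many were wrapped."""
    wrapped = 0
    # Snapshot the elements first, because the tree is modified while walking it.
    for parent in list(root.iter()):
        # A <sequence> holding a single child already has the shape we want.
        already_wrapped = parent.tag == SEQUENCE and len(parent) == 1
        for index, child in enumerate(list(parent)):
            if child.tag != CLASS_REF or already_wrapped:
                continue
            wrapper = ET.Element(SEQUENCE)
            wrapper.tail, child.tail = child.tail, None
            parent[index] = wrapper  # one-for-one replacement keeps indexes stable
            wrapper.append(child)
            wrapped += 1
    return wrapped


SAMPLE = f"""
<content xmlns="{TEI_NS}">
  <sequence minOccurs="0" maxOccurs="unbounded">
    <classRef key="model.divBottomPart"/>
    <classRef key="model.global" minOccurs="0" maxOccurs="unbounded"/>
  </sequence>
</content>
"""

root = ET.fromstring(SAMPLE)
print(wrap_class_refs(root))  # 2
print(wrap_class_refs(root))  # 0, the second pass finds nothing left to wrap
```

In a real build the modified tree would be written to a temporary p5.xml, fed to the DTD generation, and then deleted.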
On 30/08/15 18:50, Martin Holmes wrote:
Are you suggesting that we post-process p5.xml to introduce <sequence> elements round every <classRef> and then use that version to drive dtd generation? and then throw it away? I believe that sounds like what Baldrick would call a very Cunning Plan, but the devil is in the details...
That's exactly what I mean. If we know what needs to be inserted where (which I'm not sure we do, exactly, yet, but it could be completely figured out presumably), then we can just make a bastardized p5.xml, use it, and throw it away without anyone ever knowing.
Cunning as a sackful of ferrets, indeed. Of course, if you have an ODD which mentions <classRefs> and it doesn't know about the necessary bastardization procedure, it won't generate valid DTDs any more, but presumably we don't care about that?
On 15-08-30 10:57 AM, Lou Burnard wrote:
On 30/08/15 18:50, Martin Holmes wrote:
Are you suggesting that we post-process p5.xml to introduce <sequence> elements round every <classRef> and then use that version to drive dtd generation? and then throw it away? I believe that sounds like what Baldrick would call a very Cunning Plan, but the devil is in the details...
That's exactly what I mean. If we know what needs to be inserted where (which I'm not sure we do, exactly, yet, but it could be completely figured out presumably), then we can just make a bastardized p5.xml, use it, and throw it away without anyone ever knowing.
Cunning as a sackful of ferrets, indeed. Of course, if you have an ODD which mentions <classRefs> and it doesn't know about the necessary bastardization procedure, it won't generate valid DTDs any more, but presumably we don't care about that?
Actually, since the ODD doesn't generate DTDs on its own (it needs an ODD processor), and the bug we're dealing with here is actually in our own ODD processor--at least, that's how I see it--then we're just providing a hack for our ODD processor which avoids the horrible prospect of any of us going into the bowels of the XSLT and figuring out how on earth it could be made sensitive to these specific contexts and process them in a special manner without screwing up other stuff. An entirely separate ODD processor, should there ever be one, would presumably be able to handle this without resorting to a trick. We really do need to understand our ODD processing much better and be able to deal with this sort of stuff when it crops up in other outputs, but I think that would be a bit of a waste of our precious time with the DTD target. We have so much else on our plate right now. Cheers, Martin
On 30/08/15 19:12, Martin Holmes wrote:
Cunning as a sackful of ferrets, indeed. Of course, if you have an ODD which mentions <classRefs> and it doesn't know about the necessary bastardization procedure, it won't generate valid DTDs any more, but presumably we don't care about that?
Actually, since the ODD doesn't generate DTDs on its own (it needs an ODD processor), and the bug we're dealing with here is actually in our own ODD processor--at least, that's how I see it--then we're just providing a hack for our ODD processor which avoids the horrible prospect of any of us going into the bowels of the XSLT and figuring out how on earth it could be made sensitive to these specific contexts and process them in a special manner without screwing up other stuff. An entirely separate ODD processor, should there ever be one, would presumably be able to handle this without resorting to a trick.
I have spent the last few weeks inspecting said bowels, poking bits of gristle here and there. Some of it is actually quite clear, but there is a lot that really isn't. But my point is that other people besides us write ODDs and maybe use our (broken) ODD processor to generate DTDs. We will have to decide whether to come clean on its limitations and the steps we've taken to circumvent them. But let's see if it actually works first.
I completely agree that ant, or some other system that creates a JVM once for all our Saxon transforms, is a Very Good Thing for our build process, and (as I said) am not suggesting we get rid of it. But I don't see ant as particularly easier for newbies, and there are a *lot* of people in the world who are not newbies and are quite used to make.
The fact that your version of this hack involves only a transiently modified p5.xml is what makes it tolerable. (I still think the *right* answer is to fix the DTD generation in the actual P5 build process so that it puts the parens in when needed. But I'm not willing to volunteer to do that, at least not yet.)
Well, I'm not pushing hard for Perl per se, but my point is that as requirements go, it's not a problem. Pretty much any system that has `make` has `perl`, too.
To be honest I feel the same way about make. I'd like to move everything to ant. It's easier for newbies to learn and understand, and it has a lot of payoffs in terms of the number of JVMs that have to get instantiated during a build process. For instance, if you call Saxon 20 times from a Makefile, you instantiate 20 JVMs; if you do it from an ant process, the same JVM is used every time. Sebastian already managed to cut a lot of time out of the build process by doing that.
Same can't be said for Saxon or Ant. (Yes, I realize that our processing chain already requires these things, and that it would not be a good idea to try to get rid of them.)
Please no. If we get rid of ant at this point, we'll be back to builds that take an hour and a half. If there were one thing guaranteed to stop me contributing to P5, it would be a move to use make and PERL more, and ant less. I think it would be madness.
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
The resulting file would be transient and deleted at the end of the process; it would just be part of a processing chain.
Hi Syd,
On 15-08-30 05:14 PM, Syd Bauman wrote:
I completely agree that ant, or some other system that creates a JVM once for all our Saxon transforms, is a Very Good Thing for our build process, and (as I said) am not suggesting we get rid of it. But I don't see ant as particularly easier for newbies, and there are a *lot* of people in the world who are not newbies and are quite used to make.
We need ant; we don't need make. Ant can do everything make can do, but the reverse is not the case. Ant build files are written in XML, which is our native language (for the foreseeable future, at any rate), while Makefiles look a bit like bash scripts but not really. I bet you anyone who understands XML stuff even at a basic level could read an ant build file and tell you what it does. I'm quite used to make, and I still don't like it. But a lot of this is about familiarity and personal preference.
The fact that your version of this hack involves only a transiently modified p5.xml is what makes it tolerable. (I still think the *right* answer is to fix the DTD generation in the actual P5 build process so that it puts the parens in when needed. But I'm not willing to volunteer to do that, at least not yet.)
I agree with you completely. Especially the not volunteering bit. :-) Cheers, Martin
Well, I'm not pushing hard for Perl per se, but my point is that as requirements go, it's not a problem. Pretty much any system that has `make` has `perl`, too.
To be honest I feel the same way about make. I'd like to move everything to ant. It's easier for newbies to learn and understand, and it has a lot of payoffs in terms of the number of JVMs that have to get instantiated during a build process. For instance, if you call Saxon 20 times from a Makefile, you instantiate 20 JVMs; if you do it from an ant process, the same JVM is used every time. Sebastian already managed to cut a lot of time out of the build process by doing that.
Same can't be said for Saxon or Ant. (Yes, I realize that our processing chain already requires these things, and that it would not be a good idea to try to get rid of them.)
Please no. If we get rid of ant at this point, we'll be back to builds that take an hour and a half. If there were one thing guaranteed to stop me contributing to P5, it would be a move to use make and PERL more, and ant less. I think it would be madness.
My objection, BTW, is *not* with using XSLT instead of Perl. My objection, a minor one at that, is with performing the hack on the input PureODD rather than on the output DTD. But that objection is not a show-stopper, it's just a misgiving.
The resulting file would be transient and deleted at the end of the process; it would just be part of a processing chain.
On 24/08/15 16:14, Syd Bauman wrote:
If it comes to a separate processing step, I'd prefer a post-processor that detects this and wraps the offending reference in parens.
perl -pe 's/(%[A-Za-z][A-Za-z0-9.-]+;),/($1),/g;' < in.dtd > out.dtd
This (or my somewhat simpler version of it) does work, at least for the one dtd I have tried it on so far.
Or, as I said, *all* of 'em.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Hitting all of them (however) is definitely not a good idea: it generates a different error in cases where the pe expands to a single gi.
Interesting ... what is the error, and from what processor? My reading of the XML spec (in particular, production 50) says it should be legal except in mixed content (production 51), where the parens would not be allowed no matter what the PE expands to.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Hitting all of them (however) is definitely not a good idea: it generates a different error in cases where the pe expands to a single gi.
The processor is xmllint, but I don't remember the details of the error message.
On 30/08/15 18:11, Syd Bauman wrote:
Interesting ... what is the error, and from what processor? My reading of the XML spec (in particular, production 50) says it should be legal except in mixed content (production 51), where the parens would not be allowed no matter what the PE expands to.
perl -pe 's,%[A-Za-z][A-Za-z0-9.-]+;,($&),g;' < in.dtd > out.dtd
Hitting all of them (however) is definitely not a good idea: it generates a different error in cases where the pe expands to a single gi.
participants (4)
- Hugh Cayless
- Lou Burnard
- Martin Holmes
- Syd Bauman