Vanessa, and everyone --

I was tasked with writing up an explanation of the rub-a-dub revisited project, primarily for Vanessa, but for all, really. So here goes.

background
----------

[Note -- this is intended to be a "get your head around this issue" sort of history, not a detailed highly accurate accounting. Lou and James probably know a lot more about the early history than I.]

The TEI Consortium, through the TEI Council, maintains a set of stylesheets called (unimaginatively) "the stylesheets" or (not much better) "the TEI stylesheets". They were originally started (back in the early 2000s?), I think, as two separate projects. The first was to have a workflow with which the TEI Guidelines, written in TEI/ODD, could be converted to HTML and print (PDF), and later ePUB, etc. The second was to provide a stylesheet for producing some semblance of readable output (HTML or PDF) from TEI Lite.

It quickly became clear that these two projects had a *lot* in common, and so they were merged. And, in fact, the code base for TEI P4 and TEI P5 also had a lot in common, and they were merged, too. (We no longer support P4, though.)

The main, almost sole, programmer behind this effort was the amazing Sebastian Rahtz. Sebastian was a brilliant, talented, and frighteningly fast XSLT programmer. An archaeologist by training, and an expert in the TeX and LaTeX typesetting systems, Sebastian never had any formal training in CS, at least none he would confess to me. Sebastian did not believe in internal documentation (e.g., comments) and did not care about indentation or consistency of namespace prefixes (the latter two, I suspect, because he was so brilliant he parsed XML on the fly in his head :-). He wrote these stylesheets over many years, adding features (often only a few hours after a request was made) in a somewhat willy-nilly fashion. Oh, and he started this project in XSLT 1.0, converting to XSLT 2.0 years later.

situation
---------

Due to brain cancer Sebastian stopped contributing to the stylesheets in late 2015 and died in early 2016. So now we (the Council) are left with a wonderfully powerful, modularized, parameterized, flexible set of stylesheets that are ridiculously difficult to maintain.

Roughly 6 months after the gut-wrenching loss of our dear friend and colleague, the TEI Council established a group of interested parties to co-operatively educate ourselves as to how these stylesheets worked. All of TEI Council plus anyone who had ever contributed to the Stylesheets repo was invited. This group (chaired by me) has evolved from a self-education group to a Stylesheets-fixing group over time. The only people still on it are Martin Holmes, Lou Burnard, and various members of Council.

One of the first things this group did was map out how the Stylesheets actually work. Some of that knowledge is represented in
https://wiki.tei-c.org/index.php/Mapping_ODD_processing.

desires
-------

Martin and I have a (somewhat aspirational, but somewhat realistic) plan to re-write the ODD portion of the Stylesheets from scratch. That is, we would like to write a completely new code base (in XSLT 3.0) for processing the source of P5 into the P5 outputs (in PDF via XSLFO, as opposed to via LaTeX; HTML5; ePUB; etc.), and for processing customization ODDs into schemas and documentation. We do not want to rewrite the rest of the Stylesheets, i.e. those parts that transform generic TEI Lite into useful output, and do various other conversions. (These same stylesheets are the backbone of OxGarage.)

However, that would take a long time, and in the meantime we still have a nearly unmaintainable mess.

cleaning up
-----------

I have often been frustrated reading Sebastian's code. I am not nearly as brilliant as he was, so I like having helpful comments along the way, proper indentation, some consistent method of variable and function names, etc.
I also find long-winded XSLT 1.0 constructs where a short bit of XSLT 2.0 can do the same thing quite frustrating. So every time I dive into the stylesheets, I think "this really has to be cleaned up". But usually I don't want to do any actual cleaning, in order to keep the delta between the code base before and after whatever fix I am applying as small as possible, to make it easier to double-check that it hasn't introduced new errors.

There are some people a lot smarter than I who say that it is always better to clean up a code base than re-write it from scratch. One of them wrote a lovely blog post about it. See
https://www.joelonsoftware.com/2002/01/23/rub-a-dub-dub/
While I am sure there are computer scientists who disagree, it is a quite compelling article.

rub-a-dub 1
-----------

So I convinced Martin that we should take a crack at cleaning the Stylesheets first (it wasn't hard). I then convinced Council of the same thing (not much harder). In late 2017 Council explicitly charged me with performing a clean up operation (called "rub a dub" or "rub a dub dub" based on what Joel called it in his blog) on a single, large, part of the Stylesheets: the odd2odd program.

I spent my winter break exactly a year ago working my tail off for hours on end cleaning up code in ways that should not change the output. Pretty much every time I saved the file, I kicked off a test process to make sure the output came out the same. And it did! Every time.

It wasn't until I was well over halfway through rubbing and dubbing that I discovered that, due to a misunderstanding between me & Martin (no blame on Martin, I just mistook what he said), the test procedure I was using was not actually testing odd2odd! I had been comparing the wrong output files, so of course they were the same every time. Sigh. In fact, while I blissfully thought I was doing such a great job, I had in fact mucked up the output almost beyond recognition. Sigh.

rub-a-dub 2
-----------

So Martin and I redoubled our efforts on getting ODD testing in the Test2 system to work properly, and are almost finished with that. (Martin presented the Test2 system at our last Stylesheets group meeting.) Once that is done (my current goal is before I leave for a conference in Texas on Wed 09 Jan), I will attack rub-a-dub of odd2odd again, probably shortly after I get back from Texas.

The over-arching goal will be to end up with a version of odd2odd that does the same thing, but is *much* easier to maintain. Some of the ancillary files (like common_functions) will likely need to be changed somewhat, too.

By "the same thing" I mean "have essentially the same XML tree as output". I don't care about the incidentals of serialization, and may well sprinkle the output with comments if it seems to help. The point is I intend to try to explicitly avoid "improving" the processor as I go along.

The detailed tasks I expect to work on to make it more maintainable are somewhat flexible, likely to change as I go along. But the following are the kinds of things I mean:

 * Reasonable indentation
 * Adding internal documentation
 * Re-factoring XSLT 1.0 constructs into XSLT 2.0
 * Collapsing equivalent functions, modes, and templates where appropriate. (There are several cases where the Stylesheets have 2 or even 3 versions of what appear to be almost the same function, mode, or template.)
 * Changing names of functions, modes, templates, and variables in order to approach consistency
 * Re-factoring logic to avoid double negatives where reasonable

This, of course, will all be carried out in a separate branch, so as not to interfere with normal bug-fixing and the late Jan release.
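
To make "essentially the same XML tree" a bit more concrete, here is a rough sketch (in Python, purely for illustration) of the kind of comparison meant above. It is not how the Test2 system works; the function names same_tree and same_output are made up for this example, and it assumes that comments, whitespace-only text, attribute order, and namespace prefixes all count as incidentals of serialization.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Treat missing and whitespace-only text as empty; keep other text as-is."""
    return "" if text is None or not text.strip() else text


def same_tree(a, b):
    """Hypothetical helper: True if two elements match in name, attributes,
    text, and (recursively) children. Attribute order is irrelevant because
    attributes are compared as dictionaries."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


def same_output(xml_before, xml_after):
    """Hypothetical helper: compare two serialized documents by tree, not by
    bytes. The default ElementTree parser drops comments and expands namespace
    prefixes to URIs, so neither affects the result."""
    return same_tree(ET.fromstring(xml_before), ET.fromstring(xml_after))


before = '<elementSpec ident="p" mode="change"><desc>para</desc></elementSpec>'
after = (
    "<elementSpec mode='change' ident='p'>\n"
    "  <!-- added during clean-up -->\n"
    "  <desc>para</desc>\n"
    "</elementSpec>"
)
changed = '<elementSpec ident="p" mode="change"><desc>paragraph</desc></elementSpec>'

print(same_output(before, after))    # differs only in serialization
print(same_output(before, changed))  # differs in content
```

Whatever the real test harness looks like, a check of this sort is only meaningful if it is pointed at the files odd2odd actually produced, which is exactly where the first attempt went wrong.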
participants (1)
-
Syd Bauman