With apologies to Syd, I've only got around to reading this now. It seems a good description to me. I'd add that, before he stopped working on the Stylesheets, Sebastian did recognise that although he had technically moved them to XSLT2, much of the way they worked still relied on XSLT1 constructs. The one I discussed most with him was the interminable call-template constructions, which could be simplified significantly with functions. (And even where the depth of nesting stayed the same, the result would be clearer.)
Many thanks for all your hard work,
James
--
Dr James Cummings, James.Cummings@newcastle.ac.uk
Senior Lecturer in Late-Medieval Literature and Digital Humanities
School of English, Newcastle University
________________________________
From: Tei-council on behalf of Syd Bauman
Sent: 24 December 2018 17:17:42
To: TEI Council
Cc: Lou Burnard; Martin Holmes
Subject: [Tei-council] rub-a-dub revisited
Vanessa, and everyone --
I was tasked with writing up an explanation of the rub-a-dub
revisited project, primarily for Vanessa, but for all, really. So
here goes.
background
----------
[Note -- this is intended to be a "get your head around this issue"
sort of history, not a detailed highly accurate accounting. Lou and
James probably know a lot more about the early history than I.]
The TEI Consortium, through the TEI Council, maintains a set of
stylesheets called (unimaginatively) "the stylesheets" or (not much
better) "the TEI stylesheets". They were originally started (back in the
early 2000s?), I think, as two separate projects. The first was to
have a workflow with which the TEI Guidelines, written in TEI/ODD,
could be converted to HTML and print (PDF), and later ePUB, etc. The
second was to provide a stylesheet for producing some semblance of
readable output (HTML or PDF) from TEI Lite. It quickly became clear
that these two projects had a *lot* in common, and so they were
merged. And, in fact, the code base for TEI P4 and TEI P5 also had a
lot in common, and they were merged, too. (We no longer support P4,
though.)
The main, almost sole, programmer behind this effort was the amazing
Sebastian Rahtz. Sebastian was a brilliant, talented, and
frighteningly fast XSLT programmer. An archaeologist by training, and
an expert in the TeX and LaTeX typesetting systems, Sebastian never
had any formal training in CS, at least none he would confess to me.
Sebastian did not believe in internal documentation (e.g., comments)
and did not care about indentation or consistency of namespace
prefixes (the latter two, I suspect, because he was so brilliant he
parsed XML on the fly in his head :-). He wrote these stylesheets
over many years, adding features (often only a few hours after a
request was made) in a somewhat willy-nilly fashion.
Oh, and he started this project in XSLT1, converting to XSLT2 years
later.
situation
---------
Due to brain cancer Sebastian stopped contributing to the stylesheets
in late 2015 and died in early 2016. So now we (the Council) are left
with a wonderfully powerful, modularized, parameterized, flexible set
of stylesheets that are ridiculously difficult to maintain.
Roughly 6 months after the gut-wrenching loss of our dear friend and
colleague, the TEI Council established a group of interested parties
to co-operatively educate ourselves as to how these stylesheets
worked. All of TEI Council plus anyone who had ever contributed to
the Stylesheets repo was invited. This group (chaired by me) has
evolved from a self-education group to a Stylesheets-fixing group
over time. The only people still on it are Martin Holmes, Lou
Burnard, and various members of Council.
One of the first things this group did was map out how the
Stylesheets actually work. Some of that knowledge is represented in
https://wiki.tei-c.org/index.php/Mapping_ODD_processing.
desires
-------
Martin and I have a (somewhat aspirational, but somewhat realistic)
plan to re-write the ODD portion of the Stylesheets from scratch.
That is, we would like to write a completely new code base (in XSLT
3.0) for processing the source of P5 into the P5 outputs (PDF, via
XSL-FO rather than LaTeX; HTML5; ePUB; etc.), and for
processing customization ODDs into schemas and documentation. We do
not want to rewrite the rest of the Stylesheets, i.e. those parts
that transform generic TEI Lite into useful output, and do various
other conversions. (These same stylesheets are the backbone of
OxGarage.)
However, that would take a long time, and in the meantime we still
have a nearly unmaintainable mess.
cleaning up
-------- --
I have often been frustrated reading Sebastian's code. I am not
nearly as brilliant as he was, so I like having helpful comments
along the way, proper indentation, some consistent method of naming
variables and functions, etc. I also find it quite frustrating to
read long-winded XSLT 1.0 constructs where a short bit of XSLT 2.0
can do the same thing.
So every time I dive into the stylesheets, I think "this really has
to be cleaned up". But usually I don't want to do any actual
cleaning, in order to keep the delta between the code base before and
after whatever fix I am applying as small as possible, to make it
easier to double-check that it hasn't introduced new errors.
There are some people a lot smarter than I who say that it is always
better to clean up a code base than re-write it from scratch. One of
them wrote a lovely blog post about it. See
https://www.joelonsoftware.com/2002/01/23/rub-a-dub-dub/
While I am sure there are computer scientists who disagree, it is
quite a compelling article.
rub-a-dub 1
--------- -
So I convinced Martin that we should take a crack at cleaning the
Stylesheets first (it wasn't hard). I then convinced Council of the
same thing (not much harder). In late 2017 Council explicitly charged
me with performing a clean-up operation (called "rub a dub" or "rub a
dub dub" based on what Joel called it in his blog) on a single,
large, part of the Stylesheets: the odd2odd program.
I spent my winter break exactly a year ago working my tail off for
hours on end cleaning up code in ways that should not change the
output. Pretty much every time I saved the file, I kicked off a test
process to make sure the output came out the same. And it did! Every
time. It wasn't until I was well over halfway through rubbing and
dubbing that I discovered that, due to a misunderstanding between me
& Martin (no blame on Martin, I just mistook what he said), the test
procedure I was using was not actually testing odd2odd! I had been
comparing the wrong output files, so of course they were the same
every time. Sigh. While I blissfully thought I was doing
such a great job, I had in fact mucked up the output almost beyond
recognition. Sigh.
rub-a-dub 2
--------- -
So Martin and I redoubled our efforts on getting ODD testing in the
Test2 system to work properly, and are almost finished with that.
(Martin presented the Test2 system at our last Stylesheets group
meeting.) Once that is done (my current goal is before I leave for a
conference in Texas on Wed 09 Jan), I will attack rub-a-dub of
odd2odd again, probably shortly after I get back from Texas.
The over-arching goal will be to end up with a version of odd2odd
that does the same thing, but is *much* easier to maintain. Some of
the ancillary files (like common_functions) will likely need to be
changed somewhat, too.
By "the same thing" I mean "have essentially the same XML tree as
output". I don't care about the incidentals of serialization, and may
well sprinkle the output with comments if it seems to help. The point
is I intend to try to explicitly avoid "improving" the processor as I
go along.
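To make "essentially the same XML tree" a bit more concrete, here is a
minimal sketch in Python of the kind of comparison I have in mind. It
is not the Test2 system, just an illustration; the function names and
the sample documents are made up for this message. It assumes that
comments, attribute order, namespace prefixes, and whitespace
differences are all "incidentals". Collapsing whitespace is a
simplification that could hide real differences in mixed content.

```python
import xml.etree.ElementTree as ET


def _norm(text):
    """Collapse whitespace runs so indentation differences do not count."""
    return " ".join((text or "").split())


def same_tree(a, b):
    """True if two elements match in name, attributes, text, and children."""
    # Tags are namespace-expanded ({uri}local), so prefixes are irrelevant;
    # attributes are dicts, so their order is irrelevant too.
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if _norm(a.text) != _norm(b.text) or _norm(a.tail) != _norm(b.tail):
        return False
    if len(a) != len(b):
        return False
    return all(same_tree(x, y) for x, y in zip(a, b))


def same_output(xml_before, xml_after):
    """Parse two serialized documents and compare them as trees.

    The default ElementTree parser drops comments, so comments sprinkled
    into the new output do not affect the result.
    """
    return same_tree(ET.fromstring(xml_before), ET.fromstring(xml_after))


# Hypothetical sample outputs: same tree, different serialization.
BEFORE = (
    '<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="x" start="TEI">'
    '<moduleRef key="core"/></schemaSpec>'
)
AFTER = (
    '<tei:schemaSpec xmlns:tei="http://www.tei-c.org/ns/1.0" start="TEI" ident="x">\n'
    '  <!-- added during rub-a-dub -->\n'
    '  <tei:moduleRef key="core"/>\n'
    '</tei:schemaSpec>'
)

print(same_output(BEFORE, AFTER))  # True
```

A check like this only helps, of course, if it is pointed at the
files that odd2odd actually wrote.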
The detailed tasks I expect to work on to make it more maintainable
are somewhat flexible, and likely to change as I go along. But the
following are the kinds of things I mean:
* Reasonable indentation
* Adding internal documentation
* Re-factoring XSLT 1.0 constructs into XSLT 2.0
* Collapsing equivalent functions, modes, and templates where
  appropriate. (There are several cases where the Stylesheets have 2
  or even 3 versions of what appear to be almost the same function,
  mode, or template.)
* Changing names of functions, modes, templates, and variables in
  order to approach consistency
* Re-factoring logic to avoid double negatives where reasonable
This, of course, will all be carried out in a separate branch, so as
not to interfere with normal bug-fixing and the late Jan release.