another freakin' inconsistency

14 Nov 2022

I am writing here instead of on a GitHub ticket, as this is either a Guidelines issue (if we decide to encode consistently) or a Stylesheets issue (if we decide to present consistently). Sigh.

Right now there appear to be roughly 3700 apostrophes in P5.[1] Of those, 3490 are encoded with U+0027 and 241 with U+2019.[2,3]

Of course, many of these are inside <egXML>, and thus maybe should be left alone; and some are inside comments (or maybe PIs), and we probably don’t care at all how they are encoded. So, refining the search to exclude those, there are only 2853 apostrophes,[4] of which 2689 are U+0027 and 190 are U+2019.[5,6]

Those that are encoded as U+0027 show up as U+0027 in the HTML output of the Guidelines. I think that is a sin.

The question is, what do we want to do about this? The following are (roughly) in my order of preference:

  1.  Change all the apostrophes to U+0027 in the encoded files, and change the Stylesheets to produce U+2019 on output. Update TCW20 or whatever to match. (I.e. to tell people to use U+0027, not U+2019.)
  2.  Leave the inconsistent encoding, and don’t bother to update the documentation, but change the Stylesheets so the output is more readable.
  3.  Change all the apostrophes to U+2019 in the encoded files and update documentation to match. (I.e. to tell people to use U+2019, not U+0027.) Also need to add some checks to the build process to prevent U+0027 from slipping in.
  4.  Change all the apostrophes to U+0027 in the encoded files, but do not change the Stylesheets, thus leaving our output ugly.
  5.  Leave things alone.
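
To make option 1 concrete, here is a minimal Python sketch of the output-side substitution. It is not the actual Stylesheets code (those are XSLT), and the names curl and curl_prose are made up for illustration. It assumes that only a U+0027 sitting between two letters is an apostrophe, which is stricter than the searches used for the counts above, and it leaves anything inside <egXML> alone.

```python
import re
import xml.etree.ElementTree as ET

# The TEI Examples namespace, in ElementTree's {uri}localname form.
EGXML = "{http://www.tei-c.org/ns/Examples}egXML"

# Assumption: a U+0027 with a letter on both sides is an apostrophe.
WORD_APOSTROPHE = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def curl(text):
    """Replace letter-flanked U+0027 with U+2019; pass None through."""
    return WORD_APOSTROPHE.sub("\u2019", text) if text else text


def curl_prose(elem):
    """Rewrite apostrophes in elem's text nodes, skipping <egXML> subtrees."""
    if elem.tag == EGXML:
        return  # examples are left exactly as encoded
    elem.text = curl(elem.text)
    for child in elem:
        curl_prose(child)
        # A child's tail is prose belonging to elem, even after an <egXML>.
        child.tail = curl(child.tail)
```

Under that assumption, straight single quotes used as quotation marks are untouched, but so is a word-initial apostrophe, so a real implementation would need a rule for those cases.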

commands used
[1] egrep -c "[^=]['’][a-z]" P5/p5.xml
[2] egrep -c "[^=]'[a-z]" P5/p5.xml
[3] egrep -c "[^=]’[a-z]" P5/p5.xml
[4] xsel -t -m "//t:*/text()" -v "." -n P5/p5.xml | egrep -c "['’][a-z]"
[5] xsel -t -m "//t:*/text()" -v "." -n P5/p5.xml | egrep -c "'[a-z]"
[6] xsel -t -m "//t:*/text()" -v "." -n P5/p5.xml | egrep -c "’[a-z]"
[7] Note: xsel is an alias for an xmlstarlet sel command that binds “t:” to the TEI namespace. Thus the above 3 commands ignore elements in any other namespace, particularly the TEI Examples namespace.
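
For anyone without the xsel alias, a rough Python equivalent of commands [4] through [6] might look like the sketch below. The function names are hypothetical. One difference to be aware of: grep -c reports the number of matching lines, not the number of matches, so a line containing both kinds of apostrophe is counted once by [4] but once each by [5] and [6]. This may be why the two sub-counts do not sum to the total. The sketch counts individual occurrences instead, so its numbers should not be expected to match exactly.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Same idea as the egrep pattern: either apostrophe, then a lowercase letter.
APOSTROPHE = re.compile(r"(['\u2019])[a-z]")


def tei_text_nodes(root):
    """Yield text directly inside TEI-namespace elements (cf. //t:*/text())."""
    for elem in root.iter():
        if not isinstance(elem.tag, str) or not elem.tag.startswith("{" + TEI_NS + "}"):
            continue  # skips elements in other namespaces, e.g. <egXML>
        if elem.text:
            yield elem.text
        for child in elem:
            if child.tail:  # text following a child still belongs to elem
                yield child.tail


def count_apostrophes(xml_source):
    """Count occurrences of each apostrophe character in TEI prose."""
    root = ET.fromstring(xml_source)
    counts = {"'": 0, "\u2019": 0}
    for text in tei_text_nodes(root):
        for match in APOSTROPHE.finditer(text):
            counts[match.group(1)] += 1
    return counts
```

The default ElementTree parser drops comments and processing instructions, so their content is ignored here as well. Passing the bytes of P5/p5.xml to count_apostrophes should give the per-character totals.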

Bauman, Syd

