on the content of <mapping>

19 Feb 2020

There are 21 examples of <mapping> in the _Guidelines_, and 4
different methods of representing the content thereof. (Although
three of these methods are variants of the same idea.)

The content model of <mapping> is macro.xtext, so it can contain only
character data and <g> elements.

Across those 21 examples, the desired mapping is represented in the
content as:

 * straight character content in 15 (1 of those is multi-character,
   the rest are single characters);

 * numeric character references[1] in 3 (1 of those is
   multi-character, the others are single characters);

 * a <g> element in 1; and

 * the standard Unicode representation[2] in 2, i.e. they match
   "U\+[0-9A-F]{4,6}".

The first three are variations of the same theme: the *content* of
<mapping> *is* the character to which the character or glyph being
defined should be mapped (whether expressed as character data,
NCR(s), <g>(s), or some combination thereof).
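
The four representations listed above can be told apart mechanically.
The following is only a rough sketch in Python, not anything taken from
the _Guidelines_: it assumes you are looking at the *unparsed* source
text of the content of <mapping>, and the function name and the
category labels are made up for illustration.

```python
import re

# Standard Unicode notation, using the pattern quoted in the text.
UNICODE_NOTATION = re.compile(r"U\+[0-9A-F]{4,6}")
# Numeric character reference (hex or decimal) as written in unparsed source.
NCR = re.compile(r"&#(?:x[0-9A-Fa-f]+|[0-9]+);")
# Start tag of a <g> element.
G_ELEMENT = re.compile(r"<g[\s/>]")


def classify_mapping_source(source: str) -> str:
    """Return a label (hypothetical names) for how a mapping is written.

    Mixed content is reported under the first category that matches,
    checked in the order: <g>, Unicode notation, NCR, character data.
    """
    text = source.strip()
    if G_ELEMENT.search(text):
        return "g element"
    if UNICODE_NOTATION.fullmatch(text):
        return "Unicode notation"
    if NCR.search(text):
        return "numeric character reference"
    return "character data"
```

Only the "Unicode notation" case requires the whole content to match;
the other three are all ways of writing the target character(s)
directly.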

The last is entirely different. I think we can all agree that
<mapping>U+00B5</mapping> means that the character or glyph currently
being defined should be mapped to a single character
   µ = &#xB5; = &#181; = <g ref="#micro"/>,
NOT a sequence of 6 characters
   U+00B5 = U+00B5.
(Note that in the examples in _Guidelines_, the Unicode standard
notational convention is used for all <mapping type="PUA">, and no
other types of <mapping>.)
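
To make that reading concrete, here is a minimal sketch in Python of
how a processor might resolve such content. The function name is
hypothetical, and treating a value above U+10FFFF as "not Unicode
notation" is my assumption, not something the _Guidelines_ say.

```python
import re

# The pattern quoted earlier, with the hex digits captured.
UNICODE_NOTATION = re.compile(r"U\+([0-9A-F]{4,6})")


def resolve_unicode_notation(content: str) -> str:
    """Map 'U+XXXX' content to the single character it names.

    Content that is not in Unicode notation, or that names a value
    beyond the Unicode code space, is returned unchanged.
    """
    match = UNICODE_NOTATION.fullmatch(content.strip())
    if match is None:
        return content
    code_point = int(match.group(1), 16)
    if code_point > 0x10FFFF:
        return content
    return chr(code_point)
```

With this reading, resolve_unicode_notation("U+00B5") yields the single
character µ rather than a string of 6 characters.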

But as far as I know, this use of Unicode standard notation is
*nowhere* mentioned in the _Guidelines_.

Although I may be wrong about this, I do not think it would ever make
sense to map a <char> onto a literal string "U+0123" (or "V=3456").
Moreover, I think use of Unicode notation is a *really* good idea,
*especially* if the character being represented is in the PUA. I also
suspect that this was exactly the intention of whoever wrote these
examples: the Unicode standard convention should be used rather than
actually putting PUA characters in a document instance. Thus I think
the right solution is to:

 a) Have a _suggested values include_ list for mapping/@type (after
    all, we have it in the prose, sort of).

 b) Add some prose to 5.2 (#D25-20) that explains that in general
    "U+0123" notation may be used instead of "&#x0123;" or "&#291;",
    and that if the mapping is to the PUA, this is the (vastly)
    preferred method.

 c) If anyone thinks it important to be able to map to absolutely
    arbitrary strings, then pick some value of @type (say "exact")
    for which use of "U+0123" would actually map to the string
    "U+0123".

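 Taken together, (b) and (c) could behave roughly like the Python
 sketch below. Everything in it is an assumption for illustration: the
 function names, the warning text, and "exact" as the value of @type
 (which is only the value floated in (c), not an existing one).

```python
import re
from typing import List, Tuple

UNICODE_NOTATION = re.compile(r"U\+([0-9A-F]{4,6})")

# The three Private Use ranges of Unicode.
PUA_RANGES = ((0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD))


def is_private_use(ch: str) -> bool:
    """True if the single character ch lies in a Private Use range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def interpret_mapping(content: str, mapping_type: str) -> Tuple[str, List[str]]:
    """Return (mapped string, warnings) for the content of one <mapping>.

    Assumed rules: type "exact" keeps the content literally; otherwise
    "U+XXXX" is resolved to one character, and literal PUA characters
    are flagged because the U+ notation is preferred for them.
    """
    if mapping_type == "exact":
        return content, []
    match = UNICODE_NOTATION.fullmatch(content.strip())
    if match is not None and int(match.group(1), 16) <= 0x10FFFF:
        return chr(int(match.group(1), 16)), []
    warnings = [
        f"literal PUA character U+{ord(ch):04X}; prefer U+ notation"
        for ch in content
        if is_private_use(ch)
    ]
    return content, warnings
```

 So "U+0123" would map to the string "U+0123" only when @type is
 "exact", and to the single character ģ otherwise.
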
 Notes
 -----
 [1] Of course, they occur with "&amp;" instead of an actual
     ampersand if you are reading the source.

 [2] Per the Unicode Standard Appendix A, "Notational Conventions".

Syd Bauman

Martin Holmes
