I very much like the idea of an attribute in which the Unicode
standard notation would be used. I do not like the idea of just using
numeric character references (NCRs) because, as far as an XML
processor is concerned, an NCR is the same as the character itself,
and one should not have characters from the PUA in a document
prepared for interchange.
Going with an attribute brings up a few questions:
* What would it be named?
* What would be used to separate multiple characters? (I think I
lean towards whitespace, meaning one would have to use "U+0020" if
one really wanted a space.)
* Would it be mutually exclusive with content?
Using whitespace to separate would allow us to set up a datatype for
"U\+[0-9A-F]{4,6}", and then have the attribute be minOccurs=1
maxOccurs=unbounded of that datatype, meaning it would be whitespace
separated. Yay!
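
To make the whitespace-separated idea concrete, here is a minimal
Python sketch of how such a value might be checked. It reuses the
pattern above; the function name parse_usn and the upper-bound check
against U+10FFFF are my own assumptions, not part of the proposal
(the pattern alone would let a six-digit value such as U+FFFFFF
through).

```python
import re

# Same lexical pattern as the proposed datatype:
# "U+" followed by 4 to 6 uppercase hexadecimal digits.
USN_TOKEN = re.compile(r"U\+[0-9A-F]{4,6}")


def parse_usn(value):
    """Split a (hypothetical) usn attribute value on whitespace and
    return the code points it names, as integers.

    Raises ValueError if there are no tokens, if a token does not
    match the pattern, or if it names a value beyond U+10FFFF
    (the last check is an assumption added for this sketch).
    """
    tokens = value.split()  # any run of whitespace separates tokens
    if not tokens:
        raise ValueError("usn needs at least one U+XXXX token")
    code_points = []
    for token in tokens:
        if not USN_TOKEN.fullmatch(token):
            raise ValueError(f"not in U+XXXX notation: {token!r}")
        code_point = int(token[2:], 16)
        if code_point > 0x10FFFF:
            raise ValueError(f"beyond the Unicode range: {token!r}")
        code_points.append(code_point)
    return code_points
```

With this, a literal space can only be expressed as the token
"U+0020", since a raw space is consumed as a separator.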
As for being mutually exclusive with content, there are advantages
and disadvantages. Allowing
    <mapping usn="U+00B5">µ</mapping>
gives the user both an extra place to make a mistake (by having
"[" as the content by accident) and an extra way of validating that
everything is all right.
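
The validation side of that trade-off could be sketched as follows
in Python. This only illustrates the comparison, assuming the
attribute value is whitespace separated as discussed above; the
names usn_of and usn_matches_content are hypothetical.

```python
def usn_of(text):
    """Render each character of text in U+XXXX notation,
    separated by single spaces (at least four hex digits each)."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def usn_matches_content(usn, content):
    """Return True when the usn attribute value names exactly the
    characters in content, in the same order."""
    return usn.split() == usn_of(content).split()


# The example from above: U+00B5 is the micro sign.
print(usn_matches_content("U+00B5", "\u00b5"))
# The mistake from above: "[" is U+005B, not U+00B5.
print(usn_matches_content("U+00B5", "["))
```

Because it compares token lists, this version also covers content of
more than one character.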
Note-to-self
------------
A proof-of-concept for testing that @usn matches content (done in
XSLT, not Schematron, and does not work on multiple characters):
--------- begin xslt snippet ---------
This is something I've wondered about in the past myself. I think,
though, that if the content of the element is to be allowed to
include "straight character content", then it shouldn't also allow
the standard Unicode representation; I'd rather have an attribute
where that could be provided. Either that or, in place of U+00B5, we
should insist on &#xB5; (i.e. a numeric character reference), which
will sit more easily in a context where character content also
appears.