Re: [tei-council] question on <xenoData>

14 Jul 2015

Hi Syd,

On 15-07-14 11:22 AM, Syd Bauman wrote:
...
...
...
Why anyone would want to convert their JSON to base64 so they can
store it in their TEI file is beyond me. But I'm pretty myopic, I
guess.
The main reason for storing this sort of thing in my projects would
be that it's time-consuming and processor-expensive to generate on
the fly.
Yes, but why store it in myDoc.tei rather than in myDoc.json? Seems
like you'd be making your life harder. But that's just MHO.
Right now I have over 7,300 despatches documents in TEI, and over 7,300 
parallel OAI-PMH files in a separate collection. If a despatches file 
changes its name or gets deleted, I have to remember to remove the 
parallel OAI-PMH file from both SVN and from the eXist instance; 
inevitably that sometimes fails to happen and they get out of sync. Life 
will be much simpler with <xenoData>.
...
...
We're not actually asking for binary data here; we're just asking
for the possibility of non-XML data. I don't foresee any
circumstances in which I'd want to include binary data. Mostly
we'll be talking about existing metadata standards and
transmission-friendly standards such as JSON, I think.
So you're saying that we should explicitly permit non-XML
character-based formats (like JSON), but disallow binary formats
(that have, by necessity, been converted to a character format, of
course)?
I didn't say we should disallow them; I just don't think any of the 
requesters or use-cases so far has mentioned them, so I don't think it's 
important to worry about them. If someone wants to include binary data, 
wouldn't they use <binaryObject>?
...
How do we tell the difference? I suppose if we don't provide
@encoding or some other mechanism to say how the binary data has been
converted to character data we can just say we disallow it. But even
JSON, if I understand correctly, is not XML-safe, as "<" and "&" are
permitted as unescaped characters in a JSON string, no? I.e., "Barnes
& Noble" would have to be written "Barnes &amp; Noble" or "Barnes
\u0026 Noble". Or, as Lou points out, be wrapped in a CDATA marked
section.
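The two non-CDATA escaping options mentioned here can be illustrated with a minimal Python sketch that uses only the standard library. It serializes a record either with XML entity escaping or with JSON's own \u escapes for the markup-significant characters. The function names are hypothetical and chosen only for illustration.

```python
import json
from xml.sax.saxutils import escape

def json_xml_escaped(data):
    """Serialize to JSON, then escape &, < and > as XML entities."""
    # escape() handles "&" before "<" and ">", so "&" becomes "&amp;".
    return escape(json.dumps(data))

def json_unicode_escaped(data):
    """Serialize to JSON, writing &, < and > as \\uXXXX escapes.

    In JSON text these characters can only occur inside string literals,
    so a global replacement is safe, and json.loads() decodes the
    escapes back to the original characters.
    """
    text = json.dumps(data)
    return (text.replace("&", "\\u0026")
                .replace("<", "\\u003c")
                .replace(">", "\\u003e"))

record = {"publisher": "Barnes & Noble"}
print(json_xml_escaped(record))      # {"publisher": "Barnes &amp; Noble"}
print(json_unicode_escaped(record))  # {"publisher": "Barnes \u0026 Noble"}
```

The first form is only valid JSON after an XML parser has resolved the entities. The second form is valid JSON and XML-safe at the same time, at the cost of being harder to read.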
I think we'd always choose to use a CDATA section, surely. I think it's 
quite unlikely that "]]>" would occur in JSON; "]]" would be very common 
(nested arrays), but the following angle bracket is very unlikely to 
occur. So most likely JSON could be used unchanged in CDATA.
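The CDATA approach can be checked with a short Python sketch that uses only the standard library. It wraps serialized JSON in a CDATA section and handles the one sequence CDATA cannot contain, "]]>", by splitting it across two sections. It then confirms that an XML parser returns the original JSON. The bare, namespace-free <xenoData> fragment is a simplification for illustration, not a complete TEI document, and the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

def wrap_in_cdata(text):
    """Wrap text in a CDATA section, splitting any literal "]]>".

    The usual workaround ends the section after "]]" and opens a new
    one before ">", so the parser sees the original characters.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def xeno_data_element(data):
    """Build a hypothetical <xenoData> fragment holding raw JSON."""
    return "<xenoData>" + wrap_in_cdata(json.dumps(data)) + "</xenoData>"

# Nested arrays produce "]]" followed by "," rather than ">",
# so this JSON is embedded unchanged.
nested = {"grid": [[1, 2], [3, 4]], "publisher": "Barnes & Noble"}
fragment = xeno_data_element(nested)
print(fragment)

# Round trip: the parser reports CDATA content as ordinary text.
parsed = json.loads(ET.fromstring(fragment).text)
assert parsed == nested
```

Because the splitting step only matters when "]]>" actually occurs, typical JSON passes through untouched, which fits the expectation above.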

Cheers,
Martin

Martin Holmes