Hi Syd,

On 15-07-14 11:22 AM, Syd Bauman wrote:
Why anyone would want to convert their JSON to base64 so they can store it in their TEI file is beyond me. But I'm pretty myopic, I guess.
The main reason for storing this sort of thing in my projects would be that it's time-consuming and processor-expensive to generate on the fly.
Yes, but why store it in myDoc.tei rather than in myDoc.json? Seems like you'd be making your life harder. But that's just MHO.
Right now I have over 7,300 despatches documents in TEI, and over 7,300 parallel OAI-PMH files in a separate collection. If a despatches file changes its name or gets deleted, I have to remember to remove the parallel OAI-PMH file from both SVN and from the eXist instance; inevitably that sometimes fails to happen and they get out of sync. Life will be much simpler with <xenoData>.
We're not actually asking for binary data here; we're just asking for the possibility of non-XML data. I don't foresee any circumstances in which I'd want to include binary data. Mostly we'll be talking about existing metadata standards and transmission-friendly standards such as JSON, I think.
So you're saying that we should explicitly permit non-XML character-based formats (like JSON), but disallow binary formats (that have, by necessity, been converted to a character format, of course)?
I didn't say we should disallow them; I just don't think any of the requesters or use-cases so far has mentioned them, so I don't think it's important to worry about them. If someone wants to include binary data, wouldn't they use <binaryObject>?
How do we tell the difference? I suppose if we don't provide @encoding or some other mechanism to say how the binary data has been converted to character data we can just say we disallow it. But even JSON, if I understand correctly, is not XML-safe, as "<" and "&" are permitted as unescaped characters in a JSON string, no? I.e., "Barnes & Noble" would have to be written "Barnes &amp; Noble" or "Barnes \u0026 Noble". Or, as Lou points out, be wrapped in a CDATA marked section.
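To make the two escaping options concrete, here is a minimal Python sketch. It assumes Python's standard `json` and `xml` modules; the sample record, the helper name `json_xml_safe`, and the bare, namespace-less `<xenoData>` wrapper are illustrative assumptions, not anything taken from the Guidelines.

```python
import json
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical sample record, for illustration only.
record = {"publisher": "Barnes & Noble", "note": "a < b"}

# Option 1: serialize as usual, then escape the XML-significant characters.
# The stored text is no longer valid JSON until an XML parser unescapes it.
raw = json.dumps(record)
xml_escaped = escape(raw)  # & -> &amp;, < -> &lt;, > -> &gt;

# Option 2: use JSON's own \uXXXX escapes, so the stored text is both
# valid JSON and free of XML-significant characters. In valid JSON these
# characters can only occur inside strings, so a plain replace is safe.
def json_xml_safe(obj):
    text = json.dumps(obj)
    return (text.replace("&", "\\u0026")
                .replace("<", "\\u003c")
                .replace(">", "\\u003e"))

# Both forms survive a round trip through an XML parser.
doc1 = ET.fromstring("<xenoData>" + xml_escaped + "</xenoData>")
doc2 = ET.fromstring("<xenoData>" + json_xml_safe(record) + "</xenoData>")
round_trip_1 = json.loads(doc1.text)
round_trip_2 = json.loads(doc2.text)
```

Option 2 has the small advantage that the text inside the element can be copied out of the raw file and fed straight to a JSON parser, whereas Option 1 must go through an XML parser first.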
I think we'd always choose to use a CDATA section, surely. I think it's quite unlikely that "]]>" would occur in JSON; "]]" would be very common (nested arrays), but the following angle bracket is very unlikely to occur. So most likely JSON could be used unchanged in CDATA.

Cheers,
Martin
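
P.S. For the unlikely case where "]]>" does turn up, the usual workaround is to split it across two adjacent CDATA sections. A minimal Python sketch, assuming the standard `json` and `xml.etree.ElementTree` modules; the helper name `wrap_cdata`, the sample payload, and the namespace-less `<xenoData>` wrapper are hypothetical, chosen only for illustration.

```python
import json
import xml.etree.ElementTree as ET

def wrap_cdata(text):
    # "]]>" is the one sequence that may not appear inside a CDATA section.
    # Close the section after "]]" and reopen a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Nested arrays produce "]]" but no "]]>", and "&" needs no escaping here.
payload = json.dumps({"name": "Barnes & Noble", "matrix": [[1, 2], [3, 4]]})
doc = "<xenoData>" + wrap_cdata(payload) + "</xenoData>"

# The parser hands back the JSON text unchanged.
parsed = ET.fromstring(doc)
data = json.loads(parsed.text)

# The pathological case: a JSON string that really does contain "]]>".
awkward = json.dumps(["a]]>b"])
awkward_doc = ET.fromstring("<xenoData>" + wrap_cdata(awkward) + "</xenoData>")
```

In the common case `wrap_cdata` leaves the JSON byte-for-byte unchanged between the CDATA delimiters, which is the point of preferring CDATA over escaping.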