question on <xenoData>

Syd Bauman

4 Jul 2015

4:12 p.m.

First, <xenoData> should be a member of att.declarable. I think that's pretty much a no-brainer, but I raise it here in case someone wants to object.

But my next issue is at least non-obvious, if not controversial (and I hope it's not). How is a process to know whether the metadata supplied in <xenoData> applies to the TEI document or to the source thereof? Several possibilities jump to mind.

1) It doesn't. This is not our problem. Any project using <xenoData> will know what metadata they are using, so they'll figure it out just fine.

2) We add a new attribute to <xenoData>, @describes. Its possible values are "source" and "transcription" or some such.

3) We use a special @type (with values "source" and "transcription" or some such) for this, even though it's not quite right.

4) We add <xenoData> to att.typed and let projects come up with their own values of @type to assert this information. Maybe with suggested values (4a), maybe not (4b).

5) Instead of making <xenoData> a member of model.teiHeaderPart, and thus a younger sibling of <fileDesc>, we allow <xenoData> to occur as a child of <fileDesc> or as a child of <sourceDesc> (via membership in model.sourceDescPart). In the former case the metadata in <xenoData> is understood to apply to the TEI document; in the latter case the metadata in <xenoData> is understood to apply to the source from which the TEI document was derived.

6) Like (5), but leave <xenoData> as a member of model.teiHeaderPart instead of (6a) or in addition to (6b) a direct child of <fileDesc>.

Personally, I'm leaning very strongly towards (2), (5), or (6a). That is, I think (1), (3), (4a), and (4b) are bad ideas, and prefer (6a) to (6b). That said, I'm open to thoughts, ideas, and suggestions.
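To make Syd's question concrete, here is a minimal Python sketch (standard-library ElementTree only) of how a processing script might decide what a <xenoData> block describes under options 1, 2, and 5. It assumes a hypothetical @describes attribute (option 2, not an existing TEI attribute), treats a <xenoData> inside <sourceDesc> as describing the source and one directly inside <fileDesc> as describing the transcription (option 5), and otherwise reports "unknown" (option 1). The sample header is invented and deliberately mixes all three placements, so it is not valid TEI.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Invented header mixing the placements discussed above (not valid TEI).
SAMPLE = f"""<teiHeader xmlns="{TEI}" xmlns:dc="{DC}">
  <fileDesc>
    <titleStmt><title>Example</title></titleStmt>
    <publicationStmt><p>Unpublished</p></publicationStmt>
    <sourceDesc>
      <p>A printed book.</p>
      <xenoData><dc:title>The printed book</dc:title></xenoData>
    </sourceDesc>
    <xenoData><dc:title>The TEI file</dc:title></xenoData>
  </fileDesc>
  <xenoData describes="transcription"><dc:title>The TEI file</dc:title></xenoData>
  <xenoData><dc:title>Unclear</dc:title></xenoData>
</teiHeader>"""


def xeno_scope(header):
    """Return (evidence, scope) for each <xenoData>, in document order."""
    # ElementTree elements have no parent pointer, so map child -> parent first.
    parent_of = {child: parent for parent in header.iter() for child in parent}
    results = []
    for xeno in header.iter(f"{{{TEI}}}xenoData"):
        describes = xeno.get("describes")  # hypothetical attribute (option 2)
        if describes is not None:
            results.append(("attribute", describes))
            continue
        parent = parent_of.get(xeno)
        local = parent.tag.split("}")[-1] if parent is not None else None
        if local == "sourceDesc":      # option 5: inside <sourceDesc>
            results.append(("parent", "source"))
        elif local == "fileDesc":      # option 5: direct child of <fileDesc>
            results.append(("parent", "transcription"))
        else:                          # option 1: nothing in the markup says
            results.append(("none", "unknown"))
    return results


if __name__ == "__main__":
    for evidence, scope in xeno_scope(ET.fromstring(SAMPLE)):
        print(evidence, scope)
```

The sketch shows the trade-off in miniature: option 2 puts the answer on the element itself, option 5 makes the process look upward, and anything left directly under <teiHeader> without an attribute stays ambiguous.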


Kevin Hawkins

4 Jul

5:40 p.m.

New subject: [tei-council] question on <xenoData>

All,

Since I created the ticket that prompted this ( https://sourceforge.net/p/tei/feature-requests/453/ ), I hope you'll forgive me for inserting myself into your discussion. I should first say that, in general, I think it's a good idea to summarize Council deliberations of this sort on the tickets themselves in order to get input from the person who submitted the ticket. You don't have to take the submitter's advice, but it may save you trouble in the future in case you end up implementing it in a way that doesn't actually meet their needs.

Anyway, back to the matter at hand.

The Guidelines have always been careful to distinguish between bibliographic information about the electronic document and about the source of that document, making it clear that of the children of <fileDesc>, only the contents of <sourceDesc> are meant to describe the source. As for other components of the header, it's implied that <encodingDesc> and <revisionDesc> describe the electronic document as well, whereas <profileDesc> is unfortunately ambiguous.

It seems plausible to me that someone would want to provide outside metadata describing either the electronic document or the source document, and the Guidelines should provide a way for them to describe either. While we could be ambiguous about which is being described (as with <profileDesc> and Syd's option 1), I think we can all agree that it's better to be explicit.

You could create @describes or use @type (options 2-4), and if so, I think you should add it to <profileDesc> as well. Still, this doesn't feel right to me. The various descendants of <fileDesc>, <sourceDesc>, <bibl>, <biblStruct>, and <biblFull> are used to describe, variously, the electronic document, the source document, or another bibliographic item mentioned in the document, and you only know which of these from the ancestor element. To follow this practice, I think it would make sense to allow <xenoData> both as a child of <fileDesc> and of <sourceDesc> -- that is, option 5.

But then the question remains whether only allowing <xenoData> to be used to provide *bibliographic* metadata (what's found in <fileDesc> and again in its child <sourceDesc>) is really appropriate. While the Dublin Core, MODS, and Creative Commons examples at http://paramedic.wwp.neu.edu/~syd/temp/TEI_Council/P5-for-xenoData/Guideline... are all purely bibliographic, I'm not sure we can exclude the possibility that someone would want to include external non-bibliographic metadata -- that is, the sorts of data found in <encodingDesc>, <profileDesc>, and <revisionDesc>.

If we follow this train of thought, we'd want to allow <xenoData> to be included *anywhere* in the header so that it would be clear from the parent or ancestor element what the data it contains describes. However, this gets away from the rationale given in http://paramedic.wwp.neu.edu/~syd/temp/TEI_Council/P5-for-xenoData/Guideline... -- that it's "easier for humans to manage these elements if they are grouped together in a single location".

For that reason, I now reluctantly support options 2, 3, or 4a. (I don't support 4b because I think suggested values are always better than not suggesting any values.) I do like the idea of maintaining the TEI's very basic distinction between a description of the electronic document and a description of the source document, so I think it would be good to make this explicit and, as above, to implement a similar solution for <profileDesc>.

Furthermore, I recommend giving an example in the element spec where there's more than one <xenoData> element, each with a different @describes or @type, showing that you might include both metadata about the electronic document and about the source document but that you would be careful to put the metadata in separate <xenoData> elements.

Kevin
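A rough sketch of the kind of example Kevin recommends: two <xenoData> blocks carrying Dublin Core, distinguished by @type, plus a few lines of Python showing that a consumer can then keep the two sets of metadata apart. The type values "source" and "transcription" are Syd's tentative ones from earlier in the thread; the titles, dates, and the helper name dc_by_type are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "http://www.tei-c.org/ns/1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Invented header: one <xenoData> per thing described, told apart by @type.
HEADER = f"""<teiHeader xmlns="{TEI}" xmlns:dc="{DC}">
  <fileDesc>
    <titleStmt><title>Poems: an electronic edition</title></titleStmt>
    <publicationStmt><p>Unpublished draft</p></publicationStmt>
    <sourceDesc><p>Poems, printed 1798.</p></sourceDesc>
  </fileDesc>
  <xenoData type="transcription">
    <dc:title>Poems: an electronic edition</dc:title>
    <dc:date>2015</dc:date>
  </xenoData>
  <xenoData type="source">
    <dc:title>Poems</dc:title>
    <dc:date>1798</dc:date>
  </xenoData>
</teiHeader>"""


def dc_by_type(header):
    """Group Dublin Core fields by the @type of the <xenoData> that holds them."""
    grouped = defaultdict(dict)
    prefix = f"{{{DC}}}"  # ElementTree spells dc:title as "{namespace-uri}title"
    for xeno in header.iter(f"{{{TEI}}}xenoData"):
        kind = xeno.get("type", "unspecified")
        for field in xeno:
            if field.tag.startswith(prefix):
                grouped[kind][field.tag[len(prefix):]] = field.text
    return dict(grouped)


if __name__ == "__main__":
    for kind, fields in dc_by_type(ET.fromstring(HEADER)).items():
        print(kind, fields)
```

Keeping each description in its own <xenoData> is what makes this trivial: had both titles and dates shared one block, no attribute value could have said which date belongs to which object.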

James Cummings

8 Jul

10:15 a.m.

I'd be interested in Syd explaining why he doesn't think 4a is a good idea. Generally I'm of the opinion that anything inside <xenoData> and its relationship to the electronic text is totally in the hands of the encoder. I think it will definitely be used for non-bibliographic metadata.

-J

-- Dr James Cummings, James.Cummings@it.ox.ac.uk Academic IT Services, University of Oxford

Martin Holmes

5:12 p.m.

That's the key point. People will put all sorts of peripheral stuff in there (RDF ontologies, for instance) which are neither "about" the primary source document nor the electronic edition. I think if we're providing a place for people to put whatever they like, it doesn't really make sense to start constraining it.

Cheers,
Martin

Lou Burnard

7:17 p.m.

I agree. Plus there is no guarantee that a particular tranche of xenodata is consistently "about" any one thing anyway.

James Cummings

9 Jul

1:03 a.m.

That said, if we can have more than one of them and envision that users might reasonably categorise them, then that would meet the test for membership of att.typed.

James

-- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council

PLEASE NOTE: postings to this list are publicly archived

Martin Holmes

1:12 a.m.

Yes, I was actually arguing for membership of att.typed, on the basis that it's up to the user to decide what sort of metadata goes in these things. The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?), along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

Cheers,
Martin

Syd Bauman

4:03 a.m.

I'm convinced. I'll put it into att.typed and give it a private copy of @type with a suggested-values list that includes "source" and "transcription" (can anyone come up with a better term?) and let Martin or anyone else add whatever others seem appropriate. People can then poke at that. Thanks, guys.

Syd Bauman

4:17 a.m.

But just to be explicit about this, we are (deliberately) making the same mistake TEI made with @type of <name> and friends: the value of @type here does not describe the type of the element itself or the stuff inside the element, but rather categorizes the type of stuff that the elements inside refer to or describe. (That's why I liked @describes better, but if <xenoData> is going to be used for so much more than describing the source and describing the TEI file, that may not be a good idea.)

Kevin Hawkins

4:22 a.m.

I was just about to write to the list with the same concern about using @type.

As for the value of the attribute (whatever name you choose) ... While "source" is well established in the Guidelines for the thing described by <sourceDesc>, "transcription" is not generally used for the thing described by the other children of <fileDesc>. After all, the Guidelines claim to be equally applicable not only to a manual transcription of a written or audio document but also to an electronic text created through automated means, even from another electronic source. So "electronic text" is often used, as are "computer file", "electronic file", and "electronic work". So maybe the suggested value of the attribute could be one of the following:

electronicText
computerFile
electronicFile
electronicWork

--Kevin

Fabio Ciotti

10:39 a.m.

Hi all,

Sorry to jump into this late, given that I was a supporter of xenoData; I was taking a few days off with my family.

I am still in favour of having xeno as a repeatable child of teiHeader, and of adding it to the att.typed class. However, I do not really like the idea of having a list of suggested values, since the possibilities are many (for instance, one could use xeno to add PREMIS metadata that refine revisionDesc, or technical metadata for images of manuscripts), and there are many other things we cannot think of now.

Some thoughts:

1) It would probably be good to have a special attribute to state the type of metadata vocabulary used, in the style of @MDTYPE in METS, and its values could be taken from, e.g., the METS Endorsed External Schemas list. In this way we can avoid having just any sort of thing inside xeno.

2) We could suggest the use of @corresp or @sameAs to state the fact that a xeno element complies with one of the standard TEI metadata elements.

3) A good thing to add to xeno is an attribute asserting that a set of xeno metadata applies to a specific part of the TEI file: for instance, imagine *I* want to give MODS descriptions of bibliographic items cited in the text, or MIX technical metadata for facsimile images... The types of beasts here can really be many. I have taken a look at the current attributes but have not seen any existing candidate (except for a misused @ref), but some of you may have a wider view than mine. This could also suggest having a reverse attribute to point at metadata, at least on model.graphicLike members (but this is another ticket).

4) Someone may want to point at xeno residing in an external file. This could be solved:
a) again with an abuse of @ref;
b) with a new element, say @extMDRef;
c) by making the content model of xeno more complex, allowing an alternation of <ref> and internal xeno data.

My thoughts probably exceed the scope of the initial xeno implementation, but I wanted to give an initial shape to some ideas in my head!

f
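Fabio's points 1 and 3 could be sketched as follows, assuming two hypothetical attributes that are not part of TEI: @mdType (naming the metadata vocabulary, after METS @MDTYPE) and @appliesTo (a pointer to the xml:id of the element the metadata describes). The Python below resolves each pointer and reports what each block is about; the document, the record placeholders, and the decision to treat a block without a pointer as applying to the whole file are all invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

# Invented document; @mdType and @appliesTo are hypothetical, not TEI attributes.
DOC = f"""<TEI xmlns="{TEI}">
  <teiHeader>
    <xenoData mdType="MODS" appliesTo="#b1">MODS record omitted</xenoData>
    <xenoData mdType="MIX" appliesTo="#fig1">MIX record omitted</xenoData>
    <xenoData mdType="PREMIS">PREMIS record omitted</xenoData>
  </teiHeader>
  <text><body>
    <p>See <bibl xml:id="b1">Smith 1900</bibl>.</p>
    <figure xml:id="fig1"><graphic url="page1.jpg"/></figure>
  </body></text>
</TEI>"""


def describe_targets(root):
    """Return (vocabulary, local name of what it applies to) for each <xenoData>."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    results = []
    for xeno in root.iter(f"{{{TEI}}}xenoData"):
        vocab = xeno.get("mdType", "unknown")
        pointer = xeno.get("appliesTo")
        if pointer is None:
            # Assumption of this sketch: no pointer means the whole file.
            results.append((vocab, "whole document"))
            continue
        target = by_id.get(pointer.lstrip("#"))
        name = target.tag.split("}")[-1] if target is not None else "dangling pointer"
        results.append((vocab, name))
    return results


if __name__ == "__main__":
    for vocab, target in describe_targets(ET.fromstring(DOC)):
        print(vocab, "->", target)
```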

-- Fabio Ciotti Dipartimento Studi Umanistici, Università di Roma Tor Vergata Presidente Associazione Informatica Umanistica Cultura Digitale

Lou Burnard

3:26 p.m.

Please please let's not use @type for this!

How about @scheme (for the format of the metadata) and @scope (for its err scope, i.e. what it's about) if you insist on saying anything about either?

I also think that any typology of closed values we might propose will just look silly.
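Lou's alternative splits the information across two attributes. Below is a minimal Python sketch assuming hypothetical @scheme and @scope attributes whose values are open: nothing is checked against a closed list, in line with the preference for not suggesting values. Using a namespace URI as the @scheme value, the helper names, and the sample content are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

TEI = "http://www.tei-c.org/ns/1.0"


@dataclass
class XenoBlock:
    scheme: Optional[str]  # hypothetical @scheme: the format of the metadata
    scope: Optional[str]   # hypothetical @scope: what the metadata is about
    tags: list             # local names of the foreign elements inside


def read_xeno(header):
    """Collect every <xenoData>; attribute values are free text, never validated."""
    blocks = []
    for xeno in header.iter(f"{{{TEI}}}xenoData"):
        tags = [child.tag.split("}")[-1] for child in xeno]
        blocks.append(XenoBlock(xeno.get("scheme"), xeno.get("scope"), tags))
    return blocks


def select(blocks, scheme=None, scope=None):
    """Filter on either attribute; None means "don't care"."""
    return [b for b in blocks
            if (scheme is None or b.scheme == scheme)
            and (scope is None or b.scope == scope)]


# Stripped-down, invented header (a real one would also need <fileDesc>).
HEADER = f"""<teiHeader xmlns="{TEI}">
  <xenoData scheme="http://purl.org/dc/elements/1.1/" scope="source">
    <dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Poems</dc:title>
  </xenoData>
  <xenoData scheme="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
  </xenoData>
</teiHeader>"""

if __name__ == "__main__":
    blocks = read_xeno(ET.fromstring(HEADER))
    print(select(blocks, scope="source"))
```

Because neither attribute is required, the RDF block simply reports no scope, which fits the earlier observation that a given tranche of xenodata need not be "about" any one thing.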

...

I was just about to write to the list with the same concern about using @type.

James Cummings

5:29 p.m.

I could be convinced of those two attributes. But yes, I wouldn't provide suggested values at all.

-James

On 09/07/15 14:26, Lou Burnard wrote:

...

Please please let's not use @type for this !

How about @scheme (for the format of the metadata) and @scope (for its err scope, i.e. what it's about) if you insist on saying anything about either?

I also think that any typology of closed values we might propose will just look silly.

-- Dr James Cummings, James.Cummings@it.ox.ac.uk Academic IT Services, University of Oxford

Fabio Ciotti

5:40 p.m.

I like Lou's proposal for att names

f

2015-07-09 17:29 GMT+02:00 James Cummings <James.Cummings@it.ox.ac.uk>:

...

I could be convinced of those two attributes.

But yes, I wouldn't provide suggested values at all.

-James

Martin Holmes

6:04 p.m.

We already have four completely different @scope attributes; please let's not add a fifth.

How about adding <xenoData> to att.pointing? Then it can be linked very precisely to the location(s) in the rest of the file to which the metadata applies. This could be done with e.g. @corresp, but we know that causes discomfort in some folks.

Cheers,
Martin

On 15-07-09 08:29 AM, James Cummings wrote:

...

I could be convinced of those two attributes.

But yes, I wouldn't provide suggested values at all.

-James

Lou Burnard

10:14 p.m.

On 09/07/15 17:04, Martin Holmes wrote:

...

We already have four completely different @scope attributes; please let's not add a fifth

Well, I am not so sure these four are all semantically "completely different" -- they all say something about the extent to which the annotation concerned is universally (or not) applicable. Which is more or less what we want here, surely? But by all means let's have another suggestion.

...

How about adding <xenoData> to att.pointing? Then it can be linked very precisely to the location(s) in the rest of the file to which the metadata applies. This could be done with e.g. @corresp, but we know that causes discomfort in some folks.

Aaaargh. Whence this spurious desire for "precise linking"? This is xenodata. It could be anything. We cannot say anything about what it applies to and shouldn't pretend to be able to.

...

Cheers, Martin

Boos, Lou

Syd Bauman

10 Jul

2:47 a.m.

KH> As for the value of the attribute (whatever name you choose) ...
KH> While "source" is well established in the Guidelines for the
KH> thing described by <sourceDesc>, "transcription" is not generally
KH> used ... "electronic text" is often used, as are "computer file",
KH> "electronic file", and "electronic work". So maybe the suggested
KH> value of the attribute could be one of the following:
KH> * electronicText
KH> * computerFile
KH> * electronicFile
KH> * electronicWork

I agree that "transcription" is not ideal, but my complaint with the suggestions above is that they do not differentiate between the source document (which might well be an electronic text, a computer file, or some other electronic work) and the current TEI file. I kinda want "thisTEI", but that's dorky.

FC> I still am favorable with adding [xenoData] to att.type class.
FC> Instead I do not really like the idea of having a list of
FC> suggested value, since the possibilities are quite a lot (for
FC> instance one could use xeno to add PREMIS metadata that refine
FC> revisionDesc, or technical metadata for images of manuscript),
FC> and many other thing we can not think of now.

I don't think anyone is suggesting that there be a "closed" value list; I was suggesting (and I think there is some support for) a "semi" value list of "suggested values include". There really is no point in your project using "Source", mine using "src", and Lou using "original" to mean the same thing. (And, while I don't speak PREMIS well, it would certainly be "thisTEI" or whatever if it refined <revisionDesc>, no?)

FC> 1) probably it would be good to have a special attribute to state the
FC> type of metadata vocabulary used ...
..
LB> How about @scheme (for the format of the metadata)

I see no need for such an attribute, as I don't see what advantage it brings. You (the programmer who has just been handed a TEI document) find out what xeno-data schemes were used in <xenoData> by issuing

  distinct-values( /TEI/teiHeader/xenoData//*/namespace-uri(.) )

or some such. It's mildly more complicated, but much more reliable, than

  /TEI/teiHeader/xenoData/@scheme

Having @scheme just allows a mis-match.

FC> 2) we could suggest the usage of @corresp or @sameAs to say the
FC> fact that a xeno element compkly one of the standard TEI
FC> metadata elements

I don't like this idea. I don't think we need a method to say "this metadata duplicates the information found in <whatEver>". If we decide we want to, @sameAs is not tenable, it has the wrong semantics. @corresp isn't crazy, but I shy away from it anyway. Not sure I can explain why at the moment. (Which means I could probably be talked into it. :-)

FC> 3) a good thing is to add to xeno is an attribute to assert that a
FC> set of xeno metadata apply to a specific part of the TEI file ...
..
MH> How about adding <xenoData> to att.pointing? Then it can be
MH> linked very precisely to the location(s) in the rest of the file
MH> to which the metadata applies

The TEI already has a mechanism for indicating to which portions of a TEI document certain bits of metadata apply. It makes use of the @decls attribute in the document, which points to the applicable bit of metadata. See 15.3.2. (And note that <xenoData> is already a member of att.declarable.)

FC> 4) someone would like to point at xeno residing in an external
FC> file.

Not sure exactly what you mean here, Fabio. If someone wanted to point to external metadata, they would have no reason to wrap said metadata into a <tei:xenoData> element. So I don't think that's what you have in mind. I think you have the case where I want to explicitly say "my MODS record for this file is over there". Not a bad idea, really. I'm inclined to say either
  A. we should not worry ourselves with this (until it becomes a feature request)
  B. allow <tei:ptr> as a child of <xenoData> as you suggest
  C. add @target to <xenoData>

If we go for C, I'm inclined to have @target be used instead of content (i.e., a given <xenoData> may have either @target or child::*, not both), but as long as we define what it means if both are present, allowing both is OK. (Possibilities include:
  * use @target, ignore children
  * use children, ignore @target
  * use both
  * look for @target, use if found; if not found, use children
at least, maybe others. :-)

JC> would meet the test for membership of att.typed.
..
MH> I was actually arguing for membership of att.typed
..
SB> we are (deliberately) making the same mistake TEI made with @type
SB> of <name> and friends
..
KH> same concern about using @type.
..
FC> I still am favorable with adding [xenoData] to att.type class.
..
LB> Please please let's not use @type for this !

I'm with Kevin and Lou that @type is not a good way to differentiate "the metadata herein describes the source" from "the metadata herein describes the TEI file". But that doesn't mean <xenoData> shouldn't be in att.typed, anyway, as it is repeatable and categorizable. Thoughts?

LB> How about ... @scope (for its err scope, i.e. what it's about)
..
MH> We already have four completely different @scope attributes;
MH> please let's not add a fifth.
..
LB> Well, I am not so sure these four are all semantically
LB> "completely different" -- they all say something about the extent
LB> to which the annotation concerned is universally (or not)
LB> applicable. Which is more or less what we want here, surely? But
LB> by all means let's have another suggestion.

I'm with Lou, here. If we can come up with something better, sure -- but @scope is pretty close to spot on. Do people dislike my original suggestion "@describes"?

LB> I also think that any typology of closed values we might propose will
LB> just look silly.
While perhaps true for @type (and I'm not convinced), I disagree completely with respect to @scope or @describes or whatever we call it. As I said above, we definitely want a "suggested values include" list here, especially for the two most common cases, "source" and "thisTEI" (or whatever).
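For anyone who wants to try the approach Syd describes for @scheme, here is a rough Python equivalent of the distinct-values() expression, using only the standard library's ElementTree. It assumes, as the XPath does, that <xenoData> is a direct child of <teiHeader> in the TEI namespace (one of the placements still under discussion); the function name and the sample header are invented for illustration, and the sample is not meant to be valid TEI.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"


def xeno_namespaces(tei_xml: str) -> list[str]:
    """Distinct namespace URIs of every element nested inside
    /TEI/teiHeader/xenoData, in first-seen (document) order.
    Roughly distinct-values(/TEI/teiHeader/xenoData//*/namespace-uri(.))."""
    root = ET.fromstring(tei_xml)
    seen: list[str] = []
    path = f"{{{TEI_NS}}}teiHeader/{{{TEI_NS}}}xenoData"
    for xeno in root.findall(path):
        for el in xeno.iter():
            if el is xeno:
                continue  # //* selects descendants only, not <xenoData> itself
            # ElementTree stores qualified names as "{uri}local"
            ns = el.tag[1:].split("}", 1)[0] if el.tag.startswith("{") else ""
            if ns not in seen:
                seen.append(ns)
    return seen


# Hypothetical header carrying a Dublin Core (OAI) record and a MODS record
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc/>
    <xenoData>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>An example title</dc:title>
      </oai_dc:dc>
    </xenoData>
    <xenoData>
      <mods xmlns="http://www.loc.gov/mods/v3">
        <titleInfo><title>An example title</title></titleInfo>
      </mods>
    </xenoData>
  </teiHeader>
</TEI>"""

print(xeno_namespaces(sample))
```

Because the answer is read off the markup itself, it cannot drift out of step with the data, which is the mis-match a separate @scheme value would allow.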

Martin Holmes

6:14 a.m.

...

MH> How about adding <xenoData> to att.pointing? Then it can be MH> linked very precisely to the location(s) in the rest of the file MH> to which the metadata applies

The TEI already has a mechanism for indicating to which portions of a TEI document certain bits of metadata apply. It makes use of the @decls attribute in the document, which points to the applicable bit of metadata. See 15.3.2. (And note that <xenoData> is already a member of att.declarable.)

@decls is available on only a small subset of elements, and not on any of the header elements. This doesn't allow you to say that a specific <xenoData> applies to <sourceDesc> and another to <profileDesc>, does it?

Cheers,
Martin

On 15-07-09 05:47 PM, Syd Bauman wrote:

...

Syd Bauman

2:42 p.m.

Martin --

I worry that we may be talking at cross-purposes, here. <xenoData> is for supplying metadata, just like the other <teiHeader> components are for supplying metadata. Said metadata might be *about* the original source document (here is the MODS record for the book that was transcribed), and it might be *about* the TEI document (here is the Dublin Core metadata for my TEI file that we can use on ingestion into XTF). In the former case, it is not about the <sourceDesc>, it's about the source. In the latter case it is about the <TEI>, but so is everything else in the <teiHeader>, that's the point of having a <teiHeader>. (Some people are arguing it may also be *about* something else, e.g. the page images associated with this TEI document.) So a precise pointing mechanism does not solve the problem "what is this metadata about".

That said, just like most any other metadata in the <teiHeader>, the metadata may be applicable to only certain portions of my collection. E.g., I may have obtained my copy of _The Fellowship of the Ring_ from one library, and my copies of _The Two Towers_ and _The Return of the King_ from another. The @decls mechanism already exists to handle this. We don't need another mechanism to solve the problem of "to what portions of my TEI document is this metadata applicable". (That said, you may want to suggest we should improve the @decls mechanism -- I can already hear you saying it should be global -- but that's a different ticket. :-)
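To make Syd's Tolkien example a little more concrete, here is a simplified Python sketch of @decls resolution: the header records two source copies, and @decls on the text structure says which applies where. It assumes that only same-document "#id" pointers are used and that the nearest @decls on the element or an ancestor wins outright. The Guidelines section Syd cites (15.3.2) describes finer-grained rules, with overriding per kind of declarable element and a @default mechanism, which this sketch deliberately ignores; the IDs and function name are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def applicable_decls(root: ET.Element, target: ET.Element) -> list[ET.Element]:
    """Return the declarable elements named by the @decls of `target`, or
    of its nearest ancestor carrying @decls (simplified inheritance)."""
    parent = {child: p for p in root.iter() for child in p}
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    node = target
    while node is not None:
        decls = node.get("decls")
        if decls:
            # Only same-document pointers of the form "#id" are resolved here
            return [by_id[ref[1:]] for ref in decls.split()
                    if ref.startswith("#") and ref[1:] in by_id]
        node = parent.get(node)
    return []


# Hypothetical collection: one volume from library A, two from library B
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <bibl xml:id="libA">Copy held by library A</bibl>
        <bibl xml:id="libB">Copies held by library B</bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <group>
      <text xml:id="fellowship" decls="#libA"><body><p/></body></text>
      <group decls="#libB">
        <text xml:id="towers"><body><p/></body></text>
        <text xml:id="king"><body><p/></body></text>
      </group>
    </group>
  </text>
</TEI>"""

root = ET.fromstring(sample)
sources = {
    t.get(XML_ID): [d.get(XML_ID) for d in applicable_decls(root, t)]
    for t in root.iter(f"{TEI}text")
    if t.get(XML_ID)
}
print(sources)
```

Here the two later volumes inherit library B from the enclosing <group>, so no per-volume pointer is needed.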

...

@decls is available on only a small subset of elements, and not on any of the header elements. This doesn't allow you to say that a specific <xenoData> applies to <sourceDesc> and another to <profileDesc>, does it?

Martin Holmes

5:14 p.m.

Hi Syd,

I don't think we're talking at cross-purposes. Let's take a concrete example. Say I have a bibliography encoded as a <listBibl> in the <back> of my document. I have a <xenoData> element in my header that contains a <mods> block which represents one of the <bibl>s in my bibliography. How do I specify that this block of <xenoData> refers to that <bibl>? @decls is no help here, surely.

I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.) @sameAs would seem to be a candidate, except that (as almost always with @sameAs) when you look at the definition, it turns out that this relationship is not as same-as as necessary for a proper use of that attribute. So I suggested another linking option. I don't really mind how it's done, but I do think it's important that people be able to link between their <xenoData> blocks and other elements -- any elements -- in their files.

In some cases such as FOAF, the standard itself provides a way to link out from the element to its target:

  <foaf:Person rdf:about="#danbri" xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <foaf:name>Dan Brickley</foaf:name>
    <foaf:homepage rdf:resource="http://danbri.org/" />
    <foaf:openid rdf:resource="http://danbri.org/" />
    <foaf:img rdf:resource="/images/me.jpg" />
  </foaf:Person>

<http://xmlns.com/foaf/spec/>

but other XML vocabularies may not provide this; furthermore, parsing this requires that you understand the vocabulary in the <xenoData>. Also it may well be the case (presumably) that non-XML vocabularies may be used in <xenoData>. So I think there's a strong case for a method of linking to be available at the TEI level.

Cheers,
Martin

On 15-07-10 05:42 AM, Syd Bauman wrote:

...

Kevin Hawkins

6:52 p.m.

On 7/10/15 10:14 AM, Martin Holmes wrote:

...

I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.)

I don't think that was me. The only reservation I recall was about using @type on <xenoData> to describe what the metadata describes instead of what type of metadata it is.

--Kevin

Martin Holmes

7:25 p.m.

On 15-07-10 09:52 AM, Kevin Hawkins wrote:

...

On 7/10/15 10:14 AM, Martin Holmes wrote:

...
I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.)

I don't think that was me. The only reservation I recall was about using @type on <xenoData> to describe what the metadata describes instead of what type of metadata it is.

Sorry Kevin, you're right; it was Syd:

...

@corresp isn't crazy, but I shy away from it anyway. Not sure I can explain why at the moment. (Which means I could probably be talked into it. :-)

Cheers, Martin

Syd Bauman

11 Jul

8:03 p.m.


Hi Martin --

You have a MODS record for one item in your transcribed bibliography, and want to put that into your TEI file? Power to you, I suppose. But that doesn't seem much like the feature OP (Kevin) requested, or the "problem" for which librarians have been asking for a solution for almost a decade.

<xenoData>, IMHO, is intended to make the easy case easy, not to make the complex case possible (in large part because the complex case is already possible -- e.g., you can put <mods:*> as a child of <bibl>, or put it in your <teiHeader> and attach it to the <bibl> with <link>).

(BTW, I prefer <link> to @corresp for this because I get to express which block of metadata is derived from which, which I can't express explicitly with @corresp.)

The easy case, the one which quite a few large projects have expressed a desire for, is to have a consistent place to put the METS record from which the <teiHeader> was derived, or the Dublin Core metadata that you derived from the <teiHeader> for whatever purpose, but don't want to re-generate on the fly.

So I'm inclined not to add another mechanism for attachment to <xenoData>. That said, I still want us all to agree on an easy explicit mechanism to say "about the source" vs "about this TEI file".

Speaking of which, Martin, I'm figuring the Dublin Core OAI example from _The Colonial Despatches of Vancouver Island_ you posted on the ticket is about the TEI file, not the source, yes?

MH> Also it may well be the case (presumably) that non-XML
MH> vocabularies may be used in <xenoData>.

I, for one, had never envisioned non-XML data inside <xenoData>, and think it would be a bad idea to permit it. (For the same kinds of reasons I'm unhappy with <binaryObject>.)

That said, it is not at all clear to me whether the content of <xenoData> should be well-formed XML or if well-balanced XML is sufficient. (The difference being well-formed means that <xenoData> can only have 1 child; well-balanced would permit multiple children.) As I originally wrote it, because the content model of <xenoData> was just the already existing 'macro.anyXML', well-formedness was required. I have recently changed it to 'macro.anyXML+'[1], thus allowing well-balanced fragments.

Notes
-----
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.
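A minimal Python sketch of the well-formed vs well-balanced distinction Syd draws, using only the standard library's `xml.etree.ElementTree`. The helper names and the dummy `wrapper` element are hypothetical, chosen for illustration; this is not how the TEI schema itself checks <xenoData> content.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses on its own, i.e. has exactly one root element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


def is_well_balanced(fragment: str) -> bool:
    """True if the fragment parses once wrapped in a hypothetical dummy element,
    so several sibling elements are allowed as long as every tag is closed."""
    try:
        ET.fromstring(f"<wrapper>{fragment}</wrapper>")
        return True
    except ET.ParseError:
        return False


one_child = '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">A title</dc:title>'
two_children = one_child * 2  # two sibling elements, no common parent

print(is_well_formed(one_child), is_well_balanced(one_child))        # True True
print(is_well_formed(two_children), is_well_balanced(two_children))  # False True
```

With a content model of plain 'macro.anyXML', only the first kind of content would be valid; 'macro.anyXML+' admits the second.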

...

I don't think we're talking at cross-purposes. Let's take a concrete example. Say I have a bibliography encoded as a <listBibl> in the <back> of my document. I have a <xenoData> element in my header that contains a <mods> block which represents one of the <bibl>s in my bibliography. How do I specify that this block of <xenoData> refers to that <bibl>?

@decls is no help here, surely.

I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.) @sameAs would seem to be a candidate, except that (as almost always with @sameAs) when you look at the definition, it turns out that this relationship is not as same-as as necessary for a proper use of that attribute.

So I suggested another linking option. I don't really mind how it's done, but I do think it's important that people be able to link between their <xenoData> blocks and other elements -- any elements -- in their files. In some cases such as FOAF, the standard itself provides a way to link out from the element to its target:

<foaf:Person rdf:about="#danbri" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Dan Brickley</foaf:name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:openid rdf:resource="http://danbri.org/" />
  <foaf:img rdf:resource="/images/me.jpg" />
</foaf:Person>

<http://xmlns.com/foaf/spec/>

but other XML vocabularies may not provide this; furthermore, parsing this requires that you understand the vocabulary in the <xenoData>. Also it may well be the case (presumably) that non-XML vocabularies may be used in <xenoData>. So I think there's a strong case for a method of linking to be available at the TEI level.

Martin Holmes

12 Jul

2:05 a.m.


Hi Syd, On 15-07-11 11:03 AM, Syd Bauman wrote:

...

Hi Martin --

You have a MODS record for one item in your transcribed bibliography, and want to put that into your TEI file? Power to you, I suppose. But that doesn't seem much like the feature OP (Kevin) requested, or the "problem" for which librarians have been asking for a solution for almost a decade.

<xenoData>, IMHO, is intended to make the easy case easy, not to make the complex case possible (in large part because the complex case is already possible -- e.g., you can put <mods:*> as a child of <bibl>, or put it in your <teiHeader> and attach it to the <bibl> with <link>).

(BTW, I prefer <link> to @corresp for this because I get to express which block of metadata is derived from which, which I can't express explicitly with @corresp.)

That's a good point; <link/> is a perfectly good solution, although it's less direct.
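A rough Python sketch of how a processor might follow such a <link> from a <xenoData> block to the <bibl> it describes. It assumes the encoder gives both the <xenoData> and the <bibl> an `xml:id` and records the pair in the two-pointer `@target` of a TEI `<link>`. The ids `xd1` and `bibl1`, the `type="describes"` value, and the convention that the first pointer is the metadata and the second the thing it describes are all hypothetical, not something the Guidelines prescribe; the sample is a simplified fragment that has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified, hypothetical TEI document (not validated against the TEI schema).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <xenoData xml:id="xd1">
      <mods xmlns="http://www.loc.gov/mods/v3"><titleInfo><title>Item one</title></titleInfo></mods>
    </xenoData>
  </teiHeader>
  <text>
    <back>
      <listBibl><bibl xml:id="bibl1">Item one</bibl></listBibl>
      <linkGrp><link type="describes" target="#xd1 #bibl1"/></linkGrp>
    </back>
  </text>
</TEI>"""


def described_by(doc: str) -> dict:
    """Map each linked element's xml:id to the xml:id of the <xenoData> describing it."""
    root = ET.fromstring(doc)
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    result = {}
    for link in root.iter(f"{{{TEI}}}link"):
        pointers = [p.lstrip("#") for p in link.get("target", "").split()]
        if len(pointers) != 2:
            continue  # this sketch only handles two-ended links
        # Assumed convention: first pointer = metadata, second = what it describes.
        metadata_id, described_id = pointers
        metadata = by_id.get(metadata_id)
        if metadata is not None and metadata.tag == f"{{{TEI}}}xenoData":
            result[described_id] = metadata_id
    return result


print(described_by(SAMPLE))  # {'bibl1': 'xd1'}
```

Note that the direction lives only in the assumed pointer order (or in the @type value); the processor does not need to understand MODS to make the connection.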

...

The easy case, the one which quite a few large projects have expressed a desire for, is to have a consistent place to put the METS record from which the <teiHeader> was derived, or the Dublin Core metadata that you derived from the <teiHeader> for whatever purpose, but don't want to re-generate on the fly.

So I'm inclined not to add another mechanism for attachment to <xenoData>. That said, I still want us all to agree on an easy explicit mechanism to say "about the source" vs "about this TEI file".

Speaking of which, Martin, I'm figuring the Dublin Core OAI example from _The Colonial Despatches of Vancouver Island_ you posted on the ticket is about the TEI file, not the source, yes?

This is from my previous message on that:

...

The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?) along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

...

MH> Also it may well be the case (presumably) that non-XML MH> vocabularies may be used in <xenoData>.

I, for one, had never envisioned non-XML data inside <xenoData>, and think it would be a bad idea to permit it. (For the same kinds of reasons I'm unhappy with <binaryObject>.)

I don't see how you can do anything about it. If you allow anyXML, anyone can just insert a CDATA section, can't they?

...

That said, it is not at all clear to me whether the content of <xenoData> should be well-formed XML or if well-balanced XML is sufficient. (The difference being well-formed means that <xenoData> can only have 1 child; well-balanced would permit multiple children.) As I originally wrote it, because the content model of <xenoData> was just the already existing 'macro.anyXML', well-formedness was required. I have recently changed it to 'macro.anyXML+'[1], thus allowing well-balanced fragments.

Notes
-----
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.

Interested to see why. Cheers,

...

...
I don't think we're talking at cross-purposes. Let's take a concrete example. Say I have a bibliography encoded as a <listBibl> in the <back> of my document. I have a <xenoData> element in my header that contains a <mods> block which represents one of the <bibl>s in my bibliography. How do I specify that this block of <xenoData> refers to that <bibl>?

@decls is no help here, surely.

I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.) @sameAs would seem to be a candidate, except that (as almost always with @sameAs) when you look at the definition, it turns out that this relationship is not as same-as as necessary for a proper use of that attribute.

So I suggested another linking option. I don't really mind how it's done, but I do think it's important that people be able to link between their <xenoData> blocks and other elements -- any elements -- in their files. In some cases such as FOAF, the standard itself provides a way to link out from the element to its target:

<foaf:Person rdf:about="#danbri" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Dan Brickley</foaf:name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:openid rdf:resource="http://danbri.org/" />
  <foaf:img rdf:resource="/images/me.jpg" />
</foaf:Person>

<http://xmlns.com/foaf/spec/>

but other XML vocabularies may not provide this; furthermore, parsing this requires that you understand the vocabulary in the <xenoData>. Also it may well be the case (presumably) that non-XML vocabularies may be used in <xenoData>. So I think there's a strong case for a method of linking to be available at the TEI level.

Martin Holmes

2:08 a.m.


Hi Syd, On 15-07-11 11:03 AM, Syd Bauman wrote:

...

Hi Martin --

You have a MODS record for one item in your transcribed bibliography, and want to put that into your TEI file? Power to you, I suppose. But that doesn't seem much like the feature OP (Kevin) requested, or the "problem" for which librarians have been asking for a solution for almost a decade.

<xenoData>, IMHO, is intended to make the easy case easy, not to make the complex case possible (in large part because the complex case is already possible -- e.g., you can put <mods:*> as a child of <bibl>, or put it in your <teiHeader> and attach it to the <bibl> with <link>).

(BTW, I prefer <link> to @corresp for this because I get to express which block of metadata is derived from which, which I can't express explicitly with @corresp.)

That makes sense; it's less direct, but perfectly workable.

...

The easy case, the one which quite a few large projects have expressed a desire for, is to have a consistent place to put the METS record from which the <teiHeader> was derived, or the Dublin Core metadata that you derived from the <teiHeader> for whatever purpose, but don't want to re-generate on the fly.

So I'm inclined not to add another mechanism for attachment to <xenoData>. That said, I still want us all to agree on an easy explicit mechanism to say "about the source" vs "about this TEI file".

Speaking of which, Martin, I'm figuring the Dublin Core OAI example from _The Colonial Despatches of Vancouver Island_ you posted on the ticket is about the TEI file, not the source, yes?

This is from my previous message about this:

...

The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?) along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

...

MH> Also it may well be the case (presumably) that non-XML MH> vocabularies may be used in <xenoData>.

I, for one, had never envisioned non-XML data inside <xenoData>, and think it would be a bad idea to permit it. (For the same kinds of reasons I'm unhappy with <binaryObject>.)

I don't see how you can prevent it. macro.anyXML will allow the user to insert a CDATA section if they want, won't it?

...

That said, it is not at all clear to me whether the content of <xenoData> should be well-formed XML or if well-balanced XML is sufficient. (The difference being well-formed means that <xenoData> can only have 1 child; well-balanced would permit multiple children.) As I originally wrote it, because the content model of <xenoData> was just the already existing 'macro.anyXML', well-formedness was required. I have recently changed it to 'macro.anyXML+'[1], thus allowing well-balanced fragments.

Notes
-----
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.

I'm interested to see why. Cheers, Martin

...

...
I don't think we're talking at cross-purposes. Let's take a concrete example. Say I have a bibliography encoded as a <listBibl> in the <back> of my document. I have a <xenoData> element in my header that contains a <mods> block which represents one of the <bibl>s in my bibliography. How do I specify that this block of <xenoData> refers to that <bibl>?

@decls is no help here, surely.

I could do it with @corresp (on <bibl> or on <xenoData> or even on both). But Kevin has expressed some reservations about that which I suspect from previous discussions that Lou would share. (I'd be OK with it, myself.) @sameAs would seem to be a candidate, except that (as almost always with @sameAs) when you look at the definition, it turns out that this relationship is not as same-as as necessary for a proper use of that attribute.

So I suggested another linking option. I don't really mind how it's done, but I do think it's important that people be able to link between their <xenoData> blocks and other elements -- any elements -- in their files. In some cases such as FOAF, the standard itself provides a way to link out from the element to its target:

<foaf:Person rdf:about="#danbri" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Dan Brickley</foaf:name>
  <foaf:homepage rdf:resource="http://danbri.org/" />
  <foaf:openid rdf:resource="http://danbri.org/" />
  <foaf:img rdf:resource="/images/me.jpg" />
</foaf:Person>

<http://xmlns.com/foaf/spec/>

but other XML vocabularies may not provide this; furthermore, parsing this requires that you understand the vocabulary in the <xenoData>. Also it may well be the case (presumably) that non-XML vocabularies may be used in <xenoData>. So I think there's a strong case for a method of linking to be available at the TEI level.

Syd Bauman

13 Jul

9:42 p.m.


...

That makes sense; it's less direct, but perfectly workable.

Good, good.

...

This is from my previous message about this:

...
The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?) along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

I think you are completely right, and it doesn't change my opinion at all. I still think we should have an easy way to express, formally and unambiguously, that "this is about the source" or "this is about the TEI file". The fact that in some (perhaps many or even most) cases an encoder would choose not to take advantage of this capability because it is (understandably) a pain to tease the two apart, or because she has things for which it's not clear, does not mean the capability shouldn't be there. (It does mean the capability should be optional, pushing us towards an attribute instead of position of <xenoData> in the tree.)

...

I don't see how you can prevent it [non-XML data]. macro.anyXML will allow the user to insert a CDATA section if they want, won't it?

Well, yes and no. One would have to wrap the CDATA marked section with an XML element. E.g.

<xenoData>
  <syd:sillyNonXMLmetadata>
    <![CDATA[of course, I have to be very careful with what goes in here, as the string U+5D, U+5D, U+3E would wreak havoc.]]>
  </syd:sillyNonXMLmetadata>
</xenoData>

But again, that's true already. I don't need a <xenoData> to stick a <syd:sillyNonXMLmetadata> in my TEI document.
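The usual workaround for the U+5D, U+5D, U+3E ("]]>") problem Syd mentions is to end the CDATA section just before the ">" and start a new one, so that no single section contains the terminator. A minimal Python sketch; the function name and the `urn:example:syd` namespace are hypothetical, and `syd:sillyNonXMLmetadata` is reused from the example only to show that the result parses.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap arbitrary text in CDATA, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


payload = "of course, the string ]]> would normally wreak havoc"
wrapped = (
    '<syd:sillyNonXMLmetadata xmlns:syd="urn:example:syd">'  # hypothetical namespace
    + to_cdata(payload)
    + "</syd:sillyNonXMLmetadata>"
)

# The parser joins the adjacent CDATA sections back into the original text.
assert ET.fromstring(wrapped).text == payload
print(wrapped)
```

The encoder, or the tool writing the file, still has to apply this consistently; nothing in a schema can enforce it.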

...

...
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.

I'm interested to see why.

Actually, I did change it to macro.anyXML+ after I realized setting @valid to "feasible" would allow the examples to be considered valid. If anyone wants to make an argument in favor of empty <xenoData>s, speak up.

Hugh Cayless

10:08 p.m.


Why do we want to insist on XML content for xenoData? I mean, obviously it'll have to be escaped for insertion into my XML document, but if I want to put JSON-LD or RDF in Turtle format, or some other thing, I should be able to, surely? I sometimes have non-XML source data that I use to generate my TEI (or parts of it). Should I be forced to wrap it in an element of my own invention in order to insert it?

On Mon, Jul 13, 2015 at 3:42 PM, Syd Bauman <syd@paramedic.wwp.neu.edu> wrote:

...

...
That makes sense; it's less direct, but perfectly workable.

Good, good.

...
This is from my previous message about this:

...
The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?) along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

I think you are completely right, and it doesn't change my opinion at all. I still think we should have an easy way to express, formally and unambiguously, that "this is about the source" or "this is about the TEI file". The fact that in some (perhaps many or even most) cases an encoder would choose not to take advantage of this capability because it is (understandably) a pain to tease the two apart, or because she has things for which it's not clear, does not mean the capability shouldn't be there. (It does mean the capability should be optional, pushing us towards an attribute instead of position of <xenoData> in the tree.)

...
I don't see how you can prevent it [non-XML data]. macro.anyXML will allow the user to insert a CDATA section if they want, won't it?

Well, yes and no. One would have to wrap the CDATA marked section with an XML element. E.g.

<xenoData>
  <syd:sillyNonXMLmetadata>
    <![CDATA[of course, I have to be very careful with what goes in here, as the string U+5D, U+5D, U+3E would wreak havoc.]]>
  </syd:sillyNonXMLmetadata>
</xenoData>

But again, that's true already. I don't need a <xenoData> to stick a <syd:sillyNonXMLmetadata> in my TEI document.

...
...
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.

I'm interested to see why.

Actually, I did change it to macro.anyXML+ after I realized setting @valid to "feasible" would allow the examples to be considered valid. If anyone wants to make an argument in favor of empty <xenoData>s, speak up.


Syd Bauman

11:34 p.m.


Well, *I* think it's an affront to god and country to insert non-XML data, but if I'm the only one, it's easy enough to change.

...

Why do we want to insist on XML content for xenoData? I mean, obviously it'll have to be escaped for insertion into my XML document, but if I want to put JSON-LD or RDF in Turtle format, or some other thing, I should be able to, surely? I sometimes have non-XML source data that I use to generate my TEI (or parts of it). Should I be forced to wrap it in an element of my own invention in order to insert it?

Lou Burnard

14 Jul

10:55 a.m.


You can't permit non-XML data unless it's wrapped in a CDATA marked section, methinks (so tough luck if your non-TEI xenodata has something like "]]>" in it at some point :-( ) but otherwise I don't see why this should pose a problem.

Apologies for commenting a bit out of phase on this... my French rural wifi connexion is sporadic

On 13/07/15 22:34, Syd Bauman wrote:

...

Well, *I* think it's an affront to god and country to insert non-XML data, but if I'm the only one, it's easy enough to change.

...
Why do we want to insist on XML content for xenoData? I mean, obviously it'll have to be escaped for insertion into my XML document, but if I want to put JSON-LD or RDF in Turtle format, or some other thing, I should be able to, surely? I sometimes have non-XML source data that I use to generate my TEI (or parts of it). Should I be forced to wrap it in an element of my own invention in order to insert it?

Syd Bauman

3:12 p.m.


If the content model allows <rng:text> (or, I presume, <textNode>), a user can enter non-XML data so long as a) it is proper UTF-8 or whatever, b) it does not contain "<", c) it does not contain "&", and d) the user is willing to think through white space issues.

Since base64 encoding only uses A-Z, a-z, 0-9, +, /, and =, that's the typical method of including non-XML data in, say, <binaryObject>.

Why anyone would want to convert their JSON to base64 so they can store it in their TEI file is beyond me. But I'm pretty myopic, I guess.

But this reminds me that if we do go this route, we would want to factor out @encoding of <binaryObject> to a class (att.binaryEncoding?), and add <xenoData> to that class. Blech.
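To illustrate the base64 point: the encoder's output alphabet (A-Z, a-z, 0-9, +, / and the = padding) contains neither "<" nor "&", so it can sit in an element's text content without any escaping. A Python sketch using the standard `base64` and `json` modules; the sample record is invented for illustration.

```python
import base64
import json
import string

# Invented sample record containing characters that are awkward in XML.
record = {"title": "Barnes & Noble <catalogue>", "ids": [[1, 2], [3]]}
raw = json.dumps(record).encode("utf-8")

encoded = base64.b64encode(raw).decode("ascii")
BASE64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")

# Every character is from the base64 alphabet, so no "<" or "&" can appear.
assert set(encoded) <= BASE64_ALPHABET
assert "<" not in encoded and "&" not in encoded

# Decoding restores the original JSON exactly.
assert json.loads(base64.b64decode(encoded)) == record
print(encoded)
```

The trade-off is that the stored JSON is no longer readable in place, and a processor must know (e.g. via something like @encoding) that it has to decode it.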

...

You can't permit non-XML data unless it's wrapped in a CDATA marked section, methinks (so tough luck if your non-TEI xenodata has something like "]]>" in it at some point :-( ) but otherwise I don't see why this should pose a problem.

Apologies for commenting a bit out of phase on this... my French rural wifi connexion is sporadic

Martin Holmes

7:17 p.m.


Hi Syd, On 15-07-14 06:12 AM, Syd Bauman wrote:

...

If the content model allows <rng:text> (or, I presume, <textNode>), a user can enter non-XML data so long as a) it is proper UTF-8 or whatever, b) it does not contain "<", c) it does not contain "&", and d) user is willing to think through white space issues.

Since base64 encoding only uses A-Z, a-z, 0-9, +, /, and =, that's the typical method of including non-XML data in, say, <binaryObject>.

Why anyone would want to convert their JSON to base64 so they can store it in their TEI file is beyond me. But I'm pretty myopic, I guess.

The main reason for storing this sort of thing in my projects would be that it's time-consuming and processor-expensive to generate on the fly.

...

But this reminds me that if we do go this route, we would want to factor out @encoding of <binaryObject> to a class (att.binaryEncoding?), and add <xenoData> to that class. Blech.

We're not actually asking for binary data here; we're just asking for the possibility of non-XML data. I don't foresee any circumstances in which I'd want to include binary data. Mostly we'll be talking about existing metadata standards and transmission-friendly standards such as JSON, I think. Cheers, Martin

...

...
You can't permit non-XML data unless it's wrapped in a CDATA marked section, methinks (so tough luck if your non-TEI xenodata has something like "]]>" in it at some point :-( ) but otherwise I don't see why this should pose a problem.

Apologies for commenting a bit out of phase on this... my French rural wifi connexion is sporadic

Syd Bauman

8:22 p.m.


...

...
Why anyone would want to convert their JSON to base64 so they can store it in their TEI file is beyond me. But I'm pretty myopic, I guess.

The main reason for storing this sort of thing in my projects would be that it's time-consuming and processor-expensive to generate on the fly.

Yes, but why store it in myDoc.tei rather than in myDoc.json? Seems like you'd be making your life harder. But that's just MHO.

...

We're not actually asking for binary data here; we're just asking for the possibility of non-XML data. I don't foresee any circumstances in which I'd want to include binary data. Mostly we'll be talking about existing metadata standards and transmission-friendly standards such as JSON, I think.

So you're saying that we should explicitly permit non-XML character-based formats (like JSON), but disallow binary formats (that have, by necessity, been converted to a character format, of course)? How do we tell the difference? I suppose if we don't provide @encoding or some other mechanism to say how the binary data has been converted to character data we can just say we disallow it. But even JSON, if I understand correctly, is not XML-safe, as "<" and "&" are permitted as unescaped characters in a JSON string, no? I.e., "Barnes & Noble" would have to be written "Barnes &amp; Noble" or "Barnes \u0026 Noble". Or, as Lou points out, be wrapped in a CDATA marked section.
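The "Barnes \u0026 Noble" option generalises: JSON lets any character in a string be written as a \uXXXX escape, and outside strings serialized JSON never contains "<", ">" or "&". Replacing those three characters with their escapes therefore yields JSON that can go straight into XML text content (and can never contain "]]>"). A minimal Python sketch; the function name and the `<json>` wrapper element are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def xml_safe_json(obj) -> str:
    """Serialize obj as JSON in which '<', '>' and '&' appear only as \\u escapes.

    These characters can only occur inside JSON strings, where a \\uXXXX
    escape means the same thing, so the result still decodes to obj.
    """
    text = json.dumps(obj)
    return (
        text.replace("&", "\\u0026")
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
    )


record = {"publisher": "Barnes & Noble", "note": "a <b> c ]]> d"}
safe = xml_safe_json(record)
print(safe)

# No markup-significant characters remain, and the JSON still round-trips.
assert not any(c in safe for c in "<>&")
assert json.loads(safe) == record
# It can be placed directly into element content (element name is hypothetical).
assert json.loads(ET.fromstring(f"<json>{safe}</json>").text) == record
```

Unlike base64, the result stays mostly human-readable; unlike a CDATA section, it needs no special handling of "]]>".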

Martin Holmes

10:31 p.m.


Hi Syd, On 15-07-14 11:22 AM, Syd Bauman wrote:

...

...
...
Why anyone would want to convert their JSON to base64 so they can store it in their TEI file is beyond me. But I'm pretty myopic, I guess.

The main reason for storing this sort of thing in my projects would be that it's time-consuming and processor-expensive to generate on the fly.

Yes, but why store it in myDoc.tei rather than in myDoc.json? Seems like you'd be making your life harder. But that's just MHO.

Right now I have over 7,300 despatches documents in TEI, and over 7,300 parallel OAI-PMH files in a separate collection. If a despatches file changes its name or gets deleted, I have to remember to remove the parallel OAI-PMH file from both SVN and from the eXist instance; inevitably that sometimes fails to happen and they get out of sync. Life will be much simpler with <xenoData>.
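A sketch of the workflow Martin describes, in which the OAI-PMH record lives inside each TEI file and is pulled out on request rather than kept as a parallel file that can drift out of sync. It assumes a Dublin Core `oai_dc:dc` record stored directly inside <xenoData>; the sample document and function name are hypothetical, and a real OAI-PMH server would still need to wrap the record in the protocol's response envelope.

```python
import xml.etree.ElementTree as ET

NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes when re-serializing

# Hypothetical, simplified TEI file carrying its own OAI-PMH Dublin Core record.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Despatch 1</title></titleStmt>
      <publicationStmt><p/></publicationStmt><sourceDesc><p/></sourceDesc></fileDesc>
    <xenoData>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Despatch 1</dc:title>
      </oai_dc:dc>
    </xenoData>
  </teiHeader>
  <text><body><p/></body></text>
</TEI>"""


def extract_oai_dc(tei_source: str):
    """Return the oai_dc:dc element stored in <xenoData>, or None if absent."""
    root = ET.fromstring(tei_source)
    return root.find("tei:teiHeader/tei:xenoData/oai_dc:dc", NS)


record = extract_oai_dc(SAMPLE)
print(record.find("dc:title", NS).text)  # Despatch 1
print(ET.tostring(record, encoding="unicode"))
```

Renaming or deleting the TEI file then takes its metadata with it, which is the synchronisation problem Martin wants to avoid.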

...

...
We're not actually asking for binary data here; we're just asking for the possibility of non-XML data. I don't foresee any circumstances in which I'd want to include binary data. Mostly we'll be talking about existing metadata standards and transmission-friendly standards such as JSON, I think.

So you're saying that we should explicitly permit non-XML character-based formats (like JSON), but disallow binary formats (that have, by necessity, been converted to a character format, of course)?

I didn't say we should disallow them; I just don't think any of the requesters or use-cases so far has mentioned them, so I don't think it's important to worry about them. If someone wants to include binary data, wouldn't they use <binaryObject>?

...

How do we tell the difference? I suppose if we don't provide @encoding or some other mechanism to say how the binary data has been converted to character data we can just say we disallow it. But even JSON, if I understand correctly, is not XML-safe, as "<" and "&" are permitted as unescaped characters in a JSON string, no? I.e., "Barnes & Noble" would have to be written "Barnes &amp; Noble" or "Barnes \u0026 Noble". Or, as Lou points out, be wrapped in a CDATA marked section.

I think we'd always choose to use a CDATA section, surely. I think it's quite unlikely that "]]>" would occur in JSON; "]]" would be very common (nested arrays), but the following angle bracket is very unlikely to occur. So most likely JSON could be used unchanged in CDATA. Cheers, Martin
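Martin's reasoning can be checked mechanically: in serialized JSON a ">" can only occur inside a string value, so "]]>" appears only if some string contains it, never from nested arrays alone. A Python sketch that puts the JSON in a CDATA section unchanged when it can, and otherwise falls back to escaping ">" as \u003e; the function name is hypothetical.

```python
import json


def json_as_cdata(obj) -> str:
    """Serialize obj as JSON inside a single CDATA section."""
    text = json.dumps(obj)
    if "]]>" in text:
        # Only possible inside a JSON string, where \u003e is an equivalent escape.
        text = text.replace(">", "\\u003e")
    return f"<![CDATA[{text}]]>"


nested = {"matrix": [[1, 2], [3, 4]]}
print(json_as_cdata(nested))  # <![CDATA[{"matrix": [[1, 2], [3, 4]]}]]>

awkward = {"note": "contains ]]> inside a string"}
print(json_as_cdata(awkward))
```

In the common case (no "]]>" in any string) the JSON is stored byte-for-byte as generated, which is what Martin expects.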

Martin Holmes

13 Jul

10:49 p.m.


Hi Syd, On 15-07-13 12:42 PM, Syd Bauman wrote:

...

...
That makes sense; it's less direct, but perfectly workable.

Good, good.

I wish it were directional, but still. :-)

...

...
This is from my previous message about this:

...
The distinction between metadata about the source document and metadata about the electronic document is very fuzzy a lot of the time. One of the things I'm likely to store in <xenoData> is a block of OAI-PMH metadata which includes the date and title of the original document (source) as well as a list of individuals and places referred to in it (source? electronic?) along with information on the date the file was last changed and its canonical URL (electronic). It's really closer to <teiHeader> than to e.g. <sourceDesc>.

I think you are completely right, and it doesn't change my opinion at all. I still think we should have an easy way to express, formally and unambiguously, that "this is about the source" or "this is about the TEI file". The fact that in some (perhaps many or even most) cases an encoder would choose not to take advantage of this capability because it is (understandably) a pain to tease the two apart, or because she has things for which it's not clear, does not mean the capability shouldn't be there. (It does mean the capability should be optional, pushing us towards an attribute instead of position of <xenoData> in the tree.)

Good point, I agree.

...

...
I don't see how you can prevent it [non-XML data]. macro.anyXML will allow the user to insert a CDATA section if they want, won't it?

Well, yes and no. One would have to wrap the CDATA marked section with an XML element. E.g.

<xenoData>
  <syd:sillyNonXMLmetadata>
    <![CDATA[of course, I have to be very careful with what goes in here, as the string U+5D, U+5D, U+3E would wreak havoc.]]>
  </syd:sillyNonXMLmetadata>
</xenoData>

Ah -- I see your content model is this:

<content>
  <group xmlns="http://relaxng.org/ns/structure/1.0">
    <oneOrMore>
      <ref name="macro.anyXML"/>
    </oneOrMore>
  </group>
</content>

whereas I'd been thinking you'd use the same model as <egXML>:

<content>
  <rng:zeroOrMore>
    <rng:group>
      <rng:choice>
        <rng:text/>
        <rng:ref name="macro.anyXML"/>
      </rng:choice>
    </rng:group>
  </rng:zeroOrMore>
</content>

where I can happily put text or a CDATA section. Do we really want to exclude non-XML data?

...

But again, that's true already. I don't need a <xenoData> to stick a <syd:sillyNonXMLmetadata> in my TEI document.

...
...
[1] Not entirely true, I changed it to 'macro.anyXML*', about which I plan to post shortly.

I'm interested to see why.

Actually, I did change it to macro.anyXML+ after I realized setting @valid to "feasible" would allow the examples to be considered valid. If anyone wants to make an argument in favor of empty <xenoData>s, speak up.

Not empty, but non-XML is worth considering, surely? Cheers, Martin

Syd Bauman

11:45 p.m.


...

Ah -- I see your content model is this: ( macro.anyXML+ ) whereas I'd been thinking you'd use the same model as <egXML>: ( text | macro.anyXML )* where I can happily put text or a CDATA section. Do we really want to exclude non-XML data?

See previous post in reply to Hugh, who seems to want non-XML data.

...

Not empty, but non-XML is worth considering, surely?

If we allow non-XML, we have to use a facet or a Schematron rule to insist on non-empty (since <rng:text/> matches 0 or more characters). I'm not sure it's worth the bother. (Especially if <xenoData> remains in att.typed, in which case it is arguable that saying there is no metadata of type X is making an assertion.) So I'm inclined to leave it as-is, or if those who want non-XML data win the day, to change it to the same content model as <egXML>. After all, that's what Martin expected, so it's reasonable to think others might expect that, too. BTW, I won't cry if we go the <egXML> content model (i.e., ( text | macro.anyXML )*) route.
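To make the difference between the two candidate content models concrete: `( macro.anyXML+ )` requires at least one child element and (setting whitespace aside) no bare text, while the <egXML>-style `( text | macro.anyXML )*` accepts anything, including an empty element. That is why insisting on non-empty content would then need an extra, Schematron-style rule. A rough Python approximation of the three checks; it only inspects the immediate content of a parsed <xenoData> and is not a substitute for RELAX NG validation. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def has_non_whitespace_text(el: ET.Element) -> bool:
    """True if el has bare text (including CDATA) directly inside it."""
    chunks = [el.text or ""] + [child.tail or "" for child in el]
    return any(chunk.strip() for chunk in chunks)


def matches_any_xml_plus(el: ET.Element) -> bool:
    """Approximates ( macro.anyXML+ ): one or more elements, no bare text."""
    return len(el) > 0 and not has_non_whitespace_text(el)


def matches_egxml_model(el: ET.Element) -> bool:
    """Approximates ( text | macro.anyXML )*: any mix is fine, even nothing."""
    return True


def is_non_empty(el: ET.Element) -> bool:
    """The extra rule the egXML-style model would need to forbid empty content."""
    return len(el) > 0 or has_non_whitespace_text(el)


samples = {
    "xml": "<xenoData><dc:title xmlns:dc='http://purl.org/dc/elements/1.1/'>T</dc:title></xenoData>",
    "json": '<xenoData><![CDATA[{"title": "T"}]]></xenoData>',
    "empty": "<xenoData/>",
}
for name, source in samples.items():
    el = ET.fromstring(source)
    print(name, matches_any_xml_plus(el), matches_egxml_model(el), is_non_empty(el))
```

A bare CDATA section counts as text, so under `( macro.anyXML+ )` it fails unless wrapped in an element, matching Syd's <syd:sillyNonXMLmetadata> example.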

Lou Burnard

14 Jul

10:37 a.m.


On 11/07/15 19:03, Syd Bauman wrote:

...

<xenoData>, IMHO, is intended to make the easy case easy, not to make the complex case possible (in large part because the complex case is already possible -- e.g., you can put <mods:*> as a child of <bibl>, or put it in your <teiHeader> and attach it to the <bibl> with <link>).

I agree with Syd here. Martin's proposed use case is rather specialised.

...

The easy case, the one which quite a few large projects have expressed a desire for, is to have a consistent place to put the METS record from which the <teiHeader> was derived, or the Dublin Core metadata that you derived from the <teiHeader> for whatever purpose, but don't want to re-generate on the fly.
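For the easy case described here, a consumer only needs a predictable place to look. A minimal Python sketch, assuming Dublin Core elements in the standard http://purl.org/dc/elements/1.1/ namespace and a hypothetical sample record; because the thread had not yet settled whether <xenoData> sits beside <fileDesc> or inside it, the path searches anywhere under <teiHeader>. The name cached_dc is illustrative.

```python
import xml.etree.ElementTree as ET

NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical TEI document carrying cached Dublin Core in <xenoData>.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <teiHeader>
    <fileDesc><!-- titleStmt, publicationStmt, sourceDesc elided --></fileDesc>
    <xenoData>
      <dc:title>A Sample Letter</dc:title>
      <dc:creator>Anonymous</dc:creator>
    </xenoData>
  </teiHeader>
  <text><body><p/></body></text>
</TEI>"""


def cached_dc(tei_xml, field):
    """Return the values of one Dublin Core field stored in any <xenoData>
    under <teiHeader>. The '//' steps keep this working whether
    <xenoData> ends up beside <fileDesc> or inside it."""
    root = ET.fromstring(tei_xml)
    path = f"tei:teiHeader//tei:xenoData//dc:{field}"
    return [el.text for el in root.findall(path, NS)]


print(cached_dc(SAMPLE, "title"))  # ['A Sample Letter']
```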

Exactly.

...

I, for one, had never envisioned non-XML data inside <xenoData>, and think it would be a bad idea to permit it.

I don't think we're in a position to permit or not permit anything inside <xenoData>: it is explicitly non-TEI data, so all bets are off. Of course if it's not well-formed XML it will need to be wrapped in a CDATA section or similar.
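The CDATA point can be checked directly with a conforming XML parser: wrapped in a CDATA section, non-XML metadata survives as plain character data; unwrapped, the same characters make the document not well-formed. A minimal Python sketch, using a hypothetical BibTeX record as the non-XML payload and an illustrative helper name, xeno_payload.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A hypothetical BibTeX record: the bare '<' and '&' break an XML parser.
BIBTEX = "@misc{key, note = {a < b & c}}"

WRAPPED = f'<xenoData xmlns="{TEI_NS}"><![CDATA[{BIBTEX}]]></xenoData>'
BARE = f'<xenoData xmlns="{TEI_NS}">{BIBTEX}</xenoData>'


def xeno_payload(xml_text):
    """Return the character data inside a <xenoData> element."""
    return ET.fromstring(xml_text).text or ""


print(xeno_payload(WRAPPED) == BIBTEX)  # True: CDATA hands back the raw text

try:
    xeno_payload(BARE)
except ET.ParseError as err:
    print("not well-formed:", err)
```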

Martin Holmes

10 Jul

6:21 a.m.


On 15-07-09 01:14 PM, Lou Burnard wrote:

...

On 09/07/15 17:04, Martin Holmes wrote:

...
We already have four completely different @scope attributes; please let's not add a fifth.

Well, I am not so sure these four are all semantically "completely different" -- they all say something about the extent to which the annotation concerned is universally (or not) applicable. Which is more or less what we want here, surely? But by all means let's have another suggestion.

...
How about adding <xenoData> to att.pointing? Then it can be linked very precisely to the location(s) in the rest of the file to which the metadata applies. This could be done with e.g. @corresp, but we know that causes discomfort in some folks.

Aaaargh. Whence this spurious desire for "precise linking"? This is xenodata. It could be anything. We cannot say anything about what it applies to and shouldn't pretend to be able to.

I thought this was what the whole discussion was about. What on earth is @scope supposed to be doing if it isn't telling you something about what the "annotation" is "applicable" to? @scope is typically very imprecise ("sole"|"major"|"minor", "all"|"most"|"range"), whereas a pointing attribute would give complete precision. I find it hard to see why precision is not better than vagueness. Cheers, Martin
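If <xenoData> were added to att.pointing as Martin suggests (or simply given @corresp), resolving its targets would be mechanical. A minimal Python sketch, assuming same-document pointers of the form "#id" only; the sample header, the empty MODS placeholder, and the name xeno_targets are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical header: a MODS record pointing at the <bibl> for the source.
SAMPLE = f"""<teiHeader xmlns="{TEI}">
  <fileDesc>
    <sourceDesc><bibl xml:id="src1">A printed source</bibl></sourceDesc>
  </fileDesc>
  <xenoData corresp="#src1"><mods xmlns="http://www.loc.gov/mods/v3"/></xenoData>
</teiHeader>"""


def xeno_targets(root):
    """For each <xenoData>, list the elements its @corresp points at.
    Only same-document '#id' pointers are resolved in this sketch."""
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    result = []
    for xeno in root.iter(f"{{{TEI}}}xenoData"):
        ids = [p[1:] for p in xeno.get("corresp", "").split() if p.startswith("#")]
        result.append([by_id[i] for i in ids if i in by_id])
    return result


targets = xeno_targets(ET.fromstring(SAMPLE))
print([el.tag for el in targets[0]])  # ['{http://www.tei-c.org/ns/1.0}bibl']
```

This is the precision Martin contrasts with @scope's coarse values: the pointer names exactly which element the metadata describes.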

...

...
Cheers, Martin

Boos,

Lou

...
On 15-07-09 08:29 AM, James Cummings wrote:

...
I could be convinced of those two attributes.

But yes, I wouldn't provide suggested values at all.

-James

On 09/07/15 14:26, Lou Burnard wrote:

...
Please please let's not use @type for this!

How about @scheme (for the format of the metadata) and @scope (for its err scope, i.e. what it's about) if you insist on saying anything about either?

I also think that any typology of closed values we might propose will just look silly.

On 09/07/15 03:22, Kevin Hawkins wrote:

...
I was just about to write to the list with the same concern about using @type.

As for the value of the attribute (whatever name you choose) ... While "source" is well established in the Guidelines for the thing described by <sourceDesc>, "transcription" is not generally used for the thing described by the other children of <fileDesc>. After all, the Guidelines claim to be equally applicable not only to a manual transcription of a written or audio document but also to an electronic text created through automated means, even from another electronic source. So "electronic text" is often used, as are "computer file", "electronic file", and "electronic work". So maybe the suggested value of the attribute could be one of the following:

electronicText
computerFile
electronicFile
electronicWork

--Kevin

On 7/8/15 9:17 PM, Syd Bauman wrote:

...
But just to be explicit about this, we are (deliberately) making the same mistake TEI made with @type of <name> and friends: the value of @type here does not describe the type of the element itself or the stuff inside the element, but rather categorizes the type of stuff that the elements inside refer to or describe. (That's why I liked @describes better, but if <xenoData> is going to be used for so much more than describing the source and describing the TEI file, that may not be a good idea.)

> I'm convinced. I'll put it into att.typed and give it a private
> copy of @type with a suggested values include list of "source" and
> "transcription" (anyone come up with a better term?) and let Martin
> or anyone else add whatever others seem appropriate.
>
> People can then poke at that.
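With <xenoData> in att.typed and a suggested list of "source" and "transcription", a processor could group records by @type and flag any project-specific values for a human to check. A minimal Python sketch; the value "rdfGraph", the urn:example namespaces, and the function names are hypothetical stand-ins for whatever a project might add.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
# The values proposed as suggestions; projects may add their own.
SUGGESTED_TYPES = {"source", "transcription"}

SAMPLE = f"""<teiHeader xmlns="{TEI}">
  <xenoData type="source"><a xmlns="urn:example:mets"/></xenoData>
  <xenoData type="transcription"><b xmlns="urn:example:dc"/></xenoData>
  <xenoData type="rdfGraph"><c xmlns="urn:example:rdf"/></xenoData>
</teiHeader>"""


def group_by_type(root):
    """Group <xenoData> elements by @type ('' when @type is absent)."""
    groups = {}
    for el in root.iter(f"{{{TEI}}}xenoData"):
        groups.setdefault(el.get("type", ""), []).append(el)
    return groups


def unsuggested_types(root):
    """@type values in use that are not on the suggested list."""
    return sorted(t for t in group_by_type(root) if t and t not in SUGGESTED_TYPES)


root = ET.fromstring(SAMPLE)
print(sorted(group_by_type(root)))  # ['rdfGraph', 'source', 'transcription']
print(unsuggested_types(root))      # ['rdfGraph']
```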


36 comments


participants (8)

Fabio Ciotti
Hugh Cayless
James Cummings
James Cummings
Kevin Hawkins
Lou Burnard
Martin Holmes
Syd Bauman