Recommendations on values of @xml:id (FR #540)

Raffaele Viglianti

20 Feb 2015

11:07 p.m.

Hi all,

Martin Mueller asked on the list for recommendations on good uses of xml:ids, which resulted in a good discussion and a feature request (FR).

I summarized the main suggestions on the ticket on SourceForge, but I am posting them here as well because I think we need to discuss this further; see below.

The Guidelines already have some recommendations (numbering based on doc structure):

* The Guidelines already suggest numbering based on doc structure (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html#CORS2)
* Use 3 letters (e.g. from the title) + 3 digits, incremental. E.g. HOL001, HOL002, etc.
* Same as above, but with no fixed number of digits. E.g. HOL1, HOL2, etc.
* Prefix an id with the name of the element (this is a simpler version of what the Guidelines already recommend)
* Give tei:TEI an id and prefix every other id in the document with it (to guarantee cross-corpus uniqueness)

These are all reasonable suggestions and there can be plenty more, which is why I think that the TEI should *not* give any recommendation on best practices for xml:id: it's a project management issue, not an encoding one.

Myself, I prefer random ids, a practice that avoids introducing yet another level of complexity and data management. I understand the appeal of human readability, but sequences are too easily broken. And when parsing, relying on ID content instead of TEI content sounds like a bad idea.
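
To make the schemes listed above concrete, here is a minimal Python sketch that generates ids in three of the styles mentioned: prefix plus incrementing number (with or without a fixed width), a document-id prefix, and a random id. The function names and the choice of an 8-character random id are assumptions made for illustration; nothing here is prescribed by the Guidelines or by anyone in the thread.

```python
import secrets
import string


def sequential_ids(prefix, count, width=None):
    """Yield ids such as HOL001, HOL002 (fixed width) or HOL1, HOL2 (no padding)."""
    for n in range(1, count + 1):
        number = str(n).zfill(width) if width else str(n)
        yield f"{prefix}{number}"


def random_id(length=8):
    """Return a random id whose first character is a letter, as xml:id requires."""
    first = secrets.choice(string.ascii_lowercase)
    rest = "".join(secrets.choice(string.ascii_lowercase + string.digits)
                   for _ in range(length - 1))
    return first + rest


def prefixed_id(doc_id, local_id):
    """Prefix a local id with the id given to the TEI root element."""
    return f"{doc_id}-{local_id}"


fixed = list(sequential_ids("HOL", 3, width=3))  # fixed number of digits
loose = list(sequential_ids("HOL", 3))           # no fixed number of digits
```

The fixed-width variant sorts correctly as plain text, while the unpadded variant does not once the sequence passes 9; the random variant carries no meaning at all, which is the property argued for above.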

Hugh Cayless

20 Feb

11:23 p.m.

New subject: [tei-council] Recommendations on values of @xml:id (FR #540)

+1. Is it worth having a paragraph in the GLs discussing the issue without making recommendations?

-- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council

PLEASE NOTE: postings to this list are publicly archived

Martin Holmes

11:42 p.m.

On 15-02-20 02:23 PM, Hugh Cayless wrote:

...

+1. Is it worth having a paragraph in the GLs discussing the issue without making recommendations?

I agree with both points: don't recommend, but do discuss briefly a range of possible options.

I don't like random ids, because they're extremely difficult to keep in mind for any length of time. Semi-meaningful ids (FRED1, LOND47) are certainly not useful for sorting or sequencing, but when you need to type them into a search or type a few of them, they're much easier to deal with.

Cheers,
Martin

Peter Stadler

21 Feb

9:32 p.m.

Just wanted to add two issues I find very important concerning IDs (and which have not been mentioned yet; apologies if they have):

* IDs should not change. Hence, nothing should be encoded within the ID, because everything can be subject to change. To illustrate this point I found this nice one from Tim Berners-Lee: "Cool URIs don't change" [1]. I think it's legitimate to equate URI and xml:id here.
* I'd even add a check digit [2] to my IDs.

Best
Peter

[1] http://www.w3.org/Provider/Style/URI
[2] http://en.wikipedia.org/wiki/Check_digit

Martin Holmes

22 Feb

5:26 a.m.

A check digit is an interesting idea. But generally, I'd say that if machines are manipulating documents containing ids, the ids won't be garbled; if humans are manipulating them, they're as likely to make an error in the check digit as in the rest of the id.

The most consistent problem I face with ids is keeping them globally unique across a large project. There are a number of solutions to this -- consistency check routines that run on the whole corpus, a single doc which XIncludes everything else and must be validated, etc. -- but none is quite as convenient as I'd like.

Cheers,
Martin
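
A corpus-wide consistency check of the kind mentioned above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the documents are passed in as strings keyed by a made-up file name, and `duplicate_ids` is a hypothetical helper, not an existing TEI tool.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes xml:id under the fixed XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_ids(documents):
    """Map each xml:id used more than once to the documents that contain it.

    `documents` maps a document name to its XML source as a string.
    """
    seen = defaultdict(list)
    for name, source in documents.items():
        root = ET.fromstring(source)
        for element in root.iter():
            value = element.get(XML_ID)
            if value is not None:
                seen[value].append(name)
    return {value: names for value, names in seen.items() if len(names) > 1}


# Two tiny made-up documents that both use xml:id="div1".
corpus = {
    "a.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
             '<div xml:id="div1"/><div xml:id="LOND47"/></TEI>',
    "b.xml": '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
             '<div xml:id="div1"/></TEI>',
}
clashes = duplicate_ids(corpus)
```

A check like this has to be run over the whole corpus each time, which is exactly the inconvenience described above; it reports clashes but does nothing to prevent them.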

Peter Stadler

23 Feb

1:57 p.m.

On 22.02.2015 at 05:26, Martin Holmes <mholmes@uvic.ca> wrote:

A check digit is an interesting idea. But generally, I'd say that if machines are manipulating documents containing ids, the ids won't be garbled; if humans are manipulating them, they're as likely to make an error in the check digit as in the rest of the id.

The point is, with a check digit you will find those false IDs. Without, you can’t tell whether LOND47 is actually LOND74 with accidentally transposed digits.

Cheers Peter
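
For readers unfamiliar with check digits, here is one possible scheme sketched in Python. It is an assumption chosen for illustration, not a scheme proposed in this thread: each character of the id body is mapped to a number, multiplied by its position, and the sum is reduced modulo 37 (a prime), which yields one extra check character. With bodies shorter than 37 characters, this detects any single wrong character and any swap of two adjacent, different characters in the body. The alphabet (digits, upper-case letters and underscore) and the function names are hypothetical.

```python
# 37 symbols; 37 is prime, which is what makes the error detection work.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_"


def check_char(body):
    """Position-weighted sum modulo 37, returned as one character of ALPHABET."""
    if not 0 < len(body) < len(ALPHABET):
        raise ValueError("body must be 1 to 36 characters long")
    # ALPHABET.index raises ValueError for characters outside the alphabet.
    total = sum(weight * ALPHABET.index(char)
                for weight, char in enumerate(body, start=1))
    return ALPHABET[total % len(ALPHABET)]


def with_check(body):
    """Append the check character to an id body."""
    return body + check_char(body)


def is_valid(identifier):
    """True if the last character matches the check character of the rest."""
    return len(identifier) > 1 and identifier[-1] == check_char(identifier[:-1])


good = with_check("LOND47")
# Simulate a typist swapping the last two digits but copying the check character.
mistyped = "LOND74" + good[-1]
```

Here `is_valid(good)` holds while `is_valid(mistyped)` does not, which is the LOND47 versus LOND74 case described above. An error in the check character itself is also flagged, although the check cannot say which part of the id is wrong.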

Martin Holmes

3:02 p.m.

Hi Peter,

On 15-02-23 04:57 AM, Peter Stadler wrote:

The point is, with a check digit you will find those false IDs. Without, you can’t tell whether LOND47 is actually LOND74 with accidentally transposed digits.

I understand that; my point was that the additional check digit becomes another place where an error can be made, which didn't exist before. A human encoder may get the whole id right except for the check digit. But I guess Schematron could be checking that on the fly.

Cheers,
Martin

Majewski Stefan

11:22 a.m.

On 20.02.2015 at 23:42, Martin Holmes wrote:

...

...
+1. Is it worth having a paragraph in the GLs discussing the issue without making recommendations?

I agree with both points: don't recommend, but do discuss briefly a range of possible options.

I don't like random ids, because they're extremely difficult to keep in mind for any length of time. Semi-meaningful ids (FRED1, LOND47) are certainly not useful for sorting or sequencing, but when you need to type them into a search or type a few of them, they're much easier to deal with.

I think that, when encouraging the use of IDs, we should also give some recommendations on how to use them, if only to suggest that an xml:id pattern is useful and should be tailored to the particular needs of a project. I would list the following suggestions:

- If a project uses multiple XML sources that are aggregated at some point via mechanisms such as XInclude, @xml:id values must be unique within the resulting document. One way to achieve this is to prepend a document-specific identifier to each @xml:id in the document.
- If human encoders are using the @xml:id values in their encoding, I recommend using a mnemonic identifier that is unlikely to conflict with another ID within the scope and context of the document. For example, when writing born-digital documents and referencing <biblStruct> elements from within the text to establish bibliographic relations, @xml:id values such as Bird2001 or Leech1972 can be a good idea.
- Don't overuse @xml:id. Only use @xml:id for referencing fragments of a document. The value of @xml:id shall not be parsed into components during document processing and not used as a basis to decide on a particular rendition or processing. Don't use @xml:id for sorting; use content-bearing attributes or elements for this.

The concept of IDs is generally one of the weak spots in the XML world. While for other kinds of references, e.g. xlink:href, mechanisms such as @xml:base have been introduced to cleanly resolve references, a similar mechanism for @xml:id and references to IDs is missing, as IDs have to be unique beyond their immediate scope.

While we could argue that this is project-management specific and beyond the scope of the guidelines, this appears not entirely convincing, as one of the aims of using XML in the first place, and using TEI in the second, is to create resources that are likely to be re-used.

This could be tackled by requiring the software that processes these IDs to replace them with uniquely generated ID values on inclusion, and to update all references within the included documents with these values. But then, fragmentary approaches, where IDs are referenced that are known to resolve only after inclusion, are a problem: a reference in e.g. tei:ptr/@target might only be resolved and replaced with the generated ID if it is known which document the referenced content resides in. Hence @target attributes like bibliography.xml#Bird2001 would be beneficial here, as these could easily be replaced by mechanisms analogous to what we find in xlink [1], especially regarding resolving against @xml:base [2][3].

Maybe this would be something to consider when defining P6 at some point: either move to xlink (maybe simplelink) or use the XMLBase recommendation [2].

[1] http://www.w3.org/TR/xlink/
[2] http://www.w3.org/TR/2001/REC-xmlbase-20010627/
[3] http://www.w3.org/TR/xlink/#link-locators

--
Mag. Stefan Majewski
Projektmanager
Abteilung Forschung und Entwicklung
Österreichische Nationalbibliothek
Josefsplatz 1, 1015 Wien
Tel.: (+43 1) 534 10-434
E-Mail: stefan.majewski@onb.ac.at
Skype: stefan.majewski.onb.ac.at
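
The first suggestion above (prepending a document-specific identifier before aggregation) can be sketched in Python as follows. This is a rough illustration under assumptions: `prefix_ids` is a hypothetical helper, only the three pointer attributes named in `POINTER_ATTRIBUTES` are rewritten (a real project would need the full list of pointer-typed attributes), and pointers into other files are deliberately left alone.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumed subset of attributes that hold pointers; not an exhaustive list.
POINTER_ATTRIBUTES = ("target", "ref", "corresp")


def prefix_ids(source, doc_id):
    """Prefix every xml:id with doc_id and update local '#' pointers to match."""
    root = ET.fromstring(source)
    for element in root.iter():
        if XML_ID in element.attrib:
            element.set(XML_ID, f"{doc_id}-{element.get(XML_ID)}")
        for name in POINTER_ATTRIBUTES:
            value = element.get(name)
            if value is None:
                continue
            # Pointer attributes may hold several whitespace-separated URIs;
            # only same-document pointers (starting with '#') are rewritten.
            rewritten = [f"#{doc_id}-{uri[1:]}" if uri.startswith("#") else uri
                         for uri in value.split()]
            element.set(name, " ".join(rewritten))
    return root


sample = ('<text><bibl xml:id="Bird2001"/>'
          '<ptr target="#Bird2001 bibliography.xml#Leech1972"/></text>')
prefixed = prefix_ids(sample, "doc1")
```

Because ids and their local references are rewritten in the same pass, the document stays internally consistent. The pointer to bibliography.xml#Leech1972 is untouched, which illustrates the problem raised above: it can only be rewritten if the processor knows what prefix the referenced document will receive.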

James Cummings

1:39 p.m.

On 23/02/15 10:22, Majewski Stefan wrote:

...

- don't overuse @xml:id. Only use @xml:id for referencing fragments of a document. The value of @xml:id shall not be parsed into components during document processing

I know I'm probably wrong but because I'm lazy I tend to prefer over-use of @xml:id. I understand it might not be theoretically sound, but then again I'd also like human-parseable values as well. So I might have IDs like poem1, poem1-stanza1, poem1-stanza1-line4, merely to facilitate other people pointing into my document and talking about it (or grabbing bits). Ok, I understand they can do so through other methods but it is a lot easier to say give me foo.xml#poem1-stanza1-line4 than to do a proper XPointer to it.

Similarly, I've argued elsewhere that facilitation of stand-off digital editions is much more pragmatic if the underlying edition has provided IDs on the smallest reasonable level of granularity. I have put IDs on every single word before precisely for this purpose because if words are interspersed later then the IDs have not changed. If a name goes from #w10 to #w13 and you realise you left out one of his middle names, then adding in #w12a is mostly side-effect free.

...

and not used as a basis to decide on a particular rendition or processing. Don't use @xml:id for sorting, use content bearing attributes or elements for this.

I do agree with this though. ;-)

...

The concept of IDs is generally one of the weak spots in the XML world. While for other kinds of references, e.g. xlink:href, mechanisms such as @xml:base have been introduced to cleanly resolve references, a similar mechanism for @xml:id and references to IDs is missing, as IDs have to be unique beyond their immediate scope.

Isn't this easy to implement in Schematron? And should we be doing so for TEI generally? If a data.pointer attribute starts with '#', should we be checking that an @xml:id exists with that id? Or do we already do that?

...

the included documents with these values. But then, fragmental approaches where IDs are referenced that are known to resolve only after inclusion are a problem and might, in e.g. tei:ptr/@target only be resolved and replaced with the generated ID if it is known which document the referenced content resides in.

Use case for private URI syntax?

...

Hence @target attributes like bibliography.xml#Bird2001 would here be beneficial, as these could easily be replaced by the mechanisms analogous to what we find in xlink [1], especially regarding resolving against @xml:base [2][3]. Maybe this would be something to consider when defining P6 at some point to either move to xlink (maybe simplelink) or use the XMLBase recommendation [2].

I'd feel more comfortable moving to a recommendation of xlink if it had received widespread adoption.

-James

--
Dr James Cummings, James.Cummings@it.ox.ac.uk
Academic IT Services, University of Oxford
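
The check asked about earlier in this message (does every pointer starting with '#' have a matching @xml:id in the same document?) is the kind of rule one might write in Schematron. As a language-neutral illustration, here is the same logic as a Python sketch. The helper name and the short list of pointer attributes are assumptions; this does not describe what the TEI schemas currently check.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Assumed subset of attributes that hold pointers; not an exhaustive list.
POINTER_ATTRIBUTES = ("target", "ref", "corresp")


def unresolved_pointers(source):
    """Return local pointers ('#...') that match no xml:id in the document."""
    root = ET.fromstring(source)
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    missing = []
    for el in root.iter():
        for name in POINTER_ATTRIBUTES:
            # A pointer attribute may hold several whitespace-separated URIs.
            for uri in el.get(name, "").split():
                if uri.startswith("#") and uri[1:] not in ids:
                    missing.append(uri)
    return missing


sample = ('<text><l xml:id="poem1-stanza1-line4"/>'
          '<ptr target="#poem1-stanza1-line4 #w12a foo.xml#x"/></text>')
dangling = unresolved_pointers(sample)
```

Pointers into other files (foo.xml#x here) are skipped, because resolving them depends on the scope question discussed in this thread: which documents, and which base URI, the check should consider.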

Hugh Cayless

3:10 p.m.

On Mon, Feb 23, 2015 at 7:39 AM, James Cummings <James.Cummings@it.ox.ac.uk> wrote:

On 23/02/15 10:22, Majewski Stefan wrote:

...

...
- don't overuse @xml:id. Only use @xml:id for referencing fragments of a document. The value of @xml:id shall not be parsed into components during document processing

I know I'm probably wrong but because I'm lazy I tend to prefer over-use of @xml:id. I understand it might not be theoretically sound, but then again I'd also like human-parseable values as well. So I might have IDs like poem1, poem1-stanza1, poem1-stanza1-line4, merely to facilitate other people pointing into my document and talking about it (or grabbing bits). Ok, I understand they can do so through other methods but it is a lot easier to say give me foo.xml#poem1-stanza1-line4 than to do a proper XPointer to it.

Similarly, I've argued elsewhere that facilitation of stand-off digital editions is much more pragmatic if the underlying edition has provided IDs on the smallest reasonable level of granularity. I have put IDs on every single word before precisely for this purpose because if words are interspersed later then the IDs have not changed. If a name goes from #w10 to #w13 and you realise you left out one of his middle names, then adding in #w12a is mostly side-effect free.

I don't think these would qualify as over-use in Stefan's formulation, since they're providing fragment identifiers. If I understand right, he's saying "don't overload the semantics of xml:id", i.e. don't use them in such a way that you expect to be able to extract information from them; only use them to identify bits of your document.

Majewski Stefan

4:54 p.m.

On 23.02.2015 at 13:39, James Cummings wrote:

...

On 23/02/15 10:22, Majewski Stefan wrote:

...
- don't overuse @xml:id. Only use @xml:id for referencing fragments of a document. The value of @xml:id shall not be parsed into components during document processing

I know I'm probably wrong but because I'm lazy I tend to prefer over-use of @xml:id.

I can't see that this is lazy. Actually, I think my description was ambiguous. I did not mean to say "use only a few @xml:id values", but rather "don't use @xml:id for purposes other than identifying elements". I think that is an actual recommendation we can make. If there are some semantics within the ID string, they are there for the sole purpose of making it easier for a human agent to handle. The ID should be such that it could be replaced by a random string (honouring the constraints of ID) without breaking anything.

Identifying via IDs is very powerful and has all the benefits you described.

...

...
The concept of IDs is generally one of the weak spots in the XML world. While for other kinds of references, e.g. xlink:href, mechanisms such as @xml:base have been introduced to cleanly resolve references, a similar mechanism for @xml:id and references to IDs is missing, as IDs have to be unique beyond their immediate scope.

Isn't this easy to implement in schematron? And should we be doing so for TEI generally? If a data.pointer attribute starts with '#' should we be checking that an @xml:id exists with that id? Or do we already do that?

This is a question of the scope in which we have to look for this @xml:id. As far as I can see (though I may just not have found it), the Guidelines do not define on which basis relative URIs are to be resolved.

...

...
the included documents with these values. But then, fragmental approaches where IDs are referenced that are known to resolve only after inclusion are a problem and might, in e.g. tei:ptr/@target only be resolved and replaced with the generated ID if it is known which document the referenced content resides in.

Use case for private URI syntax?

Any teiCorpus that is created by using xi:include for the aggregation of TEI documents from different sources.

...

I'd feel more comfortable moving to a recommendation of xlink if it had received widespread adoption.

That is true, the adoption of xlink is not great. But METS uses it, for example.

--
Mag. Stefan Majewski
Projektmanager
Abteilung Forschung und Entwicklung
Österreichische Nationalbibliothek
Josefsplatz 1, 1015 Wien
Tel.: (+43 1) 534 10-434
E-Mail: stefan.majewski@onb.ac.at
Skype: stefan.majewski.onb.ac.at

Raffaele Viglianti

3:58 p.m.

On Mon, Feb 23, 2015 at 5:22 AM, Majewski Stefan <stefan.majewski@onb.ac.at> wrote:

While we could argue that this is project-management specific and beyond the scope of the guidelines, this appears not entirely convincing, as one of the aims of using XML in the first place and using TEI in the second, is to create resources that are likely to be re-used.

There isn't one strategy that the TEI can recommend to make one's TEI more likely to be re-used; the Guidelines can only warn about the problem and perhaps suggest a strategy. Using mnemonic ids instead of random ones even exacerbates this problem of re-usability: how many TEI files may end up with xml:id="div1-lg1"? Prepending a file's unique identifier greatly reduces the problem, but does not solve it.

Anyway, it seems to me that the majority of the Council would generally prefer to have at least some examples of ID usage in the Guidelines. I would still refrain from calling them "recommendations" because of the variety of valid approaches that people are suggesting. But we can still offer two or three suggestions based on style and personal preference rather than on the actual effectiveness of a strategy, because that is too dependent on extra-textual contingencies.

Martin Holmes

6:03 p.m.

On 15-02-23 06:58 AM, Raffaele Viglianti wrote:

...

On Mon, Feb 23, 2015 at 5:22 AM, Majewski Stefan <stefan.majewski@onb.ac.at> wrote:

While we could argue that this is project-management specific and beyond the scope of the guidelines, this appears not entirely convincing, as one of the aims of using XML in the first place and using TEI in the second, is to create resources that are likely to be re-used.

There isn't one strategy that the TEI can recommend to make one's TEI more likely to be re-used, the guidelines can only warn about the problem and perhaps suggest a strategy. Using mnemonic ids instead of random ones even exacerbates this problem of re-usability: how many TEI files may end up with xml:id="div1-lg1"? Pre-pending a file's unique identifier greatly reduces the problem, but does not solve it.

There should definitely be some discussion of the required scope of uniqueness for any specific project or document. If you have only one document, that's the scope of the uniqueness requirement, and it's enforced by XML rules. If you require uniqueness across a project, then you need to have a structured approach to id creation (such as prepending document ids) and put mechanisms in place to assist in suggesting new ids and constraining uniqueness across the project. If you want globally-unique ids, that's a whole other issue--but are there really any contexts in which this is useful?

...

Anyway, it seems to me that generally the majority of the council would prefer to have at least some examples of ID usage on the guidelines. I would still refrain from calling them "recommendations" because of the variety of valid approaches that people are suggesting. But we can still offer two or three suggestions based on style and personal preference rather than actual effectiveness of a strategy, because it's too dependent on extra-textual contingencies.

Agreed. We should probably look to see if any other XML projects have such recommendations too (DocBook, DITA, etc.). Cheers, Martin

Fabio Ciotti

6:42 p.m.

Sorry for jumping into this interesting discussion late; it is a busy period :(

In general I do not think TEI (at least in the Guidelines) should recommend one or another way of creating IDs, since there are pros and cons in each of the possible choices. I think this is a typical user- or project-specific problem, and I believe we should not interfere with user preferences if and when they do not go against TEI semantics (as for the syntax of IDs).

We can of course have an exemplification and even a discussion of possible approaches, but it must be clear that any solution is OK as far as TEI is concerned. The same goes for the granularity of the IDs. I think that TEI should still follow the regulatory idea of being as far as possible independent from the specific traditions, theories and methodologies that shape the pragmatics (and rhetoric) of text encoding.

Maybe a customization like TEI Simple could mandate one syntax or another, since its objective is precisely that of giving a more constrained tag set.

...

...
Anyway, it seems to me that generally the majority of the council would prefer to have at least some examples of ID usage on the guidelines. I would still refrain from calling them "recommendations" because of the variety of valid approaches that people are suggesting. But we can still offer two or three suggestions based on style and personal preference rather than actual effectiveness of a strategy, because it's too dependent on extra-textual contingencies.

Agreed.

With the caveat I expressed above, agreed

...

We should probably look to see if any other XML projects have such recommendations too (DocBook, DITA, etc.).

Yes, but again we must keep in mind that TEI has quite different rationales, principles and scope than DocBook, and a fortiori DITA.

F

Raffaele Viglianti

31 May

12:47 a.m.

New subject: [tei-council] Recommendations on values of @xml:id (FR #540)

I've expanded a bit the Core Elements chapter section 3.10.2 to list some examples of usage for xml:id. Please review for content and grammar, starting from this div to the end of the parent section: https://github.com/raffazizzi/TEI-Guidelines/blob/master/P5/Source/Guideline... I also didn't add anything about check digits, because I admit --and I tried-- I can't understand how they work :) Maybe Peter could add a sentence or two? Thanks! Raff On Mon, Feb 23, 2015 at 12:42 PM, Fabio Ciotti <fabio.ciotti@uniroma2.it> wrote:

...

Sorry for jumping into this interesting discussion late; it is a busy period :(

In general I do not think TEI (at least in the Guidelines) should recommend one or another way of creating IDs, since there are pros and cons in each of the possible choices. I think this is a typical user- or project-specific problem, and I believe we should not interfere with user preferences if and when they do not go against TEI semantics (as for the syntax of IDs).

We can of course have an exemplification and even a discussion of possible approaches, but it must be clear that any solution is ok as far as TEI is concerned. The same goes for the granularity of the IDs. I think that TEI should still follow the regulatory idea of being as far as possible independent from the specific traditions, theories and methodologies that shape the pragmatics (and rhetoric) of text encoding.

Maybe a customization like TEI Simple could mandate one syntax or another, since its objective is precisely that of giving a more constrained tag set.

F -- tei-council mailing list tei-council@lists.tei-c.org http://lists.lists.tei-c.org/mailman/listinfo/tei-council

PLEASE NOTE: postings to this list are publicly archived

Martin Holmes

4:08 a.m.

New subject: [tei-council] Recommendations on values of @xml:id (FR #540)

Hi Raff,

That was quick! Looks good too. The only comments I have are minor, mainly native-speaker quibbles.

* In this sentence: "ID attributes must be unique within a single document, and ID values must begin with a letter." I think it should be <att>xml:id</att> rather than "ID", shouldn't it?
* "derived by textual structure" -- I think "derived from the text structure" might read a bit better.
* "extra-textual contingencies such as project and file management matters" -- I think I'd replace "matters" with "concerns".
* "...the attribute used, the elements which can bear standard reference identifiers, and the method for constructing standard reference identifiers, should all be declared in the header..." -- I think the comma after the second "identifiers" should go (no comma between subject and predicate in English).
* The phrase "bit of text" seems a bit informal -- how about "section of the text"?
* In the description of typed-path and untyped-path identifiers, I think it would be a good idea to point out that such identifiers, being mechanically derived, can -- and in fact, for accuracy, probably should -- be generated algorithmically, even if later they end up being maintained manually through changes to the text.
* The paragraph at line 3260 seems to be partly a duplicate of the one starting at 3173.
* "it is left to the encoders to make a decision" -- I would delete "the" to make it generic.
* "Note that XSLT's function generate-id() only guaranteed identifier unique to the document being processed." -- "guaranteed" should be "guarantees".
* "The Guidelines only require that <att>xml:id</att> attributes are unique in the document for XML well-formedness. However..." -- I would rephrase: "XML well-formedness requires only that xml:id attributes be unique within a single document. However.."

Hope this helps, and hope you're having a safe trip home.

Martin

On 2015-05-30 06:47 PM, Raffaele Viglianti wrote:

...
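
As an illustration of the point that typed-path identifiers can be generated algorithmically, here is a minimal sketch. It assumes Python's standard `xml.etree.ElementTree`; the function name `assign_typed_path_ids` and the path scheme (element name plus its count among same-named siblings, joined with dots) are hypothetical choices made for this example and are not taken from the Guidelines draft under review.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Expanded name under which ElementTree stores the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag):
    """Strip a leading '{namespace}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def assign_typed_path_ids(root):
    """Assign path-derived xml:id values to the descendants of root.

    Hypothetical scheme: element name + position among same-named
    siblings, joined with dots (e.g. 'body1.div2.p1'). Elements that
    already carry an xml:id are left alone; clashes between such
    hand-made ids and generated ones are NOT checked here.
    Returns the list of newly assigned ids in document order.
    """
    assigned = []

    def walk(elem, path):
        counts = Counter()
        for child in elem:
            if not isinstance(child.tag, str):
                continue  # skip comments and processing instructions
            name = local_name(child.tag)
            counts[name] += 1
            step = f"{name}{counts[name]}"
            child_path = f"{path}.{step}" if path else step
            if XML_ID not in child.attrib:
                child.set(XML_ID, child_path)
                assigned.append(child_path)
            walk(child, child_path)

    walk(root, "")
    return assigned


sample = "<text><body><div><p/><p/></div><div><p/></div></body></text>"
doc = ET.fromstring(sample)
new_ids = assign_typed_path_ids(doc)
print(new_ids)
```

Because each id encodes a unique position in the tree, the generated values are unique within the document as long as no pre-existing id happens to use the same pattern. They are only stable while the structure stays the same, which is the maintenance cost raised earlier in this thread.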

Peter Stadler

4:27 p.m.

New subject: [tei-council] Recommendations on values of @xml:id (FR #540)

"A check digit is a form of redundancy check used for error detection on identification numbers" [1]

It simply adds one digit to your ID, which gets computed from the base ID. So, if your base ID is "HOL01", the ID with check digit would be e.g. "HOL012". This is the only one you'll use thereafter because the check digit can validate your ID string. If you e.g. swap digits and put "HOL021" anywhere by accident, you are able to detect this error.

Best
Peter

[1] https://en.wikipedia.org/wiki/Check_digit

...

On 30.05.2015 at 18:47, Raffaele Viglianti <raffaeleviglianti@gmail.com> wrote:

I also didn't add anything about check digits, because I admit --and I tried-- I can't understand how they work :) Maybe Peter could add a sentence or two?
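
To make the check-digit idea above concrete, here is a minimal sketch in Python. The algorithm is an assumption chosen for brevity (a position-weighted sum of character codes, modulo 10); it is not the scheme behind the "HOL012" example, which names no algorithm, and the function names are hypothetical. Established schemes such as those listed on the Wikipedia page cited above catch more kinds of error than this toy version.

```python
def check_digit(base_id):
    """Compute one decimal check digit from a base ID.

    Assumed toy scheme: multiply each character code by its 1-based
    position, sum the products, and take the result modulo 10.
    """
    total = sum(pos * ord(ch) for pos, ch in enumerate(base_id, start=1))
    return str(total % 10)


def add_check_digit(base_id):
    """Return the ID to use from now on: base ID plus its check digit."""
    return base_id + check_digit(base_id)


def is_valid(full_id):
    """Recompute the check digit from the base part and compare."""
    if len(full_id) < 2:
        return False
    base_id, given = full_id[:-1], full_id[-1]
    return check_digit(base_id) == given


full = add_check_digit("HOL01")
print(full, is_valid(full))

# Swap the last two characters, as in the accidental-typo case above.
swapped = full[:-2] + full[-1] + full[-2]
print(swapped, is_valid(swapped))
```

With this particular scheme the base ID "HOL01" becomes "HOL015", and the swapped form "HOL051" fails validation. Some errors still slip through (for example, swapping two adjacent characters whose codes differ by a multiple of 10), so a project that relies on check digits would probably want a stronger, documented algorithm.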

16 comments

7 participants

participants (7)

Fabio Ciotti
Hugh Cayless
James Cummings
Majewski Stefan
Martin Holmes
Peter Stadler
Raffaele Viglianti