Re: [tei-council] Recommendations on values of @xml:id (FR #540)

23 Feb 2015

      On 20.02.2015 23:42, Martin Holmes wrote:
...
...
+1. Is it worth having a paragraph in the GLs discussing the issue 
without making recommendations?
I agree with both points: don't recommend, but do discuss briefly a 
range of possible options.
I don't like random ids, because they're extremely difficult to keep 
in mind for any length of time. Semi-meaningful ids (FRED1, LOND47) 
are certainly not useful for sorting or sequencing, but when you need 
to type them into a search or type a few of them, they're much easier 
to deal with.
I think that, when encouraging the use of IDs, we should also give some 
recommendations on how to use them, if only to suggest that an xml:id 
pattern is useful and should be tailored to the particular needs of a 
project. I would list the following suggestions:

- if a project uses multiple XML sources that are aggregated at some 
point via mechanisms such as XInclude, @xml:id values must be unique 
within the resulting document. One way to achieve this is to prepend a 
document-specific identifier to each @xml:id in the document.

- if human encoders are using the @xml:id values in their encoding, I 
recommend using a mnemonic identifier that is unlikely to conflict 
with another ID within the scope and context of the document. For 
example, when writing born-digital documents and referencing 
<biblStruct> elements from within the text to establish bibliographic 
relations, @xml:id values such as Bird2001 or Leech1972 can be a good 
idea.

- don't overuse @xml:id. Only use @xml:id for referencing fragments of a 
document. The value of @xml:id should not be parsed into components 
during document processing, nor used as a basis for deciding on a 
particular rendition or processing. Don't use @xml:id for sorting; use 
content-bearing attributes or elements for this.
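
To illustrate the first suggestion, here is a minimal sketch in Python 
(standard library only) of prepending a document-specific prefix to each 
@xml:id. The function name prefix_ids and the prefix "bibl_" are my own 
assumptions for illustration. The sketch only rewrites local "#id" 
pointers in @target; a real project would have to cover its other 
pointer attributes as well.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def prefix_ids(root, prefix):
    """Prepend prefix to every @xml:id and to local '#id' pointers in @target.

    Hypothetical helper: only @target is handled here.
    """
    # Collect the original IDs first so that only pointers to IDs
    # actually present in this document are rewritten.
    known = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    for el in root.iter():
        xml_id = el.get(XML_ID)
        if xml_id is not None:
            el.set(XML_ID, prefix + xml_id)
        target = el.get("target")
        if target is not None:
            # @target may hold several whitespace-separated pointers.
            parts = []
            for ptr in target.split():
                if ptr.startswith("#") and ptr[1:] in known:
                    ptr = "#" + prefix + ptr[1:]
                parts.append(ptr)
            el.set("target", " ".join(parts))
    return root

doc = ET.fromstring(
    '<text><ptr target="#Bird2001 other.xml#Leech1972"/>'
    '<biblStruct xml:id="Bird2001"/></text>'
)
prefix_ids(doc, "bibl_")
```

Pointers into other documents (other.xml#Leech1972 above) are 
deliberately left untouched, since the prefix used there is not known 
from within this document.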

The concept of IDs is generally one of the weak spots in the XML world. 
While for other kinds of references, e.g. xlink:href, mechanisms such as 
@xml:base have been introduced to resolve references cleanly, a similar 
mechanism for @xml:id and references to IDs is missing, as IDs have to 
be unique beyond their immediate scope. While we could argue that this 
is specific to project management and beyond the scope of the 
Guidelines, this is not entirely convincing, as one of the aims of using 
XML in the first place, and TEI in the second, is to create resources 
that are likely to be re-used. This could be tackled by requiring the 
software that processes an inclusion to replace these IDs with unique 
generated ID values, and to update all references within the included 
documents to these values. But then fragmentary approaches, where IDs 
are referenced that are known to resolve only after inclusion, are a 
problem: a reference in e.g. tei:ptr/@target can only be resolved and 
replaced with the generated ID if it is known which document the 
referenced content resides in. Hence @target attributes like 
bibliography.xml#Bird2001 would be beneficial here, as these could 
easily be rewritten by mechanisms analogous to what we find in XLink 
[1], especially regarding resolution against @xml:base [2][3]. Maybe 
this would be something to consider when defining P6 at some point: 
either move to XLink (maybe simplelink) or use the XML Base 
recommendation [2].
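
As a rough sketch of what resolving such a pointer against a base URI 
could look like, the following Python snippet splits a @target value 
into the document it points into and the ID within that document. The 
function name resolve_pointer and the example.org base URI are 
assumptions made up for illustration; the base stands in for an 
effective @xml:base value.

```python
from urllib.parse import urljoin, urldefrag

def resolve_pointer(target, base):
    """Resolve a pointer against a base URI.

    Returns (document URI, fragment identifier). Hypothetical helper.
    """
    absolute = urljoin(base, target)
    doc_uri, fragment = urldefrag(absolute)
    return doc_uri, fragment

# Assumed base URI of the document that contains the pointers.
base = "http://example.org/project/text/chapter1.xml"

external = resolve_pointer("bibliography.xml#Bird2001", base)
local = resolve_pointer("#Leech1972", base)
```

With the document URI known, a processor could look up the ID it 
generated for Bird2001 when it included bibliography.xml, and rewrite 
the pointer accordingly. A bare #Leech1972 resolves to the containing 
document itself, which is exactly the case that breaks down when the 
ID only exists after inclusion.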

[1] http://www.w3.org/TR/xlink/
[2] http://www.w3.org/TR/2001/REC-xmlbase-20010627/
[3] http://www.w3.org/TR/xlink/#link-locators

-- 
Mag. Stefan Majewski
Projektmanager
Abteilung Forschung und Entwicklung
Österreichische Nationalbibliothek

Josefsplatz 1, 1015 Wien
Tel.: (+43 1) 534 10-434
E-Mail: stefan.majewski@onb.ac.at
Skype: stefan.majewski.onb.ac.at
