On Fri, Feb 14 2020, Peter Scharf wrote:
Patrick,

Would it be possible to build a search interface that searched the TEI source and restricted that search to the content of desired elements such as lg and s elements? This rather than extracting to a text document and searching that? The Oxygen search interface permits XML aware searching to differentiate between searching the content of elements, versus attribute values, versus element names, etc. It might be similarly possible to differentiate which elements to search.
Dear Peter,

yes, that is possible. But I don’t think it would be very useful to stick too closely to the TEI markup. Someone will have to decide which text (or, “content”) is relevant. Consider these cases:

┌────
│ <lg xmlns="tei">
│ <l>A B C</l>
│ </lg>
└────

┌────
│ <lg xmlns="tei">
│ <l>A <hi>B</hi> C</l>
│ </lg>
└────

┌────
│ <lg xmlns="tei">
│ <l>A <del>B</del> C</l>
│ </lg>
└────

┌────
│ <lg xmlns="tei">
│ <l>A <note xml:lang="en">B</note> C</l>
│ </lg>
└────

What would you expect a search for “A AND B” to return? And in which order? What is the “content” of the tei:lg elements?

It gets quite tricky to think through all the variations, and for a group of texts with markup as diverse as what you find in SARIT’s library you’ll run into contradictions (forcing you to define rules per document). So it’s easiest to give “abstract” classes, metric texts vs. prose, for example, which can be extracted from your markup and exposed through a simple search interface that doesn’t presuppose acquaintance with the TEI.

Technically, it’s no big problem to store tag information alongside the strings: you can easily search for “tag:lg *harmy*” at https://es.rdorte.org/_plugin/calaca/. This is somewhere between XML-aware search and full text search. But the utility is rather limited, especially if you get users who don’t know what tei:lg or tei:p (or an XML element in general) is supposed to be.

Best wishes,
Yours, Peter
******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
https://sanskritlibrary.org
******************************
On 14 Feb 2020, at 4:27 AM, Patrick McAllister wrote:
Dear list members,
just to add a more general point to what’s been said already: you’ll
-- 
Patrick McAllister
long-term email: pma@rdorte.org
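
The question of what counts as the “content” of the four tei:lg cases in the reply can be made concrete with a small sketch. It is not how the SARIT search or the Elasticsearch index mentioned in the thread is built; it only illustrates, under stated assumptions, that the answer to “A AND B” depends on an extraction policy someone has to choose. All names here (CASES, extract, tokens, matches_a_and_b, READING_TEXT) are hypothetical, and the namespace is the shorthand "tei" from the examples rather than the real TEI namespace URI.

```python
import xml.etree.ElementTree as ET

# The four cases from the reply, written on one line each (hypothetical keys).
CASES = {
    "plain": '<lg xmlns="tei"><l>A B C</l></lg>',
    "hi":    '<lg xmlns="tei"><l>A <hi>B</hi> C</l></lg>',
    "del":   '<lg xmlns="tei"><l>A <del>B</del> C</l></lg>',
    "note":  '<lg xmlns="tei"><l>A <note xml:lang="en">B</note> C</l></lg>',
}

# One possible policy (an assumption, not a rule from the thread):
# deletions and notes are not part of the text to be searched.
READING_TEXT = frozenset({"del", "note"})


def local_name(elem):
    """Return the tag name without its '{namespace}' prefix."""
    return elem.tag.rsplit("}", 1)[-1]


def extract(elem, skip=frozenset()):
    """Concatenate the text of elem, omitting the content of elements in skip."""
    parts = [elem.text or ""]
    for child in elem:
        if local_name(child) not in skip:
            parts.append(extract(child, skip))
        # Text following a child belongs to the parent, so it is always kept.
        parts.append(child.tail or "")
    return "".join(parts)


def tokens(xml, skip=frozenset()):
    """Whitespace-separated tokens of the extracted text."""
    return extract(ET.fromstring(xml), skip).split()


def matches_a_and_b(xml, skip=frozenset()):
    """True if both 'A' and 'B' occur among the extracted tokens."""
    found = set(tokens(xml, skip))
    return "A" in found and "B" in found


if __name__ == "__main__":
    for name, xml in CASES.items():
        print(name,
              "all text:", matches_a_and_b(xml),
              "reading text:", matches_a_and_b(xml, READING_TEXT))
```

With everything indexed, all four cases match “A AND B”; with the reading-text policy, only the plain and hi cases do. Neither answer is wrong, which is why the decision has to be made per document or per class of text before a search interface can hide the markup from its users.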