Dear Dominik,
I hope we can move on. If you have read some of the two articles I
circulated, you will be aware that the JP project has grown organically
out of the Jaina Onomasticon project and is largely a continuation of it
by other means. It has definitely not been set up in competition with any
other project. Consultation on the best DH approach was extensive and
began in earnest four years ago. Both the application and the articles
address why the databases and TEI approaches then available could not
meet the project's needs, whatever happens in future.
Both Karin Preisendanz (Lahore project) and Yigal Bronner are advisers to
the project and well aware of all the issues. Etc.
There will always be, and should be, multiple projects. Mere duplication,
of course, would be a waste of resources and is not funded.
I understand, Andrew, that this forum is intended to be potentially useful
for discussing the problems of fine-tuning and linking a variety of
different DH projects pertaining to South Asian Studies. I certainly
would like to know more about how the numerous TEI projects link, or plan
to link, data in future, if at all.
Data mining may be a promising route.
Klatt excerpted the contents of his sources, and these were the basis of
his Onomasticon (we had to reconstruct the bibliography). This
time-consuming process is at the heart of any prosopographical project,
and here duplication should indeed be avoided. The task is so enormous
that I don't think competition will be an issue for decades to come.
I certainly don't think in these terms at all. For several years I have
explored ways of linking Jaina datasets in India to the Klatt data for
mutual benefit (fingers crossed), and I am excited to learn about Peter
Scharf's work.
With best wishes
Peter
On 30 May 2018 at 22:24, Dominik Wujastyk wrote:
TEI does include a mechanism for pointing to external data entities in complex ways (Guidelines chapter 16). It can be done, although it is clumsy and not the answer one would wish for in a large project.
Also, TEI has an extraordinary depth of documentary awareness that nobody with serious scholarly engagement would want to relinquish. To take just one example, see Guidelines chapter 21: the ability to express degrees of certainty is central to the scholarly endeavour.
The basic idea of the data triple doesn't, as far as I can see, provide anything like the granularity that one would look for in a set of relations. Everything is linked by "is", as if that were an unproblematic or universal form of predication. (I hasten to say that I don't understand data triples and semantic ontologies very well, and I could well be wrong about what can be done.)
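For readers unfamiliar with triples: in RDF, the data model behind Linked Open Data, each statement consists of a subject, a predicate and an object, and the predicate is a named relation taken from the project's vocabulary rather than a fixed "is". The minimal Python sketch below shows this with plain tuples instead of an RDF library; the "ex:" identifiers and the relation names are hypothetical.

```python
from typing import List, NamedTuple

class Triple(NamedTuple):
    """One statement: subject, predicate (a named relation), object."""
    subject: str
    predicate: str
    obj: str

# Hypothetical identifiers and relation names, for illustration only.
triples: List[Triple] = [
    Triple("ex:person/A", "ex:pupilOf", "ex:person/B"),
    Triple("ex:person/A", "ex:composed", "ex:work/W1"),
    Triple("ex:work/W1", "ex:commentsOn", "ex:work/W2"),
]

def objects(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` by exactly `predicate`."""
    return [t.obj for t in store if t.subject == subject and t.predicate == predicate]

print(objects(triples, "ex:person/A", "ex:pupilOf"))  # ['ex:person/B']
```

RDF also has mechanisms for making statements about statements (for instance, recording a source), which this sketch does not attempt to show.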
The tension between TEI and semantically linked data (or, in my old-fashioned language, between documents and databases) is very much a current discussion in the TEI world. See, e.g., https://journals.openedition.org/jtei/1191?lang=en, https://journals.openedition.org/jtei/1480#tocto1n2, https://hcmc.uvic.ca/tei2017/abstracts/t_141_ore_ontologiesconceptualmodels.html, http://www.1890s.ca/PDFs/Crossing%20the%20Stile.pdf, and much more.
My view may be summarized as "jam today." The Pandit project already exists and is already rather wonderful. It has achieved a critical mass of data that already makes it a discovery tool. Today. It has received a lot of expert curation and funding over many years. It is not to be ignored or discarded. Data entered into Pandit will not be lost in future. Yes, Pandit needs to grow in important new directions. Among the most important is that it should develop transparent import-export mechanisms. And, yes, it needs to be able to export data in a form that preserves the semantic ontology it embodies and that others can use. But it seems inexplicable to me to ignore Pandit and start a prosopographical project in competition with it.
Best, Dominik
On Mon, 28 May 2018 at 10:24, Andrew Ollett wrote:
Dear list members,
I have been following both the Jaina-Prosopography and PANDiT projects with great interest and optimism. Two general questions have arisen which I think need to be separated: the possibility of sharing data between projects, and the use of TEI as a data format. Luckily there is no disagreement that data should be published in a free and accessible way. But the really essential thing is not just to publish the data, but to publish it in a format that can be queried and retrieved programmatically. This is precisely what "Linked Open Data" (LOD) is supposed to do, and there has been a huge amount of work in neighboring fields like Classics to build resources that are linked and open in precisely this way. See, for example, this collection of papers (now a bit dated) on "current practice in linked open data for the ancient world" (http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/) and the SNAP:DRGN project (http://snapdrgn.net, also a bit dated). I think Gabriel Bodard is consulting on the Jaina-Prosopography project, and from Peter's description the data will be published in accordance with LOD standards. Making the data available in other formats, such as CSV or TEI, is a nice gesture and may be useful for certain users, but because CSV and TEI documents are just documents, and there aren't tools for extracting relations from huge amounts of CSV or TEI data (well, there probably are for CSV...), they are about as useful as plain text files.
One of the great benefits of the LOD approach is that projects can share data despite having different data models. For one project to use another's data, there will inevitably be some work of mapping the ontology of one onto the ontology of the other (something that PANDiT has dealt with over successive imports of data from other sources). But we are no longer in the situation we were in previously, where one project's data was essentially useless to another without a massive investment of time and money.
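The mapping work described above can be pictured roughly as follows. This is a minimal sketch, not either project's actual procedure: the two vocabularies ("a:" and "b:"), the predicate correspondences and the identity links are all hypothetical, and in practice they would be curated by hand. Statements whose predicate has no agreed counterpart are set aside for review rather than guessed.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical hand-curated correspondences between two projects.
PREDICATE_MAP: Dict[str, str] = {"b:hasTeacher": "a:pupilOf", "b:wrote": "a:composed"}
SAME_AS: Dict[str, str] = {"b:p17": "a:p3", "b:p22": "a:p9"}  # B's IDs -> A's IDs

def map_triples(triples: List[Triple], predicate_map: Dict[str, str],
                same_as: Dict[str, str]) -> Tuple[List[Triple], List[Triple]]:
    """Re-express B's triples in A's vocabulary; return (mapped, unmapped)."""
    mapped: List[Triple] = []
    unmapped: List[Triple] = []
    for s, p, o in triples:
        if p not in predicate_map:
            unmapped.append((s, p, o))  # no agreed counterpart: needs a human decision
            continue
        # Replace identifiers where an identity link exists; keep the rest as-is.
        mapped.append((same_as.get(s, s), predicate_map[p], same_as.get(o, o)))
    return mapped, unmapped

project_b: List[Triple] = [
    ("b:p17", "b:hasTeacher", "b:p22"),
    ("b:p17", "b:wrote", "b:w5"),
    ("b:p17", "b:bornIn", "b:place1"),
]
mapped, unmapped = map_triples(project_b, PREDICATE_MAP, SAME_AS)
print(mapped)    # [('a:p3', 'a:pupilOf', 'a:p9'), ('a:p3', 'a:composed', 'b:w5')]
print(unmapped)  # [('b:p17', 'b:bornIn', 'b:place1')]
```

Keeping the unmapped statements rather than discarding them makes the remaining mapping work visible, which is where most of the effort between projects would go.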
Now to come back to TEI: there are projects that use TEI as the basic data format for prosopographic data, such as Syriaca.org. But TEI is meant to encode text data, and it is not particularly good at representing relations between entities in the kind of well-defined ontologies that prosopographic databases need. Syriaca has essentially had to define its ontology and controlled vocabularies, and then find ways of representing those vocabularies in TEI. There's a lot that can go wrong there. Neither of the databases we're talking about uses TEI as its basic data format, for good reasons. We might want them to publish their data as TEI, as an exchange format, but as I noted above, it's not clear what any of us would do with (in the case of PANDiT) 50,848 TEI documents.
prayojanam anuddiśya na mando ’pi pravartate ("not even a fool acts without having a purpose in view"). What exactly do we want from the published data? What do we want to do with it? How do we want to share it, query it, connect it? We now have these amazing resources, and we should try to use them often, and use them creatively. I think that LOD standards would help in a lot of respects (e.g., being able to get relevant biodata for a given person just from the PANDiT ID and put it on a website programmatically), but I am very curious about what specific purposes would be served by publishing the data in TEI format.
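As a rough illustration of the PANDiT-ID use case, the sketch below gathers everything known about one identifier and renders it as an HTML fragment for a web page. It does not call any real PANDiT service: the in-memory STORE stands in for published LOD, and the identifier, predicate names and values are invented.

```python
import html
from typing import Dict, List, Tuple

# Hypothetical stand-in for data fetched from a published LOD source.
# The ID, predicate names and values are invented for illustration only.
STORE: List[Tuple[str, str, str]] = [
    ("ex:person/0001", "name", "Example Author"),
    ("ex:person/0001", "floruit", "12th century"),
    ("ex:person/0001", "composed", "Example Work"),
]

def biodata(store: List[Tuple[str, str, str]], person_id: str) -> Dict[str, List[str]]:
    """Group every (predicate, object) pair whose subject is `person_id`."""
    result: Dict[str, List[str]] = {}
    for s, p, o in store:
        if s == person_id:
            result.setdefault(p, []).append(o)
    return result

def to_html(person_id: str, data: Dict[str, List[str]]) -> str:
    """Render the grouped data as an HTML definition list, sorted by predicate."""
    rows = "".join(
        f"<dt>{html.escape(p)}</dt><dd>{html.escape(', '.join(vals))}</dd>"
        for p, vals in sorted(data.items())
    )
    return f'<dl id="{html.escape(person_id)}">{rows}</dl>'

print(to_html("ex:person/0001", biodata(STORE, "ex:person/0001")))
```

Escaping with html.escape keeps values coming from an external data source from being interpreted as markup on the page.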
Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts
--
Dr Peter Flügel
Chair, Centre of Jaina Studies
Department of History, Religions and Philosophies
School of Oriental and African Studies
University of London
Thornhaugh Street
Russell Square
London WC1H 0XG
Tel.: (+44-20) 7898 4776
E-mail: pf8@soas.ac.uk
http://www.soas.ac.uk/jainastudies