Re: [Indic-texts] Jaina-Prosopography

31 May 2018

      Dear Dominik,

I hope we can move on. If you have read a bit of the two articles which I
passed round you would be aware that the JP project has grown
organically out of the Jaina Onomasticon project and is largely a
continuation of it with other means. It definitely has not been set up as
competition to any other project. Consultation on the best DH approach was
extensive and started in earnest four years ago. Both the application and,
again, the articles address the issue of why the then-existing databases
and TEI approaches were not able to help the project, whatever happens in
future.
Both Karin Preisendanz (Lahore project) and Yigal Bronner are advisers to
the project and well aware of all the issues. Etc.

There will always be, and should be, multiple projects. Mere duplication
would of course be a waste of resources and is not funded.

I understand Andrew that this forum is intended to be potentially useful
for discussing the problems of fine-tuning and linking a variety of
different projects in DH pertaining to South Asian Studies. I certainly
would like to know more about how the numerous TEI projects do or plan to
link data in future if at all.

Data mining may be a promising route.

Klatt excerpted the contents of his sources and these were the basis of his
Onomasticon (we had to reconstruct the bibliography). This time-consuming
process is at the heart of any prosopographical project and here
duplication should indeed be avoided. The task is so enormous that I don't
think competition will be an issue for decades to come.

I certainly don't think in this way at all. For several years I have
explored ways of linking Jaina datasets in India to the Klatt data for
mutual benefit (fingers crossed), and I am excited to learn about Peter
Scharf's work.

with best wishes

Peter

On 30 May 2018 at 22:24, Dominik Wujastyk <wujastyk@gmail.com> wrote:
...
TEI does include a mechanism for pointing to external data entities in
complex ways (Guidelines chapter 16); it can be done, although this is
clumsy and not the answer one would wish for in a large project.
Also, TEI has an extraordinary depth of documentary awareness that nobody
with serious scholarly engagement would want to relinquish.  Just one
example, Guidelines chapter 21.  The ability to express degrees of
certainty is central to the scholarly endeavour.
The basic idea of the data triple doesn't - as far as I can see - provide
anything like the granularity that one would look for in a set of
relations.  Everything is linked by "is" as if that were an unproblematic
or universal form of predication.  (I hasten to say that I don't understand
data triples and semantic ontologies very well, and I could well be wrong
about what can be done.)
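
For readers less familiar with the model, here is a minimal sketch of
data triples in plain Python. It is an illustration only: the identifiers
(person:A, work:X, lineage:Y) and the predicate names (studentOf,
authorOf, memberOf) are hypothetical and are not taken from Pandit or the
Jaina-Prosopography. The sketch suggests that the middle slot of a triple
can name any relation a project chooses to define, rather than only "is".

```python
from collections import namedtuple

# A triple is (subject, predicate, object); the predicate names the relation.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical identifiers and predicates, for illustration only.
triples = [
    Triple("person:A", "studentOf", "person:B"),
    Triple("person:A", "authorOf", "work:X"),
    Triple("person:B", "memberOf", "lineage:Y"),
]


def objects(store, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


def predicates_used(store):
    """Return the distinct relation names in the store, sorted."""
    return sorted({t.predicate for t in store})
```

How fine-grained the relations are then depends on the vocabulary of
predicates that a project defines, which is a modelling decision rather
than a property of the triple format itself.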
The tension between TEI and semantically linked data - or in my
old-fashioned language, between documents and databases - is very much a
current discussion in the TEI world.  See, e.g.,
https://journals.openedition.org/jtei/1191?lang=en,
https://journals.openedition.org/jtei/1480#tocto1n2,
https://hcmc.uvic.ca/tei2017/abstracts/t_141_ore_ontologiesconceptualmodels.html,
http://www.1890s.ca/PDFs/Crossing%20the%20Stile.pdf, and much more.
My view may be summarized as "jam today."  The Pandit project already
exists and is already rather wonderful.  It has achieved a critical mass of
data that makes it already a discovery tool.  Today.  It has received a lot
of expert curation and funding over many years.  It is not to be ignored or
discarded.  Data entered into Pandit will not be lost in future.  Yes,
Pandit needs to grow in important new directions.  Among the most important
is that it should develop transparent import-export mechanisms.  And - yes
- it needs to be able to write out data in a form that maintains the
semantic ontology that it embodies, and that can be used by others.  But it
seems inexplicable to me to ignore Pandit and start a prosopographical
project in competition with it.
Best,
Dominik
On Mon, 28 May 2018 at 10:24, Andrew Ollett <andrew.ollett@gmail.com>
wrote:
...
Dear list members,
I have been following both the Jaina-Prosopography and PANDiT projects
with great interest and optimism. There are two general questions that have
arisen which I think need to be separated: the possibility of sharing data
between projects, and the use of TEI as a data format. There is luckily no
disagreement over the fact that data should be published in a free and
accessible way. But the really essential thing is not just to publish the
data, but to publish it in a format that can be queried and retrieved
programmatically. This is precisely what "Linked Open Data" is supposed to
do, and there has been a huge amount of work in neighboring fields like
Classics to build resources that are linked and open in precisely this way.
For example, this <http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/>
collection of papers (now a bit dated) about "current practice in linked
open data for the ancient world" and the SNAP:DRGN <http://snapdrgn.net>
project (also a bit dated). I think Gabriel Bodard is consulting on the
Jaina-Prosopography project and from Peter's description the data will be
published in accordance with LOD standards. Making the data available in
other formats, such as CSV or TEI, is a nice gesture, and may be useful for
certain users, but because CSV and TEI documents are just documents, and
there aren't tools for extracting relations from huge amounts of CSV or TEI
data (well, there probably are for CSV...), they are about as useful as
plain text files.
One of the great benefits of the LOD approach is that projects can share
data despite having different data models. In order for one project to use
another's data, there will inevitably be some work of mapping the ontology
of one onto the ontology of another (something that PANDiT has dealt with
over successive imports of data from other sources). But we are not in the
situation we were in previously, where the data of one project is
essentially useless to another without a massive investment of time and
money.
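
The mapping work described above can be sketched roughly as follows. This
is a hedged illustration in plain Python, not a description of how PANDiT
or any other project handles imports: the predicate names (hasTeacher,
wrote, studentOf, authorOf, bornIn) and the identifiers are hypothetical.
The sketch rewrites one project's relation names into another's and sets
aside whatever has no agreed equivalent, so that it can be reviewed by
hand.

```python
# Hypothetical correspondence between two projects' relation names.
PREDICATE_MAP = {
    "hasTeacher": "studentOf",
    "wrote": "authorOf",
}


def translate(triples, predicate_map):
    """Rewrite predicates using predicate_map.

    Returns (translated, unmapped): triples whose predicate had an
    equivalent, and triples that still need a human decision.
    """
    translated, unmapped = [], []
    for subject, predicate, obj in triples:
        if predicate in predicate_map:
            translated.append((subject, predicate_map[predicate], obj))
        else:
            unmapped.append((subject, predicate, obj))
    return translated, unmapped


# Hypothetical source data from the first project.
source = [
    ("person:A", "hasTeacher", "person:B"),
    ("person:A", "wrote", "work:X"),
    ("person:A", "bornIn", "place:Z"),
]
translated, unmapped = translate(source, PREDICATE_MAP)
```

Keeping the unmapped triples separate, rather than dropping them, makes
the remaining mapping effort visible.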
Now to come back to TEI: there are projects that use TEI as the basic
data format for prosopographic data, such as Syriaca.org. But TEI is
meant to encode text data, and it is not particularly good at representing
relations between entities in the kind of well-defined ontologies that
prosopographic databases need. Syriaca has essentially had to define their
ontology, and controlled vocabularies, and then find ways of representing
those vocabularies in TEI. There's a lot that can go wrong there. Neither
of the databases we're talking about uses TEI as its basic data format, for
good reasons. We might want them to publish their data as TEI, as an
exchange format, but as I noted above, it's not clear what any of us would
do with (in the case of PANDiT) 50,848 TEI documents.
prayojanam anuddiśya na mando ’pi pravartate ("not even a fool acts
without a purpose in view"). What is it exactly that we
want from the published data? What do we want to do with it? How do we want
to share it, query it, connect it? We now have these amazing resources, and
we should try to use them often, and use them creatively. I think that LOD
standards would help in a lot of respects (e.g., being able to get relevant
biodata for a given person just from the PANDiT ID and put it on a website
programmatically) but I am very curious about what specific purposes would
be served by publishing the data in TEI format.
Andrew
_______________________________________________
indic-texts mailing list
indic-texts@lists.tei-c.org
http://lists.lists.tei-c.org/mailman/listinfo/indic-texts
-- 
Dr Peter Flügel
Chair, Centre of Jaina Studies
Department of History, Religions and Philosophies
School of Oriental and African Studies
University of London
Thornhaugh Street
Russell Square
London WC1H 0XG

Tel.: (+44-20) 7898 4776
E-mail: pf8@soas.ac.uk
http://www.soas.ac.uk/jainastudies
