Dear Friends,

I have just joined the list on recommendation. I haven't seen any of the earlier communications as yet, but as an opener would like to inform you of the above-mentioned project and of the attached probing articles that emerged in its context:

(2018) 'Jaina-Prosopography I: Sociology of Jaina-Names.' In: Balbir, Nalini and Flügel, Peter (eds.), *Jaina Studies. Select Papers Presented in the 'Jaina Studies' Section at the 16th World Sanskrit Conference.* Delhi: Rashtriya Sanskrit Sansthan & D.K. Publishers & Distributors, pp. 187-267. (Proceedings of the World Sanskrit Conference) https://eprints.soas.ac.uk/24708/

(2018) 'Jaina-Prosopography II: “Patronage” in Jaina Epigraphic and Manuscript Catalogues.' In: Chojnacki, Christine and Leclère, Basile (eds.), *Gift of Knowledge: Patterns of Patronage in Jainism.* Bangalore: National Institute of Prakrit Research Shravanabelagola, pp. 1-46. (in press)

On the project: https://www.soas.ac.uk/jaina-prosopography/

Since the main purpose of the project is to create a tool to be used by the research community beyond the lifetime of the project, any input leading to an improvement of the approach would be more than welcome.

The database will be accessible by the beginning of 2020.

with best wishes
Peter

--
Dr Peter Flügel
Chair, Centre of Jaina Studies
Department of History, Religions and Philosophies
School of Oriental and African Studies
University of London
Thornhaugh Street
Russell Square
London WC1H 0XG
Tel.: (+44-20) 7898 4776
E-mail: pf8@soas.ac.uk
http://www.soas.ac.uk/jainastudies
Dear Peter,
Can I suggest in the strongest terms that the Jaina prosopography project
uses the Pandit http://panditproject.org prosopography system for
managing its data? Many of us are interested in the same kinds of question,
and it is critically important that we share data. If we all go off in
corners and re-invent the wheel, we will set back our field by decades.
Pandit isn't perfect. It may not even meet all your needs, although I
suspect you will be surprised by how much it does do for a project such as
yours. However, Yigal and the team who created and maintain Pandit are
actively developing it and they are very open to technical suggestions (and
money!). I am in such a dialogue myself right now, with one of their
system programmers who lives in Brazil.
It is easy to underestimate the difficulty of creating a sound tool for
serious prosopographical work. Pandit is itself the third iteration of the
project. It started as a Windows-based Filemaker Pro project. That became
unusable after a certain volume of data was added. Then it was ported to
MySQL running on a Linux base. That was more robust, but also had
drawbacks due to leftover features from the earlier Windows mess. Finally,
Yigal had the whole thing rewritten again in the light of everything we had
learned and ported the very considerable volume of data forward to the new
system. Since then, even more author/work/manuscript data and
bibliographical material has been added.
Looking at your Jaina prosopography paper, all the "methodological issues"
you raise in section 3 and the onomastic issues in section 4 were faced
and mostly solved by the Philobiblon
http://bancroft.berkeley.edu/philobiblon/ project, many years ago, in the
context of Iberian prosopography. A major part of that thinking informs
Pandit.
The basic fact is that systems such as this become exponentially more
useful as more data is added. What you add links to what has been added
before. New relationships are discovered, time is saved by not entering
the same data repeatedly and this, in turn, leads to a major gain in
accuracy. (A public genealogical system that demonstrates this kind of
crowdsourced cooperation-gain superbly is geni.com.)
If you decide not to put your project's data in Pandit, then I hope you
will produce a compelling document explaining why not. It will be very
useful to the Pandit team for their own future consideration.
Best,
Dominik
--
Professor Dominik Wujastyk http://ualberta.academia.edu/DominikWujastyk,
Singhmar Chair in Classical Indian Society and Polity,
Department of History and Classics http://historyandclassics.ualberta.ca/,
University of Alberta, Canada.
South Asia at the U of A: sas.ualberta.ca
Dear Dominik,
We discussed the issues privately a year ago and we took many of your
extremely useful suggestions on board.
As highlighted in the articles, the triplestore data of the JP will be made
available, free for anyone to use, once the download functions and webpage
are installed. I wish this were the case all around, especially in India,
where Jaina electronic datasets are as difficult to access as the Jaisalmer
bhandar.
The most interesting answer I have received so far to the standard question
about the long-term viability of databases (for the JP, guaranteed by the
Sheffield DHI) was that of Gabriel Bodard at Senate House. I think he is
right in saying that the only procedure that preserved texts in the past,
and that will preserve electronic data over time, is copying. No
super-database or web portal will ever emerge and survive over time (apart,
perhaps, from the internet itself). Building in options for linking
purpose-built datasets at the point of development therefore seems to be
the way to go.
The data-mining software Gabriel developed is limited to the standard
categories of name, date, place, etc. 'Stage two' prosopographies, however,
require uniform data, often highly differentiated taxonomies for specific
purposes, and a great deal of case-specific analysis.
Given permissions (!), it may be relatively easy to produce a dataset of
datasets through copying. But the resulting datasets will not be uniform
and will hence require massive work to be useful for stage-two
prosopographies. I cannot presently see how this can be handled without
hands-on editorship. Since eternal editor roles are inconceivable, some
kind of standard will have to emerge.
Solving the conundrum of finding a standard for coding Indian family names
for library catalogues proved tricky and has been abandoned, as far as I
can see. In the articles I laid out the practical and theoretical issues we
faced, as a memento of self-reflection but also with a view to informing
(potential) collaborators on this specific project. I found Keats-Rohan's
book by far the most useful source for practical research with such tools.
She did not touch upon South Asian materials, though.
On the additional problem of how to operationalise patronage I did not find
much useful information, but I may have missed something. TEI has no useful
categories here. Off-the-shelf solutions do not always work for particular
research questions either.
Clearly, at least in South Asian Studies, most work still lies ahead. We
need datasets comparable to the projects on the Roman materials of Mommsen
etc. (which are still not digitally available, certainly not in stage-two
prosopographical format). Pandit is fabulous, but it did not provide the
tools for instigating the research questions informing the JP project.
Otherwise I would indeed simply have asked Yigal to let us put the data in
and compute it, as you suggested. I am not aware of a sociologically
oriented stage-two prosopography in South Asian Studies at present, and I
cannot be sure whether the JP will succeed, even in a limited way, in
realising this aim. It is also a pilot study. But its data can be copied,
and we learn quite a lot about the structure of the Jaina data in the
process of analysis.
Sigh...
Peter
Dear Peter, yes, I remember our conversation. I regret that other voices
were louder for you then, and that they have prevailed now.
You could still use Pandit and work with Yigal and his team to get it
exporting triples. That would be fantastic for everyone, a real benefit
for the whole community working on Indian prosopography.
Your remark about TEI tells me that you haven't really looked into it. The
TEI is not an off-the-shelf set of tags to rummage through and be
disappointed because it doesn't define *gaccha*. It is a framework for
*defining* XML schemas. Yes, it provides a lot of standard stuff already,
but the real point of the TEI is that it is extensible. You use it to define
the special tags that you actually need for your project. For SARIT, for
example, we define many things like "adhyāya", "vyākhyā", and so forth, as
subdivisions of a generic "part", because that's useful in expressing how
Sanskrit texts are structured.
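For illustration only, here is a minimal sketch of that kind of customization: TEI's generic division element carrying project-defined type values, together with an ODD fragment that restricts @type to those values. The element and attribute names are standard TEI, but the specific values and the fragment are invented for the example, not taken from SARIT's actual schema.

  <!-- encoded text: project-defined structural types on the generic <div> -->
  <div xmlns="http://www.tei-c.org/ns/1.0" type="adhyāya" n="1">
    <div type="vyākhyā">
      <p>...</p>
    </div>
  </div>

  <!-- ODD customization fragment: constrain @type on <div> to the project's vocabulary -->
  <elementSpec ident="div" mode="change" module="textstructure">
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="adhyāya"><desc>chapter-level division</desc></valItem>
          <valItem ident="vyākhyā"><desc>commentary on a base text</desc></valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>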
I'm concerned, Peter, that by starting from zero with a highly complex
project you will pour effort into it and find in five or ten years that it
all has to be thrown away. That is a very common experience in projects of
this kind. I apologize for speaking so plainly, but my concern is born
of hard experience. Witness the Woolner project, and many others.
Best,
Dominik
Dear friends,

I apologize for only now joining this list and this important conversation. Let me add a few quick notes to this thread, and say that I am happy to respond in more detail to any comments in the coming days.

Peter: I very much agree with Dominik that it is better not to start from zero when panditproject.org http://panditproject.org/ is available and ready to receive data about texts, people, places, manuscripts, etc., and many of their complex interrelations https://www.panditproject.org/entity/97199/info. Pandit is constantly updated and improved (I see that five new entities were entered just yesterday, to give a random example), and new features https://www.panditproject.org/articles are added at a rather fast rate. For instance, we just introduced our first export button (in CSV; more are in the planning process), we are working on a large revision of our fields for textual reuse, we just revamped our fields for manuscript extracts (in connection with our import of data for a thousand manuscripts from the BORI descriptive catalog), and so on.

The big advantage of combining data sets is, of course, that every new item is immediately related to the larger matrix. This matrix is, in the end, our best research tool, even as I fully agree that new tools need to be created to make better use of it. The basic matrix, and over 50,000 related entities, are already in place. Another advantage is the fact that there is already a small but active community working on Pandit in different parts of the globe (and, by the way, Pandit does not belong to any person or to any institution; it belongs to the community).

I also agree that Pandit is far from perfect and that there is, indeed, much to improve. In fact, there is so much more to do, and it would certainly be easier if we did it together. We at Pandit are very open to concrete suggestions on every level, so if you'd like, you can write to me with a wish-list of top-priority features, and we can see how we can implement them.

That said, I also realize that not everyone would like to work with Pandit, or at least not solely with Pandit. So I suggest that if you do create your own database, we should look at ways of integrating the two, or protocols that will allow items entered into one platform to be added to the other as well.

At any rate, I recommend that you create an account and test the system firsthand by adding or editing a few entities. It has come a long way since we were last in touch. Happy to hear your concrete suggestions and criticism!

Yigal

Yigal Bronner
Associate Professor
Department of Asian Studies
The Hebrew University of Jerusalem
http://yigalbronner.huji.ac.il/
28 May 2018, 6:24 p.m.

Dear list members,

I have been following both the Jaina-Prosopography and PANDiT projects with great interest and optimism. There are two general questions that have arisen which I think need to be separated: the possibility of sharing data between projects, and the use of TEI as a data format.

There is luckily no disagreement over the fact that data should be published in a free and accessible way. But the really essential thing is not just to publish the data, but to publish it in a format that can be queried and retrieved programmatically. This is precisely what "Linked Open Data" is supposed to do, and there has been a huge amount of work in neighboring fields like Classics to build resources that are linked and open in precisely this way. For example, this http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/ collection of papers (now a bit dated) about "current practice in linked open data for the ancient world", and the SNAP:DRGN http://snapdrgn.net project (also a bit dated). I think Gabriel Bodard is consulting on the Jaina-Prosopography project, and from Peter's description the data will be published in accordance with LOD standards. Making the data available in other formats, such as CSV or TEI, is a nice gesture, and may be useful for certain users, but because CSV and TEI documents are just documents, and there aren't tools for extracting relations from huge amounts of CSV or TEI data (well, there probably are for CSV...), they are about as useful as plain text files.

One of the great benefits of the LOD approach is that projects can share data despite having different data models. In order for one project to use another's data, there will inevitably be some work of mapping the ontology of one onto the ontology of the other (something that PANDiT has dealt with over successive imports of data from other sources). But we are no longer in the situation we were in previously, where the data of one project was essentially useless to another without a massive investment of time and money.

Now to come back to TEI: there are projects that use TEI as the basic data format for prosopographic data, such as Syriaca.org. But TEI is meant to encode text data, and it is not particularly good at representing relations between entities in the kind of well-defined ontologies that prosopographic databases need. Syriaca has essentially had to define its ontology and controlled vocabularies, and then find ways of representing those vocabularies in TEI. There's a lot that can go wrong there. Neither of the databases we're talking about uses TEI as its basic data format, for good reasons. We might want them to publish their data as TEI, as an exchange format, but as I noted above, it's not clear what any of us would do with (in the case of PANDiT) 50,848 TEI documents.

prayojanam anuddiśya na mando ’pi pravartate: not even a fool acts without a purpose. What is it exactly that we want from the published data? What do we want to do with it? How do we want to share it, query it, connect it? We now have these amazing resources, and we should try to use them often, and use them creatively. I think that LOD standards would help in a lot of respects (e.g., being able to get relevant biodata for a given person just from the PANDiT ID and put it on a website programmatically), but I am very curious about what specific purposes would be served by publishing the data in TEI format.

Andrew
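To illustrate what "linked" means here, a minimal sketch of how a single person record might be published as RDF (in RDF/XML serialization). The URIs, the person, and the link target are invented for the example; neither project necessarily uses these vocabularies, though FOAF and owl:sameAs are common choices in LOD work.

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/"
           xmlns:owl="http://www.w3.org/2002/07/owl#">
    <!-- a person entity with its own stable, dereferenceable URI -->
    <foaf:Person rdf:about="http://example.org/jp/person/123">
      <foaf:name>Hemacandra</foaf:name>
      <!-- a cross-project link: this record and a record in another dataset describe the same person -->
      <owl:sameAs rdf:resource="http://example.org/pandit/person/456"/>
    </foaf:Person>
  </rdf:RDF>

Because every entity has a URI, another project can point at it, or assert an equivalence, without having to adopt the same data model.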
29 May 2018, 5:30 a.m.

I would like to add a word about where TEI is useful in creating a prosopographic database. TEI is for marking up text, but the TEI-Ms. guidelines include specific provisions for transcribing manuscripts and for including catalogue information in the teiHeader. These TEI elements include elements for authors, scribes, other people, places, and dates, that is, just the information that would be useful in a prosopographic database. Manuscripts catalogued using these TEI elements are easily mined for this information.

The Sanskrit Library has now catalogued about 2,000 manuscripts using a template we developed in accordance with the TEI-Ms. guidelines. We hope someday to mine this information, add it to Pandit, and link the catalogue entries with Pandit. I think this is the kind of thing that TEI is good for. Once such information is extracted from text and put in a database, there is no need for TEI; it is quite clumsy to mark up a database in TEI with table, row, and cell elements. Similarly, using TEI to mark up texts with person and place elements can also contribute to enriching a prosopographic database.

Yours,
Peter

******************************
Peter M. Scharf, President
The Sanskrit Library
scharf@sanskritlibrary.org
http://sanskritlibrary.org
******************************
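For illustration, a minimal sketch of a TEI manuscript description of the kind described above, showing where the prosopographically useful information sits. The element names are standard TEI (msdescription module); the shelfmark, names, and dates are placeholders, not an actual Sanskrit Library record.

  <msDesc xmlns="http://www.tei-c.org/ns/1.0">
    <msIdentifier>
      <repository>Example Collection</repository>
      <idno>MS 123</idno>
    </msIdentifier>
    <msContents>
      <msItem>
        <!-- author and work: candidates for person and work entities in a database -->
        <author>Hemacandra</author>
        <title>Example title</title>
      </msItem>
    </msContents>
    <physDesc>
      <handDesc>
        <!-- the scribe, another person of prosopographic interest -->
        <handNote>Copied by <persName role="scribe">Example Scribe</persName>.</handNote>
      </handDesc>
    </physDesc>
    <history>
      <origin>
        <!-- place and date of copying -->
        <origPlace>Jaisalmer</origPlace>
        <origDate when="1450">1450 CE</origDate>
      </origin>
    </history>
  </msDesc>

Records structured like this can be harvested mechanically, since the person, place, and date elements are unambiguous, which is what makes the proposed mining and linking to Pandit feasible.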
30 May 2018, 11:24 p.m.

TEI does include a mechanism for pointing to external data entities in complex ways (Guidelines chapter 16); it can be done, although this is clumsy and not the answer one would wish for in a large project.

Also, TEI has an extraordinary depth of documentary awareness that nobody with serious scholarly engagement would want to relinquish. Just one example, Guidelines chapter 21. The ability to express degrees of certainty is central to the scholarly endeavour. The basic idea of the data triple doesn't - as far as I can see - provide anything like the granularity that one would look for in a set of relations. Everything is linked by "is" as if that were an unproblematic or universal form of predication. (I hasten to say that I don't understand data triples and semantic ontologies very well, and I could well be wrong about what can be done.)

The tension between TEI and semantically linked data - or, in my old-fashioned language, between documents and databases - is very much a current discussion in the TEI world. See, e.g., https://journals.openedition.org/jtei/1191?lang=en, https://journals.openedition.org/jtei/1480#tocto1n2, https://hcmc.uvic.ca/tei2017/abstracts/t_141_ore_ontologiesconceptualmodels.html, http://www.1890s.ca/PDFs/Crossing%20the%20Stile.pdf, and much more.

My view may be summarized as "jam today." The Pandit project already exists and is already rather wonderful. It has achieved a critical mass of data that makes it already a discovery tool. Today. It has received a lot of expert curation and funding over many years. It is not to be ignored or discarded. Data entered into Pandit will not be lost in future. Yes, Pandit needs to grow in important new directions. Among the most important is that it should develop transparent import-export mechanisms. And - yes - it needs to be able to write out data in a form that maintains the semantic ontology that it embodies, and that can be used by others. But it seems inexplicable to me to ignore Pandit and start a prosopographical project in competition with it.

Best,
Dominik
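For readers unfamiliar with either side of this contrast, here is a small sketch, with invented example data, that sets a TEI certainty annotation (Guidelines chapter 21) next to a bare triple. The TEI fragment is schematic and the URIs are placeholders; expressing a comparable graded judgement in RDF is possible, but it requires additional modelling (reification, named graphs, or a qualified-statement pattern), which is the granularity gap being pointed to.

```python
# Sketch of the contrast between a TEI certainty annotation and a bare triple,
# using invented example data.

# TEI: "we are only about 60% sure this scribe is the Samayasundara we mean".
TEI_FRAGMENT = """
<handNote xmlns="http://www.tei-c.org/ns/1.0">
  Copied by <persName xml:id="scribe1">Samayasundara</persName>.
  <certainty target="#scribe1" locus="value" degree="0.6"/>
</handNote>
"""

# The same assertion as a bare triple: subject, predicate, object, and nothing
# else. The identification is simply stated; a degree of confidence has no
# slot unless the data model is extended (reification, named graphs, etc.).
bare_triple = (
    "https://example.org/ms/123",            # subject: the manuscript
    "https://example.org/ontology/copiedBy", # predicate
    "https://example.org/person/samayasundara",  # object: the identified scribe
)

print(TEI_FRAGMENT)
print(bare_triple)
```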
31 May 2018, 8:56 a.m.

Dear Dominik,

I hope we can move on. If you have read a bit of the two articles which I passed round, you would be aware that the JP project has grown organically out of the Jaina Onomasticon project and is largely a continuation of it with other means. It definitely has not been set up as competition to any other project. Consultation on the best DH approach was extensive and started in earnest four years ago. Both the application and, again, the articles address the question of why the then existing databases and TEI approaches were not able to help the project, whatever happens in future. Both Karin Preisendanz (Lahore project) and Yigal Bronner are advisers to the project and well aware of all the issues. Etc. There will always be, and should be, multiple projects. Mere duplication of course would be a waste of resources and is not funded.

I understand, Andrew, that this forum is intended to be potentially useful for discussing the problems of fine-tuning and linking a variety of different projects in DH pertaining to South Asian Studies. I certainly would like to know more about how the numerous TEI projects do, or plan to, link data in future, if at all. Data mining may be a promising route. Klatt excerpted the contents of his sources, and these were the basis of his Onomasticon (we had to reconstruct the bibliography). This time-consuming process is at the heart of any prosopographical project, and here duplication should indeed be avoided. The task is so enormous that I don't think competition will be an issue for decades to come. I certainly don't think in this way at all and have for several years explored ways of linking Jaina datasets in India to the Klatt data for mutual benefit (fingers crossed), and I am excited to learn about Peter Scharf's work.

with best wishes
Peter
-- Dr Peter Flügel Chair, Centre of Jaina Studies Department of History, Religions and Philosophies School of Oriental and African Studies University of London Thornhaugh Street Russell Square London WC1H OXG Tel.: (+44-20) 7898 4776 E-mail: pf8@soas.ac.uk http://www.soas.ac.uk/jainastudies
Participants (5): Andrew Ollett, Dominik Wujastyk, Peter Flügel, Peter Scharf, Yigal Bronner