Religion and Science: “Digital Buddhology”

Computational Linguistics in Buddhism

We would like to apply computational linguistics to Buddhist texts. The first phase is to apply natural language processing methods, especially Minimum Description Length (MDL) algorithms, to generate dictionaries automatically from existing bodies of text.
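To make the MDL idea concrete, here is a minimal sketch of a two-part description length: the cost of spelling out a candidate dictionary plus the cost of encoding the corpus with it. The function name, the fixed 32-symbol alphabet and the toy transliterated string are assumptions made for illustration only; the project's actual algorithm and corpus may differ. Under MDL, the candidate segmentation with the smallest total is preferred.

```python
import math
from collections import Counter


def description_length(tokens, alphabet_size=32):
    """Two-part MDL cost in bits for one candidate segmentation.

    Assumptions (illustrative only): every character costs
    log2(alphabet_size) bits, each dictionary entry ends with one
    terminator symbol, and tokens are coded by relative frequency.
    """
    counts = Counter(tokens)
    total = len(tokens)
    bits_per_char = math.log2(alphabet_size)
    # Part 1: spell each distinct dictionary entry once.
    lexicon_bits = sum((len(entry) + 1) * bits_per_char for entry in counts)
    # Part 2: encode the corpus, paying -log2(p) bits per token.
    corpus_bits = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_bits + corpus_bits


# Toy corpus: one unsegmented string repeated 20 times.
text = "buddhadhammasangha" * 20

as_characters = list(text)                          # every letter is an entry
as_words = ["buddha", "dhamma", "sangha"] * 20      # three-entry dictionary
as_one_block = [text]                               # whole corpus as one entry

candidates = {
    "characters": description_length(as_characters),
    "words": description_length(as_words),
    "one block": description_length(as_one_block),
}
best = min(candidates, key=candidates.get)
print(best, candidates)
```

In this toy case the word-level dictionary gives the shortest description, because it captures the repeated units without having to spell out the entire corpus as a single entry. A real system would also have to search the space of possible segmentations rather than compare three hand-picked ones.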

Second, we would use computers to discover the grammar of different texts automatically. With the help of these methods we can take the ‘fingerprint’ of each text and ask interesting questions. For example, we can test theories about the authenticity of texts, who copied from whom, and who was the author of which texts (attribution problems). We can also create a chronology of texts based on a far larger dataset than any single scholar could process by hand. Using such computational methods would allow us to be digital historians. We are seeking £90,000 to fund this work over three years.
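One simple way to picture a textual ‘fingerprint’ is a profile of character n-gram frequencies, compared between texts with cosine similarity. The sketch below is an illustration under stated assumptions, not the method the project has committed to: the function names, the choice of trigrams and the short transliterated sample strings are all hypothetical.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Relative frequencies of character n-grams (a crude 'fingerprint')."""
    normalized = " ".join(text.lower().split())  # collapse whitespace
    grams = [normalized[i:i + n] for i in range(len(normalized) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse frequency profiles."""
    dot = sum(value * q.get(gram, 0.0) for gram, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_p * norm_q)


# Toy samples in plain ASCII transliteration (illustrative only).
text_a = "evam me sutam ekam samayam bhagava"
text_b = "evam me sutam ekam samayam bhagava savatthiyam viharati"
text_c = "om mani padme hum"

profile_a = ngram_profile(text_a)
profile_b = ngram_profile(text_b)
profile_c = ngram_profile(text_c)

print("a vs b:", cosine_similarity(profile_a, profile_b))
print("a vs c:", cosine_similarity(profile_a, profile_c))
```

Texts that share phrasing score closer to 1, and unrelated texts score closer to 0. Attribution and chronology studies would need much longer samples, proper handling of scripts and diacritics, and statistical controls before any such score could count as evidence.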

Integration of Buddhist Textual Databases

Buddhist texts in canonical languages such as Pali, Sanskrit, Chinese, Tibetan and numerous others are now available digitally. Yet the quality of the data is very poor because of inaccurate input and a lack of expert proofreading. There is also a lack of interoperability across data sources. For example, there is no single place, on the internet or anywhere else, where all existing digital Buddhist texts can be viewed, searched, indexed and cross-referenced.

We have separate databases for Tibetan, Chinese and Pali, but no way to search or link across them. This integration problem prevents humanities scholars from running effective searches and digital humanities scholars from building computational models. Libraries such as the British Library or the Bodleian Library in Oxford are unable to fund such ‘integration projects’ because their funders usually prefer to see “new” initiatives. However, without a robust, searchable, high-quality textual source it is impossible to move Buddhology, and especially digital Buddhology, forward.

We would like to provide an open-source platform that hosts multilingual data sources to solve this integration problem. Layering a crowd-editing platform on top of this database would allow low-quality digital texts and scanned manuscripts to be turned into high-quality, machine-readable texts that empower both scholars and practitioners.

Because many Buddhist groups and libraries around the world struggle with this integration problem, we are happy either to join another organization’s effort or to lead the project ourselves. We estimate the annual budget at £250,000. A gift of £3 million would endow the project in perpetuity and ensure that it does not lapse or become outdated, as many current digital Buddhist resources have.