Invisible East Digital Corpus

Frequently Asked Questions

In this section, we have benefitted from the extensive experience of our colleagues working with documents from Egypt and the wider Arabic-speaking world, and have borrowed some of their guidance, which also applies to the IEDC.

The Invisible East Corpus

What is the Invisible East corpus?

As of May 2025, the IEDC consists of 1,182 documents, including 367 that have been transcribed and 178 that have been transcribed and translated into English.

What kinds of texts are preserved here digitally?

An estimated ninety percent of the IEDC fragments come from documents: letters, legal deeds, lists, accounts, state documents and other everyday writings and ephemera.
The remaining ten percent come from long-form literary texts, including liturgy, the Hebrew Bible, rabbinic literature, medicine, astronomy, lexicography, poetry and theology.
During this period, paper, parchment, and other writing materials were frequently reused. As a result, two entirely unrelated texts were often written on the front (recto) and back (verso) of the same piece of paper, parchment, or other writing support. In our corpus, if the recto and verso are related, or if the verso is blank, the document is assigned a single shelfmark, e.g. 'Ms.Heb.8333.99'. Conversely, when the recto and verso are unrelated, each side is given its own shelfmark, e.g. 'Ms.Heb.8333.67recto' and 'Ms.Heb.8333.67verso'.
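The recto/verso rule above is simple enough to state as a function. This is a hypothetical Python sketch for illustration, not code used by the IEDC; the function name is invented, and it assumes, following the examples given, that the "recto"/"verso" suffix is appended directly to the base shelfmark with no separator.

```python
def assign_shelfmarks(base: str, sides_related: bool, verso_blank: bool = False) -> list[str]:
    """Return the shelfmark(s) for one physical fragment.

    Hypothetical helper illustrating the recto/verso convention; the suffix
    is assumed to be appended with no separator, as in the examples.
    """
    if sides_related or verso_blank:
        # One text across both sides (or a blank verso): a single shelfmark.
        return [base]
    # Two unrelated texts: each side gets its own shelfmark.
    return [f"{base}recto", f"{base}verso"]


print(assign_shelfmarks("Ms.Heb.8333.99", sides_related=True))
# ['Ms.Heb.8333.99']
print(assign_shelfmarks("Ms.Heb.8333.67", sides_related=False))
# ['Ms.Heb.8333.67recto', 'Ms.Heb.8333.67verso']
```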

Where were the IEDC texts written?

All the texts survived in Iran, Afghanistan and Central Asia.

When is the material from?

The majority of IEDC documents date to the period between 737 and 1221 CE.

Where are the IEDC texts now?

The IEDC texts can now be found in around 12 libraries and private collections in Europe, Afghanistan, Central Asia, Russia and China.

For more, see Research Tools.

The Invisible East Digital Corpus

What is the IEDC?

The IEDC is a database devoted to text fragments from the medieval Islamicate East that are already in the public domain but are scattered and not easily accessible.
The IEDC also includes seals and bullae preserved on documents.

How did the project begin?

The IEDC was founded in 2019 by Arezou Azad to digitize transcriptions and translations of documents from the Islamicate East. The first transcriptions uploaded were those of IE Board members Nicholas Sims-Williams and Geoffrey Khan.
Through the Invisible East programme, the IEDC has come to include transcriptions and translations by many other researchers, as well as descriptions and research aids.

What is the Invisible East programme?

The IE programme is currently funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Go.Local project, grant agreement number 851607) and by an Arts and Humanities Research Council (AHRC) grant for the Persdoc project.
For more, see “Who we are” on the IE programme site.

For whom is the Invisible East programme intended?

The IE programme serves as a resource for professional document researchers, but it has been set up to be accessible to non-specialists, including students and the public.
Anyone interested in the social and economic history of the medieval and early modern Middle East and Central Asia, and of their multiconfessional communities, can benefit from the IEDC.
Researchers and practitioners in other fields can also make use of our perspectives, resources and approaches, e.g. other digital humanities projects, linguists, open-source advocates, librarians, archivists and software engineers.

What are the aims of the IEDC?

to provide access to documents from the Islamicate East (Iran, Afghanistan and Central Asia) and the interim products of scholarship on the texts, including unpublished notes and research tools by and for scholars in the field;
to facilitate access to documents in order to fuel research into premodern global history;
to capture all the documentary texts from the Bamiyan Papers (aka “Afghan Genizah”), and related caches from Afghanistan, such as the Firuzkuh Papers and the Bactrian documents, as well as medieval documents from Iran, such as the Tabaristan archives.

How are IEDC entries structured?

IEDC records include five kinds of information:

Classifications. Each document is titled with a shelfmark (the call-number in the collection where it is housed) and classified into one of six types, adapted from those of the long-established Princeton Geniza Project: legal document, letter, list or table, paraliterary text, or state/administrative document.
Descriptive information. Two-thirds of our entries have detailed descriptions of the document's contents. Many also have #tags, but tags aren't comprehensive; they merely represent the interests of the researchers who have done the tagging.
Images. We currently display images from three collections: the National Library of Israel, the Afghanistan National Archives, and the Khalili Collections (in Bactrian, Arabic and New Persian). The images are displayed in conformity with the International Image Interoperability Framework (IIIF). As more institutions holding IEDC material adopt IIIF, we will add their images to our site.
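For readers who want to fetch images programmatically: the IIIF Image API (version 3.0) addresses an image with URLs of the form {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}, and describes it at {base}/{identifier}/info.json. The Python sketch below builds such URLs. The server base URL and the use of a shelfmark as identifier are hypothetical placeholders, since this FAQ does not state the IEDC's actual image endpoints; older Image API 2.x servers may also expect `full` instead of `max` for the size parameter.

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API 3.0 image request URL."""
    # The identifier must be percent-encoded, including any "/" it contains.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """Build the URL of the image information document (info.json)."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/iiif/3"
print(iiif_image_url(BASE, "Ms.Heb.8333.99"))
# https://iiif.example.org/iiif/3/Ms.Heb.8333.99/full/max/0/default.jpg

# A 1000x800 pixel crop from the top-left corner, scaled to 500 px wide.
print(iiif_image_url(BASE, "Ms.Heb.8333.99", region="0,0,1000,800", size="500,"))
```

Keeping region, size, rotation and quality as separate parameters mirrors the structure of the IIIF URL itself, so a viewer can request a detail of a fragment without downloading the full image.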
Transcriptions. Because it can be challenging to read the handwriting of medieval scribes, scholars produce typewritten copies that can be read easily and searched digitally. Transcriptions are also referred to as scholarly editions. The IEDC currently has 88 transcriptions, with more to come.
Scholarship records. Our records list who has transcribed the document (as well as whether the transcription has been published, and if so, where). They also list the published books and articles or unpublished notes from which we have derived the information in our descriptions.

Can you briefly describe your data model?

First: what is a data model? A data model is a way to organize different types of data (some examples in our case: documents, fragments, images and descriptions), and to standardize how they relate to each other.
At the core of our data model is a many-to-many relationship between physical fragments and the textual units that we call documents. A single fragment can contain multiple documents, as when a scribe used the blank back of a page to write another text. Conversely, a single document can be written across multiple fragments, as when a text was torn and the pieces now have different shelfmarks and/or are held in different libraries.
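To make the many-to-many relationship concrete, here is a minimal Python sketch. The class and method names (Fragment, Document, Corpus, link, documents_on, fragments_of) and the sample records are hypothetical illustrations, not the IEDC's actual schema. The relationship is modelled as a set of (document, fragment) pairs, the in-memory equivalent of a link table in a relational database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fragment:
    """A physical piece of writing support (hypothetical model)."""
    shelfmark: str  # call-number in the holding collection
    library: str


@dataclass(frozen=True)
class Document:
    """A textual unit, possibly spread over several fragments."""
    doc_id: int
    doc_type: str  # e.g. "letter", "legal document"


@dataclass
class Corpus:
    # Link table: each (document, fragment) pair records that part of the
    # document's text is written on that fragment.
    links: set[tuple[Document, Fragment]] = field(default_factory=set)

    def link(self, doc: Document, frag: Fragment) -> None:
        self.links.add((doc, frag))

    def documents_on(self, frag: Fragment) -> list[Document]:
        """All documents written, at least partly, on one fragment."""
        return sorted((d for d, f in self.links if f == frag),
                      key=lambda d: d.doc_id)

    def fragments_of(self, doc: Document) -> list[Fragment]:
        """All fragments, possibly in different libraries, carrying one document."""
        return sorted((f for d, f in self.links if d == doc),
                      key=lambda f: (f.library, f.shelfmark))


# Usage with made-up records:
corpus = Corpus()
letter = Document(1, "letter")
account = Document(2, "list or table")
frag_a = Fragment("A.1", "Library A")
frag_b = Fragment("B.7", "Library B")

corpus.link(letter, frag_a)   # the letter is written on fragment A.1
corpus.link(account, frag_a)  # the blank back of A.1 was reused for an account
corpus.link(letter, frag_b)   # a torn-off piece of the letter is held elsewhere

print([d.doc_id for d in corpus.documents_on(frag_a)])     # [1, 2]
print([f.shelfmark for f in corpus.fragments_of(letter)])  # ['A.1', 'B.7']
```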

Which philological transcription conventions do you follow?

The IEDC has followed varying sets of transcription conventions, sometimes reflecting the choices of the text-editors whose editions we have digitized.
The IEDC's current transcription conventions are IJMES Arabic for Arabic and Persian, plus IJMES Persian for the Persian-only letters (گ, پ, ژ).

How do I get involved with IEDC?

We have a team of dedicated and talented researchers who have come to us from many directions. They include undergraduates, graduate students, postdocs and faculty at Oxford and other institutions, as well as teachers, librarians and other professionals interested in Islamic and Iranian studies.
If you would like to contribute information to the IEDC, we'll soon be adding links to document records through which you can add suggestions. In the meantime, please contact us at: invisible_east@conted.ox.ac.uk.
If you would like to give feedback, receive our newsletter or contact Invisible East, please write to us at invisible_east@conted.ox.ac.uk.

Skills

What languages do I need to know to read IEDC texts?

We are including many English translations in the IEDC.
The IEDC contains texts written in 13 languages and nine scripts. The languages are Arabic, Persian, Judeo-Persian, Middle Persian, Bactrian, Sogdian, Khotanese, Chinese, Tibetan, Sanskrit, Hebrew, Old Uyghur, Turkish.
For more on the languages that are useful for studying IEDC fragments, see Research Tools.

Wait. This Persian doesn't look like the Persian I learned how to read. What's up with that?

Modern, printed Persian is derived from book script, a formal style of writing that book scribes used when they were being paid to copy a text or trying to impress their readers. Scribes writing for everyday purposes tended to use more informal handwriting, with varying degrees of cursiveness. Moreover, the Persian in the IEDC corpus is among the oldest Persian surviving in its original form anywhere in the world, dating from a time when writing conventions differed from those that emerged several centuries later. For a timeline of Persian writing, see this chart.
Learning to read premodern handwritten Persian is a skill that can be mastered with time and motivation. You might want to start with this palaeography guide created by Pejman Firoozbakhsh for Invisible East researchers and anyone else interested.

Why are there so few dots in the Persian and Arabic documents, and how on earth do you expect me to make sense of these scribbles?

Documentary Arabic and Persian hands are notoriously difficult to decipher. These are some of the challenges:

a dearth of canonical dots, which renders many letter-shapes ambiguous; the sporadic dot phenomenon inspired this classic study;
scribes' reluctance to lift the pen, which created abusive ligatures, i.e. strokes connecting letters that should remain unconnected in standard Arabic and Persian writing, including ligatures after alif;
Verschleifung (literally, “slurring”), in which pen strokes are run together so that letters are skipped or subsumed into other letters.

Which additional skills do I need to study the IEDC documents?

Patience and spreadsheets.
Patience is essential because the texts haven't been studied in depth for long.
Good record-keeping is essential because there are thousands of documents in no particular order. If you have worked in an archive, you may have had the luxury of someone else creating order before your arrival. If you work with IEDC fragments, you're often assembling your own archives (or dossiers, which is the technical term to use when the material you're assembling wasn't actually archived). We use lots of spreadsheets.
Other technical skills you can expect to pick up include understanding how legal documents are structured, recognizing scribes by their handwriting, recognizing the names of coins and units of weight and measurement, and learning the shelfmark patterns of the different library collections.

For more, check out our Research Tools page!

Is it the Bamiyan Papers or the Afghan Geniza(h)?

The National Library of Israel was quick to attach the term Geniza(h) to the caches of documents coming from Afghanistan, echoing the name of the well-known “Cairo Geniza”. However, in this case there is no evidence that the documents came from a geniza, a Hebrew term for a storage place for disused papers attached to a synagogue. Rather, Invisible East researchers have been identifying many place names in the documents that locate the papers in the Bamiyan area.

Are the texts unprovenanced, and if so, what are the ethical issues around working with these documents?

Many of the texts have unclear, undocumented provenance. This poses an ethical dilemma for researchers, as we do not wish to legitimize, or encourage the purchase of, unprovenanced materials. In the case of Afghanistan, conflict and poverty have made it difficult for the country's authorities to regulate the transfer of cultural heritage or to develop an advocacy platform. We hope that the international community will support the government of Afghanistan in developing an effective regulatory system as well as a broader safeguarding and preservation strategy. As for materials that are already in the public domain, it is important that the people of Afghanistan, the wider community of scholars and the general public gain access to them and learn about their content and historical interpretations: the IEDC is one such information-sharing platform.