Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV)

The Royal Netherlands Institute of Southeast Asian and Caribbean Studies (KITLV) is a world-class research institute for the interdisciplinary and comparative study of Southeast Asia and the Caribbean. (Dis)continuities between the (pre)colonial and postcolonial period in a globalising world are of central interest. Most digital humanities work at KITLV is with texts that illuminate language, history, and contemporary politics.

  • The project Language of Popular Culture aims to develop a sociolinguistic history of the vernacular Sino-Malay language. It uses about 150,000 pages of digitised novels and newspapers in that language dating to between the 1870s and the 1930s.
  • Dutch Military Operations in Indonesia, 1945-1950 examines about 100,000 pages of ego-documents written by almost 1,400 Dutch army veterans about their experiences during Dutch operations against Indonesian revolutionary nationalists immediately following the Pacific War. Manual close reading has already led to one book. The aim now is to apply text mining techniques for more quantitative analysis.
  • Elite Network Shifts is developing techniques to extract sociologically meaningful information from more than half a million Indonesian newspaper articles dating to the early 2000s. It extracts the names of elite individuals from those texts, and calculates the networks linking them to each other.

Recording the Future, by contrast, works with audiovisual imagery. Since 2003 it has recorded 500 hours of ethnographic film in eight locations around Indonesia that have been visited repeatedly. This audiovisual collection is now being made available to researchers and educators through DANS and the Digital Study Centre at Leiden University.

Digital Humanities Research Agenda

The text-based projects all work with relatively large corpora of weakly structured natural language texts, sometimes in languages for which the computational tools remain underdeveloped (such as Indonesian and historical Sino-Malay). The objective of each of these projects is not only to make new discoveries through a ‘distant reading’ of its specific set of texts, but also to learn how to do this in ways that are more broadly applicable to future projects.

The techniques that at present speak most clearly to our researchers stay relatively close to the texts. Examples include search techniques adapted to social science research interests, concordances, extraction of named entities, and visualisation of entities (such as n-grams, or place names on maps).
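
As a rough illustration of two of these close-to-the-text techniques, the sketch below builds a keyword-in-context concordance and counts n-grams using only the Python standard library. It is not KITLV's own tooling: the function names (`tokenize`, `concordance`, `ngram_counts`) and the sample sentence are invented for this example, and the simple regular-expression tokeniser is an assumption that would need adapting for real Indonesian or Sino-Malay sources.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately naive tokeniser)."""
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


if __name__ == "__main__":
    # Invented sample text, standing in for a digitised page.
    sample = "The governor met the mayor. The mayor thanked the governor."
    tokens = tokenize(sample)

    for left, kw, right in concordance(tokens, "mayor", width=2):
        print(f"{left:>15} [{kw}] {right}")

    for gram, count in ngram_counts(tokens, 2).most_common(3):
        print(" ".join(gram), count)
```

On a real corpus the same two functions would be applied page by page, with the n-gram counts aggregated per year or per publication before visualisation.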

Higher-level techniques such as extracting time stamps (knowing when something mentioned in the text occurred), identifying relations between entities, or topic modelling offer promising possibilities. Their complexity requires software engineering support from outside agencies, not least because of the extensive pre-processing these techniques demand.

The audio-visual project at present is mainly looking at digital techniques for archiving, searching and retrieving the available material.

Contact

Prof dr Gerry van Klinken, klinken@kitlv.nl

Dr Tom Hoogervorst, hoogervorst@kitlv.nl

Central Themes

  • search techniques
  • audio-visual data
  • non-western language corpora
  • entity recognition
