The text-based projects all work with relatively large corpora of weakly structured natural-language texts, sometimes in languages for which computational tools remain underdeveloped (such as Indonesian or historical Sino-Malay). The objective of each of these projects is not only to make new discoveries through a ‘distant reading’ of its specific set of texts, but also to learn how to do this in ways that are more broadly applicable to future projects.
The techniques that at present speak most clearly to our researchers stay relatively close to the texts. Examples include search techniques adapted to social science research interests, concordances, extraction of named entities, and visualisation of entities (such as n-grams, or place names on maps).
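To give a sense of how close to the text these techniques stay, the following is a minimal sketch of a keyword-in-context concordance and a simple n-gram count, using only the Python standard library. The function names (`tokenize`, `concordance`, `ngram_counts`), the sample sentence and the naive tokenisation rule are all illustrative assumptions, not the tools the projects actually use; a real corpus in Indonesian or Sino-Malay would need language-specific tokenisation.

```python
import re
from collections import Counter


def tokenize(text):
    # Naive tokeniser (assumption): lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())


def concordance(tokens, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def ngram_counts(tokens, n=2):
    """Count every run of n consecutive tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Invented sample text, for illustration only.
sample = "The governor arrived in Batavia. The governor left Batavia for Semarang."
tokens = tokenize(sample)

for line in concordance(tokens, "batavia"):
    print(line)

print(ngram_counts(tokens, n=2).most_common(1))
```

The concordance shows each occurrence of a search term with its immediate context, while the n-gram counts are the kind of raw frequencies that a visualisation would then plot.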
Higher-level techniques, such as extracting time stamps (knowing when something mentioned in the text occurred), identifying relations between entities, or topic modelling, offer promising possibilities. Their complexity, however, requires software engineering support from outside agencies, not least because of the extensive pre-processing these techniques demand.
The audio-visual project is at present mainly looking at digital techniques for archiving, searching and retrieving the available material.