1. Max Kemman (Erasmus University Rotterdam) and Laura Hollink (VU University Amsterdam)
Building the PoliMedia search system: data- and user-driven
Analysing media coverage across several types of media outlets is a challenging task for (media) historians. One specific line of media coverage research investigates how political debates are covered and how the representation of topics and people changes over time. The PoliMedia project (http://www.polimedia.nl) aims to showcase the potential of cross-media analysis for research in the humanities by 1) curating automatically detected semantic links between four datasets of different media types, and 2) developing a demonstrator application that allows researchers to use such an interlinked collection for quantitative and qualitative analysis of media coverage of debates in the Dutch parliament.
These two goals reflect two perspectives on the development of a search system such as PoliMedia: data-driven and user-driven. In this presentation, Laura Hollink (VU) will present the data-driven perspective: linking different datasets, and the research questions that arise in achieving this linkage. How can different types of datasets be combined, and what kinds of research questions does the data make possible? Max Kemman (EUR) will present the user-driven perspective: how can scholars benefit from the linking of these datasets? What are the user requirements for the PoliMedia search system, and how was the system evaluated with scholars in an eye-tracking study?
Max Kemman is a junior researcher at Erasmus University Rotterdam, affiliated with the Erasmus Studio, specialising in the use of academic digital search systems. His main interests are how these systems are used and how they can be improved to enhance scholarly practices. His research focuses mainly on user requirements and on evaluating the usability of the prototypes developed. He conducts this research within projects that develop search systems for several types of datasets: audiovisual archives (AXES, FP7, http://www.axes-project.eu), oral history collections (Oral History Today, CLARIAH) and political debates with media coverage (PoliMedia, CLARIN).
Laura Hollink is an assistant professor in the Knowledge Representation and Reasoning group of VU University Amsterdam. Since 2003 she has worked on a series of interdisciplinary projects on modelling, linking and enriching data, together with political and communication scientists, historians, and cultural heritage and health professionals. In the PoliMedia project, her focus is on modelling political data and linking it to media archives as well as to publicly available Linked Open Data sources on the Web. Laura is part of the Ontology Alignment Evaluation Initiative, a major community effort for the systematic evaluation of approaches to linking datasets, and is an organiser of the USEWOD workshop series on the analysis of usage patterns in Linked Open Data. Currently, Laura is a work package leader in the European FP7 project EURECA, focusing on the integration of clinical care and clinical research data.
2. Maciej Eder (Pedagogical University in Kraków, eHg visiting fellow)
Authorship attribution and beyond: techniques for assessing literary style
In literary stylometry, which assesses literary texts using statistical methods, it is still a mystery why multidimensional analyses of word distributions are such an accurate tool for authorship attribution: why comparing the normalised frequencies of the most frequent words in a collection of texts is enough to group those texts by their individual authors. However, while the authorial signal is usually by far the strongest, varying degrees of skewing have been observed towards signals of authorial gender, genre, sentiment, or chronology. These “interfering” signals, usually considered unwanted noise, are nonetheless very valuable from a literary perspective. Discovering the mechanisms and underlying patterns, in other words finding and separating the non-authorial signals in stylometric images of literary texts, might provide insight into hidden regularities of literary creation that traditional methods of stylistic analysis usually ignore or cannot trace.
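The idea of grouping texts by comparing normalised frequencies of the most frequent words can be illustrated with Burrows's Delta, a widely used distance measure in stylometry. The abstract does not name a specific measure, so Delta is an illustrative choice here, and the function names, toy token lists and the labels author_a to author_c are hypothetical. A minimal sketch in Python, assuming texts are already tokenised into lists of lowercase words:

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


def most_frequent_words(corpus, n):
    """Return the n most frequent words pooled over all candidate texts."""
    pooled = Counter()
    for tokens in corpus.values():
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(n)]


def burrows_delta_attribution(disputed, corpus, n=100):
    """Return (closest candidate, {candidate: Delta}) for a disputed text.

    corpus maps a candidate label to its token list. Frequencies are
    standardised (z-scores) against the candidate texts; Delta is the mean
    absolute difference of z-scores over the n most frequent words.
    """
    mfw = most_frequent_words(corpus, n)
    freqs = {label: relative_frequencies(toks) for label, toks in corpus.items()}
    disputed_freqs = relative_frequencies(disputed)

    # Mean and population standard deviation of each word across candidates.
    stats = {}
    for word in mfw:
        values = [freqs[label].get(word, 0.0) for label in corpus]
        stats[word] = (mean(values), pstdev(values))

    def z(profile, word):
        mu, sigma = stats[word]
        return (profile.get(word, 0.0) - mu) / sigma if sigma > 0 else 0.0

    deltas = {}
    for label in corpus:
        diffs = [abs(z(disputed_freqs, w) - z(freqs[label], w)) for w in mfw]
        deltas[label] = sum(diffs) / len(mfw)
    # The smallest Delta marks the stylistically closest candidate.
    return min(deltas, key=deltas.get), deltas


# Hypothetical toy data: three candidate "authors" and one disputed text.
corpus = {
    "author_a": "the and the and the of the and a".split(),
    "author_b": "of a to of a to of the a".split(),
    "author_c": "to a to of to and to the to".split(),
}
disputed = "the and the and the the and of".split()
best, deltas = burrows_delta_attribution(disputed, corpus, n=5)
print(best)  # author_a
```

Real studies use hundreds of most frequent words and much longer texts; with toy data like this the ranking only shows the mechanics. The same distance table could equally feed a clustering step, which is how texts end up grouped by author, or, as the abstract suggests, by gender, genre or chronology.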
The presentation will discuss some of the tools and techniques used in stylometric authorship attribution, paying special attention to whether a methodology suited to attribution can be generalised to other problems in computational stylistics, such as gender differentiation or genre recognition. From a purely literary perspective, it will also be interesting to present case studies of various text collections: a corpus of classical Latin prose, a collection of 19th-century English fiction, etc.
Maciej Eder, who began a three-month visiting fellowship with the eHumanities group in April, is Assistant Professor at the Institute of Polish Studies at the Pedagogical University of Kraków, Poland, and at the Institute of Polish Language at the Polish Academy of Sciences, Kraków, Poland. At the latter institution, he works as a lexicographer co-editing the Old-Polish Dictionary (i.e., a complete dictionary of medieval Polish up to 1500). At the University, he teaches courses in early Polish literature, scholarly editing and, occasionally, computational stylistics.
Eder is interested in European literature of the Renaissance and the Baroque, the classical heritage in early modern literature, and scholarly editing (his most recent book is a critical bilingual edition of Andreas Volanus’ Latin treatise De libertate politica…, 1572, and of its Old Polish translation, O wolności rzeczypospolitej…, 1606). A couple of years ago, while researching anonymous ancient texts, Eder discovered the fascinating world of computer-based stylometry and non-traditional authorship attribution. His work now focuses on a thorough re-examination of current attribution methods and on applying them to languages other than English, e.g. Latin and Ancient Greek.