New Trends in eHumanities - Språkbanken

Markus Forsberg, University of Gothenburg

DATE

October 23, 2014, 15.00-17.00, at the eHumanities Group

A tour around the research infrastructure of Språkbanken

Språkbanken <http://spraakbanken.gu.se> is a language technology (LT) research group at the University of Gothenburg that primarily works on LT analysis of large amounts of Swedish text, both modern and historical. As part of this research, we have built a research infrastructure, which I will present in this talk. Until recently, this infrastructure focused on enabling language research, including LT research, but we are now moving towards supporting all kinds of humanities and social sciences (HSS) research that uses Swedish texts as its primary source of data.

The research infrastructure consists of two main interconnected systems: the text (or corpus) infrastructure Korp <http://spraakbanken.gu.se/korp>, containing close to ten billion words of Swedish text, all analyzed with state-of-the-art LT tools, and the lexical infrastructure Karp <http://spraakbanken.gu.se/karp>, containing many kinds of Swedish dictionaries and lexicons, with a total of more than 700k lexical entries. The software is free and open source, so if you want to set up your own infrastructure, you can download it from Språkbanken’s homepage. One example of such an installation is Helsinki’s Korp <https://korp.csc.fi>, which you can visit if you prefer Finnish over Swedish.

Markus Forsberg 
– Associate professor of natural language processing, University of Gothenburg
– Deputy director of Språkbanken
<http://spraakbanken.gu.se/eng/personal/markus>