MPI for Psycholinguistics

Lecture in the series ‘e-Humanities in action’: Marco Baroni, University of Trento


February 12, 2014


Our next lecture in the series ‘e-Humanities in action’ will be given by Marco Baroni from the University of Trento. His main research topic is distributional semantics. He is exploring the idea that human conceptual (semantic) knowledge is, to a considerable extent, the result of extracting simple distributional information from large amounts of linguistic input. On this view, we think of barking, having a tail and being a pet as salient properties of dogs because we have heard or read many sentences and phrases such as “her dog barks all the time”, “the dog wagged its tail” or “dogs and other pets”.

Marco is also an excellent speaker, and his lectures and presentations are always very well received.

Who: Marco Baroni

What: Linking vectors to the world: Multimodal and cross-modal distributional semantics

Where: MPI for Psycholinguistics, Midi-Planck

When: Wednesday, February 12th, 14:30

Distributional semantic models (DSMs) capture various aspects of word meaning with vectors that summarize their patterns of co-occurrence in large text corpora, under the assumption that the contexts in which words occur are extremely informative about their meaning. DSMs are probably the most empirically successful, fully data-based computational approach to modeling meaning.
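
As a rough illustration of the idea above, the following minimal Python sketch builds co-occurrence vectors from a toy corpus and compares words by cosine similarity. It is not the speaker's method: the corpus, the sentence-wide context window, the raw counts and the function names `cooccurrence_vectors` and `cosine` are all assumptions made for this example, and real DSMs are trained on very large corpora with weighting and dimensionality reduction.

```python
from collections import Counter
from math import sqrt

# Toy corpus: the first two sentences come from the announcement,
# the last two are made up for illustration.
corpus = [
    "her dog barks all the time",
    "the dog wagged its tail",
    "the cat wagged its tail",
    "the car needs new tires",
]


def cooccurrence_vectors(sentences):
    """Map each word to a Counter of the words it shares a sentence with."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            # Assumption: the context is every other token in the sentence.
            context = tokens[:i] + tokens[i + 1:]
            vectors.setdefault(target, Counter()).update(context)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = sqrt(sum(count * count for count in u.values()))
    norm_v = sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = cooccurrence_vectors(corpus)
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
sim_dog_car = cosine(vectors["dog"], vectors["car"])
print(f"dog~cat: {sim_dog_cat:.2f}, dog~car: {sim_dog_car:.2f}")
```

In this toy setting "dog" comes out closer to "cat" than to "car", because "dog" and "cat" share more contexts ("wagged", "its", "tail").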

However, current DSMs account for meaning entirely in terms of linguistic signs (the meaning of a word is a summary of the linguistic contexts in which the word occurs). This leads to two conceptual problems: lack of grounding and lack of reference. Concerning the former, cognitive scientists have shown that meaning is strongly grounded in the sensory-motor system, so a semantic theory that completely dissociates meaning from perception and action is, a priori, a rather implausible model of how humans work. Lack of reference is an even more serious problem. A theory that has no way to connect semantic representations of linguistic expressions to the outside world (is the statement “there is a lion in this room” true here and now?) is clearly missing something fundamental to what semantics is about!


Interestingly, in contemporary computer vision, images are represented by vectors recording the distribution of the discrete visual features they contain, a representation that is quite compatible with the one that DSMs assume for words. This suggests that we can establish a connection with the visual world by means of such vector-based image-representation techniques, in keeping with the fully inductive, natural-data-based approach of DSMs.


In my talk, I will present a series of experiments with multimodal DSMs that tackle the grounding problem by means of richer semantic representations that combine linguistic and visual features. I will also discuss recent work in which we translate image vectors across modalities into linguistic representations, as a first step towards solving the problem of linking language to the visual world.