“Why linguists are needed: The severe limitations of big data analysis of linguistic corpora”

George Lakoff
University of California, Berkeley

The Berkeley MetaNet Project was funded for three years by IARPA, the Intelligence branch of the U.S. Department of Defense, on an open source basis. IARPA wanted a completely automated machine learning approach to analyzing the conceptual metaphors in their vast corpora of documents. Luckily, they also put together an ace team of Berkeley linguists and psycholinguists from California campuses.

This talk will go over why the big data statistical methods by themselves were hopeless. The Linguistics Group, on the other hand, used computational methods to set up a wiki database of many hundreds of conceptual metaphor mappings, over a hundred frames, and many dozens of image schemas, and to build a very simple embodied construction grammar (ECG) parser incorporating Karen Sullivan’s insights on the way conceptual metaphor functions in grammar.

We did find the ability to process large corpora extremely useful. We also found that when we took the corpus-processing input and applied even a simple metaphorical ECG parser, together with links to the cascades of relationships in the wiki database, we began to get some interesting analyses. But it took a great team of linguists, and lots of serious linguistic research, to get even reasonable partial analyses at all.

The talk will discuss details to give you a feel for why linguists are needed for serious analyses of linguistic data.

George Lakoff is Richard and Rhoda Goldman Distinguished Professor of Cognitive Science and Linguistics at the University of California, Berkeley. He is one of the founders of conceptual metaphor theory and of the field of cognitive linguistics. His numerous books and articles concern the “metaphors we live by”, the nature of conceptual categories and frames, and how both of these structure abstract domains such as mathematics, philosophy and political reasoning. The present lecture relates to his research at the International Computer Science Institute and UC Berkeley on embodied cognition and the Neural Theory of Language.

This lecture was organized by the Language Use and Cognition chair group of the Faculty of Humanities and was sponsored by the Network Institute <http://networkinstitute.org> and the Spinoza Prize project “Understanding Language by Machines” in the Computational Lexicology & Terminology Lab <http://www.cltl.nl/> of the Faculty of Humanities.

Please note the change of venue (because of the large number of registrations for the event): coffee and tea will be served before the lecture from 13.30h onwards on the 12th floor (Main Building, VU University), and the lecture will be given in HG12A-00 (also on the 12th floor).

Those interested in attending should register in advance by emailing graduate.school.fgw@vu.nl.
The event is hosted by the Graduate School of the Humanities Faculty.