Interview with Data Skeptic

2019-09-28T00:00:00+08:00

I was recently interviewed on Data Skeptic about our ACL 2019 paper. Many thanks to the Data Skeptic team for inviting me, for asking some great questions, and for editing the interview as well as I think was possible given the raw material I provided.

It was my first ever interview of this nature, and a learning experience for me. I had only limited time to prepare for it. This post is a brief note about what I wish I had done better, and a restatement of an argument I made about empiricism and nativism in AI - especially topical in light of the release of Gary Marcus and Ernest Davis’ new book.

In the main, I should have referenced other researchers more. A conversational style is nice and relaxed, but making clear where ideas come from is non-negotiable. Most of the ideas I presented are not original. For example, I first read the idea that supervised learning with text faces a Chinese Room argument in a paper by Douwe Kiela. Yoshua Bengio has also been arguing recently for grounded language learning, for being more willing to look at the results of cognitive science, and for the importance of out-of-distribution generalization.

As far as I know, perhaps the only original opinion expressed in the interview, beyond the results of our paper, was the counter-argument offered by our work to an “Argument from engineering success” for the kind of strongly empiricist program of people like Yann LeCun. I briefly reiterate that argument here.

I don’t know all the reasons why Yann LeCun sees innate structure as an “evil” to be minimized. However, judging from his debate with Gary Marcus, one of those reasons appears to be what could be called an “argument from engineering success”:

(1) The less innate structure we put into our models, the better they have performed.

(2) Engineering success is a strong indication of the right path, scientifically.

(3) Therefore, less innate structure is better.

We of course need to define what we mean by innate structure. From the same debate, it appears LeCun and Marcus have different ideas of what this means. Marcus argues that NIPS papers roundly ignore innate structure; LeCun states exactly the opposite is the case. That’s a question I want to return to in the future.

But for the time being, the growing number of findings in NLP that demonstrate our best deep learning models are learning spurious solutions to datasets via superficial statistics immediately undermines (1), since this improved performance does not represent the kind of learning we care about.

Bengio (whom I continue to admire greatly, not just for his scientific achievements, which are amazing, but perhaps even more for how much of a high quality human being he is - see his work on AI for social good, and his passionate advocacy for action on climate change) has also been framing recent deep learning success as specifically “System 1” success, in terms of Kahneman’s distinction between System 1 and System 2 thinking. If this view is correct, then the argument from engineering success should be modified as follows:

(1) The less innate structure we put into our models, the better they have performed at System 1 tasks.

(2) Engineering success is a strong indication of the right path, scientifically.

(3) Therefore, less innate structure is better for System 1 tasks.

This argument is at least more reasonable given the growing evidence I referred to in my counter-argument, although I am not prepared to judge it at this point in time.

As for System 2 tasks, we will have to wait and see. But at the very least there is a reasonable case for taking the results of cognitive science, and contemporary nativism, seriously.

Does the brain represent words?

2019-06-24T00:00:00+08:00

The paper is by Jon Gauthier and Anna Ivanova, and is from June 2018.

My interest in this paper comes from the claim made therein that work in NLP on universal representations appears to be on the right track.

Brief Summary of the Paper

The seminal work of Mitchell et al. (2008) used a trillion-word corpus to define semantic representations of words based on their co-occurrence with a specifically chosen set of 25 sensorimotor verbs thought to be associated with semantic representation: see, hear, listen, taste, and so on. They found that these features were significantly useful for predicting neural activation patterns across nine subjects.

As an aside, I thought one of the more interesting parts of that paper was the suggestion that a neural representation could be obtained as a linear superposition of representations from sub-modules.
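
To make the superposition idea concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the authors' code: the co-occurrence counts and the per-verb activation patterns are invented, only the four verbs named above are used rather than the full set of 25, and normalising the feature vector to unit length is an assumption of this example.

```python
import numpy as np

# Only the four verbs named in the text; the original set has 25.
VERBS = ["see", "hear", "listen", "taste"]

def cooccurrence_features(word, counts):
    """Unit-length vector of co-occurrence counts between `word` and each verb.

    `counts` maps (word, verb) pairs to corpus counts (hypothetical data here).
    """
    raw = np.array([counts.get((word, verb), 0) for verb in VERBS], dtype=float)
    norm = np.linalg.norm(raw)
    return raw / norm if norm > 0 else raw

def predict_activation(features, verb_patterns):
    """Linear superposition: a weighted sum of one activation pattern per verb.

    `verb_patterns` has shape (n_verbs, n_voxels). In a real study these
    patterns would be learned from fMRI data; here they are invented.
    """
    return features @ verb_patterns

# Hypothetical counts and a made-up 3-voxel pattern for each verb.
toy_counts = {("apple", "see"): 3, ("apple", "taste"): 4}
toy_patterns = np.array([
    [1.0, 0.0, 0.0],  # "see"
    [0.0, 1.0, 0.0],  # "hear"
    [0.0, 1.0, 1.0],  # "listen"
    [0.0, 0.0, 2.0],  # "taste"
])

apple = cooccurrence_features("apple", toy_counts)
predicted = predict_activation(apple, toy_patterns)
print(apple)
print(predicted)
```

The predicted pattern for "apple" is just its "see" weight times the "see" pattern plus its "taste" weight times the "taste" pattern, which is all that "linear superposition" means here.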

Following this, other researchers have been looking for better feature spaces, based on, e.g., behavioural ratings and distributional statistics. The approach has also been extended to sentence decoding.

A decoding study is described as follows:

goal: derive a set of stimulus-specific linguistic features and measure how it is associated with brain activity
method: see if the brain activity patterns can predict the chosen features
conclusion: if the features reflect semantic properties of the stimulus, then the brain activity pattern is considered a “semantic representation”
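
In code, the recipe above might look like the following sketch. It is only an illustration: the data are synthetic, ridge regression is an assumed choice of decoder for this example, and all names, sizes, and the penalty value are hypothetical.

```python
import numpy as np

def fit_ridge_decoder(brain, features, alpha=1.0):
    """Fit a linear map W so that brain @ W approximates features.

    brain:    (n_stimuli, n_voxels) activity patterns
    features: (n_stimuli, n_features) linguistic features of each stimulus
    alpha:    ridge penalty (an arbitrary value for this sketch)
    """
    n_voxels = brain.shape[1]
    gram = brain.T @ brain + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, brain.T @ features)

def synthetic_data(n_stimuli=200, n_voxels=50, n_features=10, noise=0.1, seed=0):
    """Fake 'brain' data generated as a noisy linear function of the features."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(n_stimuli, n_features))
    encoding = rng.normal(size=(n_features, n_voxels))
    brain = features @ encoding + noise * rng.normal(size=(n_stimuli, n_voxels))
    return brain, features

brain, features = synthetic_data()
train, test = slice(0, 150), slice(150, 200)

# Method: learn to predict the chosen features from brain activity.
W = fit_ridge_decoder(brain[train], features[train])
predicted = brain[test] @ W

# Held-out R^2: how much of the feature variance the decoder recovers.
residual = np.sum((features[test] - predicted) ** 2)
total = np.sum((features[test] - features[test].mean(axis=0)) ** 2)
r_squared = 1.0 - residual / total
print(r_squared)
```

Note that a high score here only shows that the features are linearly recoverable from the activity patterns. Whether that licenses the word "representation" is exactly what the paper disputes.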

The authors of this paper argue for the claim that

such talk of representation is meaningless unless one also specifies the brain mechanisms utilizing those representations and the task they are designed to solve.

since such representational claims

wildly over-generate, leading us to award the label of “representation” to brain activity evoked by any arbitrary aspect of the stimulus, so long as it has some vague relation to the stimulus “meaning”

A specific example of “over-generation”: the study of Pereira et al. (2018), wherein fMRI data from subjects reading a sentence was used to predict the embeddings of the words in that sentence, claimed that their decoder could read out “linguistic meaning”.

But since word embeddings have been shown at best to capture a limited range of things such as “elements of syntax” and “hypernymy relations”

we could just as well claim that the decoder has captured “elements of syntax” or “hypernymy relations.”

and since we do more than reason about syntax and hypernymy relations when reading a sentence, this underdetermines the nature and function of neural computations.

Furthermore, representations do not exist in a vacuum: they are created by some part of the brain to be potentially consumed by another part and produce behaviour. (Some interesting references to philosophical work I would like to read on this point: Papineau, 1992; Dretske, 1995).

So, the authors re-run the experiments of Pereira et al. (2018), learning a decoder that maps the fMRI data to the representations of neural network models trained to perform specific tasks.

All neural models perform above chance, and the best performance is achieved by those that are more general (e.g. GloVe and NLI).
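
One way to make "above chance" concrete is a rank-based score. The sketch below is an assumption for illustration, not necessarily the metric used in the paper: for each decoded vector, all candidate targets are ranked by cosine similarity, and the score records how close to the top the true target lands. The data are random and the function name is hypothetical.

```python
import numpy as np

def mean_rank_accuracy(predicted, targets):
    """Average rank score of the true target among all candidate targets.

    For each predicted vector, candidates are ranked by cosine similarity.
    The score is 1.0 when the true target is always ranked first and about
    0.5 when predictions are unrelated to the targets (chance).
    """
    p = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    sims = p @ t.T                      # sims[i, j]: prediction i vs target j
    true_sims = np.diag(sims)[:, None]  # similarity to the correct target
    # Rank 1 means no other candidate is more similar than the true target.
    ranks = 1 + np.sum(sims > true_sims, axis=1)
    n = sims.shape[0]
    return float(np.mean(1.0 - (ranks - 1) / (n - 1)))

rng = np.random.default_rng(0)
targets = rng.normal(size=(300, 20))

perfect = mean_rank_accuracy(targets, targets)
chance = mean_rank_accuracy(rng.normal(size=(300, 20)), targets)
print(perfect, chance)
```

Under a score like this, a decoder is "above chance" when its value is reliably greater than 0.5.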

These results are the source of the suggestion that NLP work on universal representations is on the right track: the more general the NLP model, the better its representations can be predicted from the fMRI data.

Tim Niven (寒山)
