<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.8.5">Jekyll</generator><link href="http://localhost:4000/feed.xml" rel="self" type="application/atom+xml" /><link href="http://localhost:4000/" rel="alternate" type="text/html" /><updated>2019-09-28T18:51:51+08:00</updated><id>http://localhost:4000/feed.xml</id><title type="html">Tim Niven (寒山)</title><subtitle>Information and ideas pertaining to my research and other interests.
</subtitle><entry><title type="html">Interview with Data Skeptic</title><link href="http://localhost:4000/nlp,/ai/2019/09/28/interview-with-data-skeptic.html" rel="alternate" type="text/html" title="Interview with Data Skeptic" /><published>2019-09-28T00:00:00+08:00</published><updated>2019-09-28T00:00:00+08:00</updated><id>http://localhost:4000/nlp,/ai/2019/09/28/interview-with-data-skeptic</id><content type="html" xml:base="http://localhost:4000/nlp,/ai/2019/09/28/interview-with-data-skeptic.html">&lt;p&gt;I was recently &lt;a href=&quot;https://podcasts.google.com/?feed=aHR0cDovL2RhdGFza2VwdGljLmNvbS9mZWVkLnJzcw&amp;amp;episode=MzdjYzU5NWM5YzNlNDJmMWEzOTM0OTAwODJhYzdhOWI&amp;amp;hl=en-TW&amp;amp;ep=6&amp;amp;at=1569665805799&quot;&gt;interviewed about our ACL 2019 paper on Data Skeptic.&lt;/a&gt;
Many thanks to the Data Skeptic team for inviting me to participate in
this interview, for asking some great questions, and for editing the
interview as well as I think was possible given the raw material I
provided.&lt;/p&gt;

&lt;p&gt;It was my first interview of this kind, and a learning experience
for me. I was only able to set aside a limited amount of time to prepare
for it. This post is a brief note about what I wish I had done better,
and a restatement of an argument I made about empiricism and nativism
in AI, which is especially topical given the release
of &lt;a href=&quot;https://www.bookdepository.com/Rebooting-AI-Gary-Marcus/9781524748258&quot;&gt;Gary Marcus and Ernest Davis’ new book&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Above all, I should have credited other researchers more.
A conversational style is nice and relaxed, but making clear
where ideas come from is required and non-negotiable.
Most of the ideas I presented are not original. For example, I first
read the idea that supervised learning from text faces a Chinese Room
argument in a paper by
&lt;a href=&quot;https://arxiv.org/abs/1610.07432&quot;&gt;Douwe Kiela&lt;/a&gt;. &lt;a href=&quot;https://www.youtube.com/attribution_link?a=aJ8aqmEmOb4&amp;amp;u=%2Fwatch%3Fv%3DIU9cQ1JdC7Y%26feature%3Dshare&amp;amp;fbclid=IwAR3zb8sPtuM7JKJkfei1nkEG3P8n01KH7262nSwBokj_-RTrlNeri5bida4&quot;&gt;Yoshua Bengio has also
recently been arguing for grounded language learning&lt;/a&gt;,
 for paying more attention to the results of cognitive science,
 and for the importance of out-of-distribution generalization.&lt;/p&gt;

&lt;p&gt;As far as I know, perhaps the only original point I made in the
interview, beyond the results of our paper, was the counter-argument our
work offers to an “argument from engineering success” for the strongly
empiricist program of researchers such as Yann LeCun. I briefly
restate that argument here.&lt;/p&gt;

&lt;p&gt;I don’t know all the reasons why Yann LeCun sees innate structure as an
“evil” to be minimized. However, judging from
&lt;a href=&quot;https://www.youtube.com/watch?v=vdWPQ6iAkT4&quot;&gt;his debate with Gary Marcus&lt;/a&gt;,
one of those reasons appears to
be what could be called an “argument from engineering success”:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;(1) The less innate structure we put into our models, the better they 
      have performed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;(2) Engineering success is a strong indication of the right path, 
      scientifically.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;(3) Therefore, less innate structure is better.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Of course, we need to define what we mean by innate structure. From the
same debate, it appears that LeCun and Marcus have different ideas of what
this means. Marcus argues that NIPS papers roundly ignore innate
structure; LeCun states that exactly the opposite is the case. That’s a
question I want to return to in the future.&lt;/p&gt;

&lt;p&gt;But for the time being, the growing number of findings in NLP (our
paper among them) showing that our best deep learning models learn
spurious solutions to datasets by exploiting superficial statistics
immediately undermines (1), since that improved performance does not
reflect the kind of learning we care about.&lt;/p&gt;

&lt;p&gt;Bengio (whom I continue to admire greatly, not just for his scientific
achievements, which are amazing, but perhaps even more for what a fine
human being he is: see his work on AI for social good, and
his passionate advocacy for action on climate change) has also been
&lt;a href=&quot;https://www.youtube.com/watch?v=llGG62fNN64&amp;amp;fbclid=IwAR2tX_GQX7ohJ92zJlm4_Fuj7-QEpZ0ggX-z_cu8reXZehacWm08KUbBWhM&quot;&gt;describing recent
deep learning success as specifically “System 1” success&lt;/a&gt;,
 in the sense of Kahneman’s dual-process (System 1 and System 2) theory. If this view is correct, then
the argument from engineering success should be modified as follows:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;(1) The less innate structure we put into our models, the better they 
      have performed &lt;strong&gt;at System 1 tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;(2) Engineering success is a strong indication of the right path, 
      scientifically.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;blockquote&gt;
  &lt;p&gt;(3) Therefore, less innate structure is better &lt;strong&gt;for System 1 tasks&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This modified argument is at least more reasonable given the growing
evidence I referred to in my counter-argument, although I am not yet
prepared to judge it.&lt;/p&gt;

&lt;p&gt;As for System 2 tasks, we will have to wait and see. But at the very
least there is a reasonable case for taking the results of cognitive
science, and &lt;a href=&quot;https://philosophy.dept.shef.ac.uk/papers/Defense.pdf&quot;&gt;contemporary nativism&lt;/a&gt;, seriously.&lt;/p&gt;</content><author><name></name></author><summary type="html">I was recently interviewed about our ACL 2019 paper on Data Skeptic. Many thanks to the Data Skeptic team for inviting me to participate in this interview, for asking some great questions, and for cutting the interview as nicely as I think possible given what raw materials I provided.</summary></entry><entry><title type="html">Does the brain represent words?</title><link href="http://localhost:4000/representations,/cogsci,/nlp/2019/06/24/does-the-brain-represent-words.html" rel="alternate" type="text/html" title="Does the brain represent words?" /><published>2019-06-24T00:00:00+08:00</published><updated>2019-06-24T00:00:00+08:00</updated><id>http://localhost:4000/representations,/cogsci,/nlp/2019/06/24/does-the-brain-represent-words</id><content type="html" xml:base="http://localhost:4000/representations,/cogsci,/nlp/2019/06/24/does-the-brain-represent-words.html">&lt;p&gt;The &lt;a href=&quot;https://arxiv.org/abs/1806.00591&quot;&gt;paper&lt;/a&gt; is by Jon Gauthier and Anna 
Ivanova, and is from June 2018.&lt;/p&gt;

&lt;p&gt;My interest in this paper comes from the claim made therein that work in NLP on 
universal representations appears to be on the right track.&lt;/p&gt;

&lt;h2 id=&quot;brief-summary-of-the-paper&quot;&gt;Brief Summary of the Paper&lt;/h2&gt;

&lt;p&gt;The seminal work of Mitchell et al. (2008) used a trillion-word
corpus to define semantic representations of words based on their co-occurrence
with a specifically chosen set of 25 sensorimotor verbs thought to be associated
with semantic representation (see, hear, listen, taste, …). They found that these
features were significantly useful for predicting neural activation patterns across nine subjects.&lt;/p&gt;

&lt;p&gt;As an aside, I thought one of the more interesting parts of that paper was the 
suggestion that a neural representation could be obtained as a linear 
superposition of representations from sub-modules.&lt;/p&gt;
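&lt;p&gt;The linear-superposition idea can be illustrated with a minimal sketch of a Mitchell-style encoding model. Everything here is an assumption for illustration: the features and “voxels” are random toy data rather than corpus counts or fMRI, the sizes are hypothetical, and ordinary least squares stands in for the paper’s actual fitting procedure. Each voxel’s predicted activation for a word is a weighted sum of that word’s feature values, and a leave-two-out test asks whether the predicted images for two held-out words match their observed images better than the swapped pairing does.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import numpy as np

# Hypothetical toy data: random "co-occurrence" features and synthetic voxels.
rng = np.random.default_rng(0)
n_words, n_feats, n_voxels = 60, 25, 500
F = rng.random((n_words, n_feats))                # f_i(w): feature i of word w
F = F / np.linalg.norm(F, axis=1, keepdims=True)  # normalize each word's features
C_true = rng.normal(size=(n_feats, n_voxels))     # c_vi: weight of feature i at voxel v
Y = F @ C_true + 0.01 * rng.normal(size=(n_words, n_voxels))  # "observed" activations


def fit_encoder(F, Y):
    """Least-squares fit of C in Y ~ F @ C, i.e. one linear model per voxel."""
    C, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return C


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def leave_two_out_correct(F, Y, i, j):
    """Fit without words i and j, then test whether the predicted images
    match the observed ones better than the swapped assignment."""
    train = np.setdiff1d(np.arange(len(F)), [i, j])
    C = fit_encoder(F[train], Y[train])
    # Predicted activation = superposition of per-feature voxel patterns.
    pred_i, pred_j = F[i] @ C, F[j] @ C
    matched = cosine(pred_i, Y[i]) + cosine(pred_j, Y[j])
    swapped = cosine(pred_i, Y[j]) + cosine(pred_j, Y[i])
    return bool(np.greater(matched, swapped))


accuracy = np.mean([leave_two_out_correct(F, Y, i, i + 1) for i in range(0, n_words, 2)])
print(accuracy)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Because the model is linear in the features, increasing one feature value by some amount adds that amount times the feature’s voxel pattern to the prediction, which is the sense in which the predicted representation is a superposition.&lt;/p&gt;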

&lt;p&gt;Since then, other researchers have looked for better feature spaces,
based on, e.g., behavioural ratings and distributional statistics, and the
approach has also been extended to sentence decoding.&lt;/p&gt;

&lt;p&gt;A decoding study is described as follows:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;goal: derive a set of stimulus-specific linguistic features and measure how 
they are associated with brain activity&lt;/li&gt;
  &lt;li&gt;method: see if the brain activity patterns can predict the chosen features&lt;/li&gt;
  &lt;li&gt;conclusion: if the features reflect semantic properties of the stimulus, then 
the brain activity pattern is considered a “semantic representation”&lt;/li&gt;
&lt;/ul&gt;
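&lt;p&gt;A decoding study runs in the opposite direction to Mitchell et al.’s approach: brain activity is the input and the linguistic features are the output. The sketch below assumes a closed-form ridge-regression decoder and a rank-based score (for each held-out stimulus, where does its true feature vector rank among all candidates by cosine similarity to the prediction?). Both are assumptions for illustration rather than details taken from the paper, and all data are synthetic.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import numpy as np


def fit_ridge_decoder(X, Z, alpha=1.0):
    """Ridge regression W minimizing ||X W - Z||^2 + alpha * ||W||^2.

    X: (n_stimuli, n_voxels) brain activity; Z: (n_stimuli, dim) features.
    """
    n_voxels = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Z)


def mean_rank(Z_pred, Z_true):
    """Rank each true feature vector among all candidates by cosine similarity
    to its prediction (1 = best); chance is about (n + 1) / 2."""
    P = Z_pred / np.linalg.norm(Z_pred, axis=1, keepdims=True)
    T = Z_true / np.linalg.norm(Z_true, axis=1, keepdims=True)
    sims = P @ T.T                     # sims[i, j] = cos(prediction i, candidate j)
    correct = np.diag(sims)[:, None]   # similarity to the right answer
    ranks = 1 + np.sum(np.greater(sims, correct), axis=1)
    return float(ranks.mean())


# Synthetic example: "brain activity" is a noisy linear function of the features.
rng = np.random.default_rng(0)
n, n_voxels, dim = 200, 300, 50
Z = rng.normal(size=(n, dim))          # hypothetical stimulus features
X = Z @ rng.normal(size=(dim, n_voxels)) + 0.5 * rng.normal(size=(n, n_voxels))
W = fit_ridge_decoder(X[:150], Z[:150], alpha=10.0)
score = mean_rank(X[150:] @ W, Z[150:])  # chance for 50 test items is about 25.5
print(score)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Ridge regularization is used here because decoders typically have more voxels than training stimuli (300 versus 150 in this toy setup), where unregularized least squares would be ill-posed.&lt;/p&gt;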

&lt;p&gt;The authors of this paper argue for the claim that&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;such talk of representation is meaningless unless one also specifies the 
brain mechanisms utilizing those representations and the task they are designed 
to solve.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;since such representational claims&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;wildly over-generate, leading us to award the label of “representation” to 
brain activity evoked by any arbitrary aspect of the stimulus, so long as it 
has some vague relation to the stimulus “meaning”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A specific example of “over-generation”: Pereira et al. (2018) used fMRI
data from subjects reading sentences to predict the embeddings of the words in
those sentences, and claimed that their decoder could
read out “linguistic meaning”.&lt;/p&gt;

&lt;p&gt;But since word embeddings have been shown to capture, at best, a limited range
of properties such as “elements of syntax” and “hypernymy relations”,&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;we could just as well claim that the decoder has captured “elements of 
syntax” or “hypernymy relations.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;and since we do more than reason about syntax and hypernymy relations when
reading a sentence, such decoding results leave the nature and function of the
underlying neural computations underdetermined.&lt;/p&gt;

&lt;p&gt;Furthermore, representations do not exist in a vacuum: they are created by some
part of the brain so that another part can potentially consume them and produce
behaviour. (Some interesting references to philosophical work I would like to
read on this point: Papineau, 1992; Dretske, 1995.)&lt;/p&gt;

&lt;p&gt;So, the authors re-run the experiments of Pereira et al. (2018), learning
decoders that map the fMRI data to the internal representations of neural
network models trained to perform specific tasks.&lt;/p&gt;

&lt;p&gt;All neural models perform above chance, and the best performance is achieved
by the more general ones (e.g. GloVe and a natural language inference (NLI) model).&lt;/p&gt;
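&lt;p&gt;The comparison itself can be sketched the same way: keep the brain data fixed, swap in different target spaces, and compare cross-validated decoding scores. In this hypothetical toy setup, &lt;code&gt;model_a&lt;/code&gt;’s representations share latent structure with the synthetic brain data and &lt;code&gt;model_b&lt;/code&gt;’s are unrelated noise. The names, sizes, ridge decoder, and mean-rank score are all assumptions for illustration, and the numbers say nothing about GloVe, NLI, or the paper’s actual results.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&quot;language-python&quot;&gt;import numpy as np


def cv_mean_rank(X, Z, alpha=10.0, n_folds=5):
    """Cross-validated mean rank of a ridge decoder from brain data X to target space Z."""
    idx = np.arange(len(X))
    ranks = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        Xtr, Ztr = X[train], Z[train]
        # Closed-form ridge decoder fit on the training folds only.
        W = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(X.shape[1]), Xtr.T @ Ztr)
        P = X[test] @ W
        P = P / np.linalg.norm(P, axis=1, keepdims=True)
        T = Z[test] / np.linalg.norm(Z[test], axis=1, keepdims=True)
        sims = P @ T.T  # cosine similarity of each prediction to each candidate
        # Rank of the true target = 1 + number of candidates scoring strictly higher.
        ranks.extend(1 + np.sum(np.greater(sims, np.diag(sims)[:, None]), axis=1))
    return float(np.mean(ranks))


# Hypothetical toy setup: a shared latent "meaning" drives the synthetic brain data.
rng = np.random.default_rng(1)
n, n_voxels, n_latent = 200, 300, 40
latent = rng.normal(size=(n, n_latent))
X = latent @ rng.normal(size=(n_latent, n_voxels)) + rng.normal(size=(n, n_voxels))
spaces = {
    "model_a": latent @ rng.normal(size=(n_latent, 64)),  # shares the latent structure
    "model_b": rng.normal(size=(n, 64)),                  # unrelated to the brain data
}
scores = {name: cv_mean_rank(X, Z) for name, Z in spaces.items()}
print(scores)&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;A lower mean rank means the brain data predict that space’s representations better; with 40 test items per fold, chance is about 20.5.&lt;/p&gt;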

&lt;p&gt;These results are the source of the suggestion: the more general the NLP model,
the better the fMRI data can predict its representations.&lt;/p&gt;</content><author><name></name></author><summary type="html">The paper is by Jon Gauthier and Anna Ivanova, and is from June 2018.</summary></entry></feed>