Thursday, December 7, 2017

Open science is not inherently interesting. Do it anyway.

tl;dr: Open science practices themselves don't make a study interesting. They are essential prerequisites whose absence can undermine a study's value.

There's a tension in discussions of open science, one that is also mirrored in my own research. What I really care about are the big questions of cognitive science: what makes people smart? how does language emerge? how do children develop? But in practice I spend quite a bit of my time doing meta-research on reproducibility and replicability. I often hear critics of open science – focusing on replication, but also other practices – objecting that open science advocates are making science more boring and decreasing the focus on theoretical progress (e.g., Locke, Stroebe & Strack). The thing is, I don't completely disagree. Open science is not inherently interesting.

Sometimes someone will tell me about a study and start the description by saying that it's pre-registered, with open materials and data. My initial response is "ho hum." I don't really care if a study is preregistered – unless I care about the study itself and suspect p-hacking. Then the only thing that can rescue the study is preregistration. Otherwise, I don't care about the study any more; I'm just frustrated by the wasted opportunity.

So here's the thing: Although being open can't make your study interesting, the failure to pursue open science practices can undermine the value of a study. This post is an attempt to justify this idea by giving an informal Bayesian analysis of what makes a study interesting and why transparency and openness are then the key to maximizing study value.

Friday, November 10, 2017

Talk on reproducibility and meta-science

I just gave a talk at UCSD on reproducibility and meta-science issues. The slides are posted here.  I focused somewhat on developmental psychology, but a number of the studies and recommendations are more general. It was lots of fun to chat with students and faculty, and many of my conversations focused on practical steps that people can take to move their research practice towards a more open, reproducible, and replicable workflow. Here are a few pointers:

Preregistration. Here's a blogpost from last year on my lab's decision to preregister everything. I also really like Nosek et al.'s Preregistration Revolution paper. is a great gateway to simple preregistration (guide).

Reproducible research. Here's a blogpost on why I advocate for using RMarkdown to write papers. The best package for doing this is papaja (pronounced "papaya"). If you don't use RMarkdown but do know R, here's a tutorial.

Data sharing. Just post it. The Open Science Framework is an obvious choice for file sharing. Some nice video tutorials offer an easy way to get started.

Sunday, November 5, 2017

Co-work, not homework

Coordination is one of the biggest challenges of academic collaborations. You have two or more busy collaborators, working asynchronously on a project. Either the collaboration ping-pongs back and forth with quick responses but limited opportunity for deeper engagement, or else one person digs in and really makes conceptual progress, but then has to wait an excruciating amount of time for collaborators to get engaged, understand the contribution, and respond themselves. What's more, there are major inefficiencies caused by having to load the project back into memory each time you begin again. ("What was it we were trying to do here?")

The "homework" model in collaborative projects is sometimes necessary, but often inefficient. This default means that we meet to discuss and make decisions, then assign "homework" based on that discussion and schedule a meeting to review the work and make a further plan. The time increments of these meetings are usually 60 minutes, with the additional email overhead for scheduling. Given the amount of time my collaborators and I will actually spend on the homework, the ratio of actual work time to meetings is sometimes not much better than 2:1 if there are many decisions to be made on a project – as in design, analytic, and writeup stages.* Of course if an individual has to do data collection or other time-consuming tasks between meetings, this model doesn't hold!

Increasingly, my solution is co-work. The idea is that collaborators schedule time to sit together and do the work – typically writing code or prose, occasionally making stimuli or other materials – either in person or online. This model means that when conceptual or presentational issues come up we can chat about them as they arise, rather than waiting to resolve them by email or in a subsequent meeting.** As a supervisor, I love this model because I get to see how the folks I work with are approaching a problem and what their typical workflow is. This observation can help me give process-level feedback as I learn how people organize their projects. I also often learn new coding tricks this way.***

Friday, October 6, 2017

Introducing childes-db: a flexible and reproducible interface to CHILDES

Note: childes-db is a project that is a collaboration between Alessandro Sanchez, Stephan Meylan, Mika Braginsky, Kyle MacDonald, Dan Yurovsky, and me; this blogpost was written jointly by the group.

For those of us who study child development – and especially language development – the Child Language Data Exchange System (CHILDES) is probably the single most important resource in the field. CHILDES is a corpus of transcripts of children, often talking with a parent or an experimenter, and it includes data from dozens of languages and hundreds of children. It’s a goldmine. CHILDES has also been around since way before the age of “big data”: it started with Brian MacWhinney and Catherine Snow photocopying transcripts (and then later running OCR to digitize them!). The field of language acquisition has been a leader in open data sharing largely thanks to Brian’s continued work on CHILDES.

Despite these strengths, using CHILDES can sometimes be challenging, especially for the most casual or most in-depth interactions. Simple analyses like estimating word frequencies can be done using CLAN – the major interface to the corpora – but these require more comfort with command-line interfaces and programming than can be expected in many classroom settings. On the other end of the spectrum, many of us who use CHILDES for in-depth computational studies like to read in the entire database, parse out many of the rich annotations, and get a set of flat text files. But doing this parsing correctly is complicated, and often small decisions in the data-processing pipeline can lead to different downstream results. Further, it can be very difficult to reconstruct a particular data prep in order to do a replication study. We've been frustrated several times when trying to reproduce others' modeling results on CHILDES, not knowing whether our implementation of their model was wrong or whether we were simply parsing the data differently.

To address these issues and generally promote the use of CHILDES in a broader set of research and education contexts, we’re introducing a project called childes-db. childes-db aims to provide both a visualization interface for common analyses and an application programming interface (API) for more in-depth investigation. For casual users, you can explore the data with Shiny apps, browser-based interactive graphs that supplement CHILDES’s online transcript browser. For more intensive users, you can get direct access to pre-parsed text data using our API: an R package called childesr, which allows users to subset the corpora and get processed text. The backend of all of this is a MySQL database that’s populated using a publicly-available – and hopefully definitive – CHILDES parser, to avoid some of the issues caused by different processing pipelines.
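To make the idea of "subset the corpora and get processed text" a bit more concrete, here is a minimal Python sketch of a word-frequency analysis over pre-parsed utterances. It does not use childesr or the childes-db database: the records, the field names (`corpus`, `speaker_role`, `age_months`, `gloss`), and both helper functions are hypothetical stand-ins invented for illustration.

```python
from collections import Counter

# Hypothetical pre-parsed utterances, invented for illustration.
# These are NOT real CHILDES data and NOT the childes-db schema.
UTTERANCES = [
    {"corpus": "DemoCorpus", "speaker_role": "Target_Child",
     "age_months": 24, "gloss": "more juice"},
    {"corpus": "DemoCorpus", "speaker_role": "Mother",
     "age_months": 24, "gloss": "do you want more juice"},
    {"corpus": "DemoCorpus", "speaker_role": "Target_Child",
     "age_months": 30, "gloss": "I want juice"},
    {"corpus": "OtherCorpus", "speaker_role": "Target_Child",
     "age_months": 26, "gloss": "more ball"},
]


def subset_utterances(utterances, corpus=None, role=None, age_range=None):
    """Keep utterances matching the optional corpus, role, and age filters."""
    selected = []
    for utt in utterances:
        if corpus is not None and utt["corpus"] != corpus:
            continue
        if role is not None and utt["speaker_role"] != role:
            continue
        if age_range is not None:
            low, high = age_range  # inclusive bounds, in months
            if not (low <= utt["age_months"] <= high):
                continue
        selected.append(utt)
    return selected


def word_frequencies(utterances):
    """Count lowercased, whitespace-separated tokens across utterances."""
    counts = Counter()
    for utt in utterances:
        counts.update(utt["gloss"].lower().split())
    return counts


# Word frequencies for child speech only, across all corpora.
child_utterances = subset_utterances(UTTERANCES, role="Target_Child")
freqs = word_frequencies(child_utterances)
print(freqs.most_common(3))
```

Even in a toy like this, small choices (lowercasing, splitting on whitespace, which speaker roles to include) change the counts, which is the kind of pipeline variation a shared parser and database are meant to remove.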

Thursday, July 6, 2017

What's the relationship between language and thought? The Optimal Semantic Expressivity Hypothesis

(This post came directly out of a conversation with Alex Carstensen. I'm writing a synthesis of others' work, but the core hypotheses here are mostly not my own.)

What is the relationship between language and thought? Do we think in language? Do people who speak different languages think about the world differently? Since my first exposure to cognitive science in college, I've been fascinated with the relationship between language and thought. I recently wrote about my experiences teaching about this topic. Since then I've been thinking more about how to connect the Whorfian literature – which typically investigates whether cross-linguistic differences in grammar and vocabulary result in differences in cognition – with work in semantic typology, pragmatics, language evolution, and conceptual development.

Each of these fields investigates questions about language and thought in different ways. By mapping cross-linguistic variation, typologists provide insight into the range of possible representations of thought – for example, Berlin & Kay's classic study of color naming across languages. Research in pragmatics describes the relationship between our internal semantic organization and what we actually communicate to one another, a relationship that can in turn lead to language evolution (see e.g., Box 4 of a review I wrote with Noah Goodman). And work on children's conceptual development can reveal effects of language on the emergence of concepts (e.g., as in classic work by Bowerman & Choi on learning to describe motion events in Korean vs. English).

All of these literatures provide their own take on the issue of language and thought, and the issue is further complicated by the many different semantic domains under investigation. Language and thought research has taken color as a central case study for the past fifty years, and there is also an extensive tradition of research on spatial cognition and navigation. But there are also more recent investigations of object categorization, number, theory of mind, kinship terms, and a whole host of other domains. And different domains provide more or less support to different hypothesized relationships. Color categorization seems to suggest a simple model where it's faster to categorize different colors because the words help with encoding and memory. In contrast, exact number may require much more in the way of conceptual induction, where children bootstrap wholly new concepts.

The Optimal Semantic Expressivity Hypothesis. Recently, a synthesis has begun to emerge that cuts across a number of these fields. Lots of people have contributed to this synthesis, but I associate it most with work by Terry Regier and collaborators (including Alex!), Dedre Gentner, and to a certain extent the tradition of language evolution research from Kenny Smith and Simon Kirby (also with a great and under-cited paper by Baddeley and Attewell).* This synthesis posits that languages have evolved over historical time to provide relatively optimal, discrete representations of particular semantic domains like color, number, or kinship. Let's call this the optimal semantic expressivity (OSE) hypothesis.** 

Thursday, June 15, 2017

N-best evaluation for hiring and promotion

How can we create incentive-compatible evaluation of scholarship? Here's a simple proposal, discussed around a year ago by Sanjay Srivastava and floated in a number of forms before that (e.g., here):
The N-Best Rule: Hiring and promotion committees should solicit a small number (N) of research products and read them carefully as their primary metric of evaluation for research outputs. 
I'm far from the first person to propose this rule, but I want to consider some implementational details and benefits that I haven't heard discussed previously. (And just to be clear, this is me describing an idea I think has promise – I'm not talking on behalf of anyone or any institution).

Why do we need a new policy for hiring and promotion? How do two conference papers on neural networks for language understanding compare with five experimental papers exploring bias in school settings or three infant studies on object categorization? Hiring and promotion in academic settings is an incredibly tricky business. (I'm focusing here on evaluation of research, rather than teaching, service, or other aspects of candidates' profiles.) How do we identify successful or potentially successful academics, given the vast differences in research focus and research production between individuals and areas? Two different records of scholarship simply aren't comparable in any sort of direct, objective manner. The value of any individual piece of work is inherently subjective, and the problem of subjective evaluation is only compounded when an entire record is being compared.

To address this issue, hiring and promotion committees typically turn to heuristics like publication or citation numbers, or journal prestige. These heuristics are widely recognized to promote perverse incentives. The most common, counting publications, leads to an incentive to do low-risk research and "salami slice" data (publish as many small papers on a dataset as you can, rather than combining work to make a more definitive contribution). Counting citations or H indices is not much better – these numbers are incomparable across fields, and they lead to incentives for self-citation and predatory citation practices (e.g., requesting citation in reviews). Assessing impact via journal ranks is at best a noisy heuristic and rewards repeated submissions to "glam" outlets. Because they do not encourage quality science, these perverse incentives have been implicated as a major factor in the ongoing replicability/reproducibility issues that are facing psychology and other fields.
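For readers who haven't computed one, here is a minimal Python sketch of the standard h-index definition (the largest h such that h papers each have at least h citations). The two citation records are invented for illustration and don't describe any real researcher; the point is only to show how the metric can favor volume over a few high-impact contributions.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented records, for illustration only.
few_big_papers = [120, 90, 2]            # two heavily cited papers, one minor one
many_small_papers = [5, 5, 5, 5, 5, 5]   # six modestly cited papers

print(h_index(few_big_papers))
print(h_index(many_small_papers))
```

In this toy comparison the record with many small papers gets the higher h-index (5 versus 2), even though its total citation count is far lower.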

Thursday, June 1, 2017

Confessions of an Associate Editor

For the last year and a half I've been an Associate Editor at the journal Cognition. I joined up because Cognition is the journal closest to my core interests; I've published nine papers there, more than in any other outlet by a long shot. Cognition has been important historically, and continues to publish recent influential papers as well. I was also excited about a new initiative by Steve Sloman (the EIC) to require authors to post raw data. Finally, I joined knowing that Cognition is currently an Elsevier journal. I – perhaps naively – hoped that like Glossa, Cognition could leave Elsevier (which has a very bad reputation, to say the least) and go open access. I'm stepping down as an AE in the fall because of family constraints and other commitments, and so I wanted to take the opportunity to reflect on the experience and some lessons I've learned.

Be kind to your local editor. Editing is hard work done by ordinary academics, and it's work they do over and above all the typical commitments of non-editor academics. I was (and am) slow as an editor, and I feel very guilty about it. The guilt over not moving faster has been the hardest aspect of the job; often when I am doing some other work, I will be distracted by my slipping editorial responsibilities.1 Yet if I keep on top of them I feel that I'm neglecting my lab or my teaching. As a result, I have major empathy now for other editors – and massive respect for the faster ones. Also, whenever someone complains about slow editors on twitter, my first thought is "cut them some slack!"

Make data open (and share code too, while you're at it)! I was excited by Sloman's initiative for data openness when I first read about it. I'm still excited about it: It's the right thing to do. Data sharing is a prerequisite for ensuring the reproducibility of results in papers, and enables reuse of data for folks doing meta-analysis, computational modeling, and other forms of synthetic theoretical work. It's also very useful for replication – students in my graduate class do replications of published papers and often learn a tremendous amount about the paradigm and analyses of the original experiment by looking at posted data when they are available. But sharing data is not enough. Tom Hardwicke, a postdoc in my lab and in the METRICS center at Stanford, is currently doing a study of computational reproducibility of results published in Cognition – data are still coming in, but our first impression is that it's often difficult to reproduce the findings in a good number of papers based on the raw data and their written description of analyses. Cognition and other journals can do much more to facilitate posting of analytic code.

Open access is harder than it looks. I care deeply about open access – as both an ethical priority and a personal convenience. And the journal publishing model is broken. At the same time, my experiences have convinced me that it is no small thing to switch a major journal to a truly OA model. I could spend an entire blogpost on this issue alone (and maybe I will later), but the key issue here is money: where it comes from and where it goes. Running Cognition is a costly affair in its current form. There is an EIC, two senior AEs, and nine other AEs. All receive small but not insignificant stipends. There is also a part-time editorial assistant, and an editorial software platform. I don't know most of these costs, but my guess is that replicating this system as is – without any of the legal, marketing, and other infrastructure – would cost at minimum $150,000 USD/year (probably closer to $200,000 or more, depending on software).

Thursday, May 25, 2017

Language and thought: Shifting the axis of the Whorfian debate

A summary of the changing axes of the debate over effects of language on cognition.

This spring I've been teaching in Stanford's study abroad program in Santiago, Chile. It's been a wonderful experience to come back to a city where I was an exchange student, and to navigate the challenges of living in a different language again – this time with a family. My course here is called "Language and Thought" (syllabus), and it deals with the Whorfian question of the relationship between cognition and language. I proposed it because effects of language on thought are often high in the mind of people having to navigate life in a new language and culture, and my own interest in the topic came out of trying to learn to speak other languages.

The exact form of the question of language and thought is one part of the general controversy surrounding this topic. But in Whorf's own words, his question was:
Are our own concepts of 'time,' 'space,' and 'matter' given in substantially the same form by experience to all men, or are they in part conditioned by the structure of particular languages? (Whorf, 1941)
This question has personal significance for me since I got my start in research working as an undergraduate RA for Lera Boroditsky on a project on cross-linguistic differences in color perception, and I later went on to study cross-linguistic differences in language for number as part of my PhD with Ted Gibson.

Wednesday, February 15, 2017

Damned if you do, damned if you don't

Here's a common puzzle that comes up all the time in discussions of replication in psychology. I call it the stimulus adaptation puzzle. Someone is doing an experiment with a population and they use a stimulus that they created to induce a psychological state of interest in that particular population. You would like to do a direct replication of their study, but you don't have access to that population. You have two options: 1) use the original stimulus with your population, or 2) create a new stimulus designed to induce the same psychological state in your population.

One example of this pattern comes from RPP, the study of 100 independent replications of psychology studies from 2008. Nosek and E. Gilbert blogged about one particular replication, in which the original study was run with Israelis and used as part of its cover story a description of a leave from a job, with one reason for the leave being military service. The replicators were faced with the choice of using the military service cover story in the US where their participants (UVA undergrads) mostly wouldn't have the same experience, or modifying to create a more population-suitable cover story. Their replication failed. D. Gilbert et al. then responded that the UVA modification, a leave due to a honeymoon, was probably responsible for the difference in findings. Leaving aside the other questions raised by the critique (which we responded to), let's think about the general stimulus adaptation issue.

If you use the original stimulus with a new population, it may be inappropriate or incongruous. So a failure to elicit the same effect is explicable that way. On the other hand, if you use a new stimulus, perhaps it is unmatched in some way and fails to elicit the intended state as well. In other words, in terms of cultural adaptation of stimuli for replication, you're damned if you do and damned if you don't. How do we address this issue?

Thursday, January 26, 2017

Paper submission checklist

It's getting to be CogSci submission time, and this year I am thinking more about trying to set uniform standards for submission. Following my previous post on onboarding, here's a pre-submission checklist that I'm encouraging folks in my lab to follow. Note that, as described in that post, all our papers are written in RStudio using R Markdown, so the paper should be a single document that compiles all analyses and figures into a single PDF. This process helps deal with much of the error-checking of results that used to be the bulk of my presubmission checking.

Paper writing*

  • Is the first paragraph engaging and clear to an outsider who doesn't know this subfield?
  • Are multiple alternative hypotheses stated clearly in the introduction and linked to supporting prior literature?
  • Does the paragraph before the first model/experiment clearly lay out the plan of the paper?
  • Does the abstract describe the main contribution of the paper in terms that are accessible to a broad audience?
  • Does the first paragraph of the general discussion clearly describe the contributions of the paper to someone who hasn't read the results in detail? 
  • Is there a statement of limitations on the work (even a few lines) in the general discussion?

Friday, January 20, 2017

How do you argue for diversity?

During the last couple of months I have been serving as a member of my department's diversity committee, charged with examining policies relating to diversity in graduate and faculty recruitment. I have always put a value on the personal diversity of the people I worked with. But until this experience, I hadn't thought about how unexamined my thinking on this topic was, and I hadn't explicitly tried to make the case for diversity in our student population. So I was unprepared for the complexity of this issue.* As it turns out, different people have tremendously different intuitions on how to – and whether you should – argue for diversity in an educational setting.

In this post, I want to enumerate some of the arguments for diversity I've collected. I also want to lay out some of the conflicting intuitions about these arguments that I have encountered. But since diversity is an incredibly polarizing issue, I also want to be sure to give a number of caveats. First, this blogpost is about the topic of other people’s responses to arguments for diversity; I’m not myself making any of these arguments here. I do personally care about diversity and personally find some of these arguments more compelling than others, but that’s not what I’m writing about. Second, all of this discussion is grounded in the particular case of understanding diversity in the student body of educational institutions (especially in graduate education). I don’t know enough about workplace issues to comment. Third, and somewhat obviously, I don’t speak for anyone but myself. This post doesn’t represent the views of Stanford, the Stanford psych department, or even the Stanford Psych diversity committee.

Tuesday, January 3, 2017


Reading twitter this morning I saw a nice tweet by Page Piccinini, on the topic of organizing project folders:
This is exactly what I do and ask my students to do, and I said so. I got the following thoughtful reply from my old friend Adam Abeles:
He's exactly right. I need some kind of onboarding guide. Since I'm going to have some new folks joining my lab soon, no time like the present. Here's a brief checklist for what to expect from a new project.