Friday, September 19, 2014

Probabilistic pragmatics bibliography

Pragmatics is the study of human communication in context. A tremendous amount of experimental and theoretical work has been done on pragmatics since Grice's seminal statement of the cooperative principle. In recent years, a number of people have been working on a new set of formal models of pragmatics, using probabilistic methods and approaches from game theory to quantify human pragmatic reasoning. 

This post is an incomplete bibliography of some of the recent work following this approach. My goal in compiling this bibliography is primarily personal: I want to keep track of this growing literature and the different branches it's taken. I've primarily included research that is either formal/computational in nature, or based directly on formal models. Please let me know in the comments or by email if you have work that you would like added here.
Probabilistic Models and Experimental Tests
One flaw in this literature is that right now there's no one good paper to look at for an intro. The first paper on this list is (IMO) a good introduction, but it's only a page long, so if you want details you have to look elsewhere. 
Game Theoretic Approaches
This section is a very incomplete list of some of the great work on this topic in the game theory tradition. Note, Michael Franke is someone different from me
Extensions to Other Phenomena
Many of these models have been applied primarily to reference resolution but many other linguistic phenomena seem amenable to the probabilistic pragmatics approach.
Connections to Language Acquisition
Connections with Pedagogy and Teaching
There are many interesting and as-yet-unexplored connections between pragmatics and teaching. 

Wednesday, September 10, 2014

Sharing research using RMarkdown

(An example of using R Markdown to do chunk-based analysis, from this tutorial.)

This last year has brought some very positive changes in the way my lab works with and shares data. As I've mentioned in previous posts (here and here), we have adopted the version control tool git and the site github for collaborating and sharing data both within the lab and outside it. I'm very pleased to report that nearly all of our publications for 2014 have code and data openly shared through github links.

In the course of using this ecosystem, however, I've come to think that it's still not perfect for collaboration. In particular, in order to view analysis results from a collaborator or student, I need to clone into the repository and run all of their analyses, regenerating their figures and working out what they were intending in their code. For simple projects, this isn't so bad. But for anything that requires a modicum of data analysis, it really doesn't work very well. For example, I shouldn't have to rerun all the data munging for an eye-tracking project on my local machine just to see the resulting graphs.

For that reason, we've started using R Markdown for writing our analyses and sharing them with collaborators. R Markdown is a method for writing chunks of code interspersed with simple formatted text. Plots, tables, etc. are inserted inline. This combo then can be rendered to HTML, PDF, or even Word formats. Here's a nice tutorial – the source of the sample image above. The basics are so simple, it should only take about 5 minutes to get started. And all of this can be done from within RStudio, which is a substantially better IDE than the basic Mac R interface.*

Using R Markdown, we write our analyses in a (relatively) comprehensible way, explaining and delineating sections as necessary. We then can compile these to HTML and share them using RPubs, a service that is currently integrated with the R Markdown functionality in RStudio. That way we can just send links to one another (and we can republish and update with new analyses as needed).

Overall, this workflow means that we have full version control over all of our analyses (via git), but also have a friendly way to share with time-crunched or less tech-savvy collaborators. And critically, the overhead to swap to this way of working has been almost nonexistent. Several of our students in the CSLI undergraduate summer internship program this summer completed projects where all their data analysis was done this way. No ecosystem is perfect, but this one is a nice balance between reproducibility and openness on the one hand and ease of use on the other.

* I can't help mentioning that it would be nice if the internal plotting window was a quartz window that could save vector PDFs. The quartz() workaround is very ugly when you are working in OS X full-screen mode.

** Right now, all RPubs documents are shared publicly, but that's not such a big deal if you're used to working in a primarily public ecosystem using github anyway.