Thursday, October 10, 2013

Randomization on mechanical turk

Amazon Mechanical Turk is a fabulous way to do online psychology experiments. There are a bunch of good tutorial papers showing why (e.g. here, here, and here). One issue that comes up frequently, though, is how to do random assignment to condition. Turk is all about allowing workers to do many HITs (human intelligence tasks, Turk's name for a work assignment) of the same type, one after another. In contrast, most experimental psychologists want to make each condition of their experiment a single HIT and to have participants do only one condition.

If you are using the web interface to Turk, you create a single HTML template, populated with different values for each distinct HIT. That means that each condition is a different HIT. In this case, if you want random assignment to (a single) condition, all you can do is write prominently, "please do only one of these HITs." The problem is that Amazon displays HITs from the same job one after another, so you have to trust that every worker stops after doing just one. This strategy generally works until some worker does 7 or 30 conditions of your experiment - messing up your randomization and putting you in the awkward position of paying for data you (typically) can't use. Nevertheless, I and many other people used the "do this HIT only once" method for years - it's easy and doesn't go wrong too often if the instructions are clear enough.

In the last couple of years, though, folks in my lab have moved to using "external HITs," where we use Turk's Command Line Tools to direct workers to a single HTML/JavaScript-based HIT that can do all kinds of more interesting stuff, including having multiple screens, lots of embedded media, and a more complex control flow. The HTML/JS workflow is generally great for this, and there is quite a bit of code floating around the web that can be reused for this purpose. Now there is only one underlying HIT, so workers can complete it only once.
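For reference, an external HIT works by posting a small XML "question" file that points Turk at your page, which Turk then shows to workers in an iframe. A minimal sketch (the URL and frame height here are placeholders, not values from our experiments):

```xml
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://website.com/myexpt.html</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
```

The Command Line Tools take a file like this (along with a properties file specifying title, reward, and number of assignments) and create the HIT for you.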

The easiest way to do random assignment to condition from within a JavaScript HIT is to have the js assign condition completely at random for each participant. This just involves writing some randomization in the code for the experiment and makes things very simple. With 2 conditions and many participants, this works pretty well (maybe you get 48 in one condition and 52 in another), but with many conditions and fewer participants, it fails quite badly. (Imagine trying to get 5 conditions with 10 participants each. You might get 6, 14, 8, 4, and 18 subjects, respectively, which would not be optimal from the perspective of having equally precise measures about each condition.)
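To see why pure randomization goes wrong with small samples, here is a minimal sketch (function and variable names are hypothetical, not from our actual experiment code):

```javascript
// Sketch: pure per-participant randomization.
// Each participant is assigned uniformly at random, so with few
// participants per condition the counts can drift far from equal.
function assignRandom(numConds) {
    // Returns a condition number in 1..numConds, chosen uniformly.
    return Math.floor(Math.random() * numConds) + 1;
}

// Tally 50 simulated participants across 5 conditions.
var counts = {};
for (var i = 0; i < 50; i++) {
    var cond = assignRandom(5);
    counts[cond] = (counts[cond] || 0) + 1;
}
console.log(counts); // e.g. {1: 6, 2: 14, 3: 8, 4: 4, 5: 18} - rarely 10 each
```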

Our solution to this problem is as follows: We use a simple PHP script, the "maker getter," that is called with an experiment filename and a set of initial condition numbers (in the example below, it's "myexpt_v1" and conditions 1 and 2, each with 50 participants). The first time it's called, it sets up a filename for that experiment and populates the conditions. Every subsequent time it's called, it returns a condition. Then, if this is a true Turk worker (and not a test run), a separate script decrements the counts for that condition. This gives us true random assignment to condition.
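The server-side logic can be sketched as follows (in JavaScript for illustration; the actual implementation is the linked PHP, and all names here are hypothetical): keep a count of remaining slots per condition, hand out a condition at random weighted by its remaining slots, and decrement a slot only when a real worker completes.

```javascript
// Parse an initial condition string like "1,50;2,50" into a slot table:
// {"1": 50, "2": 50}.
function makeStore(condCounts) {
    var store = {};
    condCounts.split(";").forEach(function (pair) {
        var parts = pair.split(",");
        store[parts[0]] = parseInt(parts[1], 10);
    });
    return store;
}

// Draw a condition at random, weighted by how many slots remain.
function getCondition(store) {
    var pool = [];
    for (var cond in store) {
        for (var i = 0; i < store[cond]; i++) pool.push(cond);
    }
    if (pool.length === 0) return null; // experiment is full
    return pool[Math.floor(Math.random() * pool.length)];
}

// Called only for actual Turk workers, not test runs.
function decrement(store, cond) {
    if (store[cond] > 0) store[cond] -= 1;
}
```

Separating "get a condition" from "decrement its count" is what lets you preview and debug the HIT yourself without eating into the participant slots.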

(Note: Todd Gureckis's PsiTurk is a more substantial, more general way to solve this same problem and several others, but requires a bit more in the way of setup and infrastructure.)

---- DETAILS AND CODE ----

The JavaScript block for setting up and getting conditions:

// Condition - call the maker getter to get the cond variable
try {
    var filename = "myexpt_v1";
    var condCounts = "1,50;2,50"; // condition,slots pairs
    // Synchronous GET so that cond is set before the experiment starts.
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", "http://website.com/cgi-bin/maker_getter.php?conds=" +
                 condCounts + "&filename=" + filename, false);
    xmlHttp.send(null);
    var cond = xmlHttp.responseText;
} catch (e) {
    var cond = 1; // fall back to condition 1 (e.g., when testing locally)
}

The JavaScript block for decrementing conditions:

// Decrement only if this is an actual turk worker!
// (The turk object comes from a helper library that parses the worker ID
// out of the HIT's URL parameters.)
if (turk.workerId.length > 0) {
    var xmlHttp = new XMLHttpRequest();
    xmlHttp.open("GET", "http://website.com/cgi-bin/decrementer.php?filename=" +
                 filename + "&to_decrement=" + cond, false);
    xmlHttp.send(null);
}

maker_getter PHP script (courtesy of Stephan Meylan, now a grad student at Berkeley), which runs in the executable portion of your hosting space: maker_getter.php.

decrementer PHP script (also courtesy Stephan): decrementer.php.

Comments:

  1. Hi there, to conduct this randomization, do I need to run my experiment on an external server?

  2. Yes, probably - you need to have something to run the PHP and to host the JavaScript that queries it.

  3. I see. Thank you very much!!! So there is no way to achieve even randomization if I only use the template provided by Turk?

    Replies
    1. Well, the basic HTML interface allows you to upload multiple different assignments, so you could just randomly assign via that method (plus use Unique Turker https://uniqueturker.myleott.com/) - but it's less versatile.

    2. Really appreciate your reply, Michael.

    3. Sorry if I'm repeating myself. I just wanted to clarify one more question and make sure that my understanding is right.
      (1) By multiple different assignments, do you actually mean 'multiple HITs'? If this is the case, do you achieve 'random assignment' through JavaScript?
      (2) If you did mean 'multiple assignments', I am a little confused and hope that you can give me some guidance. To my understanding, multiple assignments within one HIT correspond to the same task. Even though Turk itself can randomly show these assignments, I cannot use different assignments to display different groups of tasks.
      Sorry that I have made the question so long. If there is anything wrong with my understanding, please feel free to let me know. And thanks again for the suggestion of the 'Unique Turker' link.

  4. I tried this approach and it has an important limitation. If a worker previews or refreshes the HIT, s/he might receive another treatment (due to the possibility of a worker refreshing the page, disabling the preview would not suffice). In my study, this would introduce an important bias. Therefore, I extended this approach by storing all workers and their initial experimental group assignment in a MySQL database (with an index on WorkerID). Before a worker is shown a page, I check whether s/he is stored in the DB. If yes, I present him/her the same treatment as the initial treatment. If not, I present him/her a random treatment.

    I also log every time a worker loads the page the time, his/her id, and the assigned group. This way, I can check which user refreshes or drops out.

  5. If you're interested in having a stronger backend, you might consider PsiTurk (https://psiturk.org/) - I deal with the issue you mentioned by blocking turkers from previewing beyond the instructions of my experiment unless they accept.
