
How to run successful experiments and get the most out of Amazon's Mechanical Turk


Monday, November 20, 2017

Strengths and Limitations of Mechanical Turk


Hundreds of academic papers are published each year using data collected through Mechanical Turk. Researchers have gravitated to Mechanical Turk primarily because it provides high-quality data quickly and affordably. But while Mechanical Turk has revolutionized data collection, it is by no means a perfect platform. Some of its major strengths and limitations are summarized below.
Strengths
A source of quick and affordable data
Thousands of participants are looking for tasks on Mechanical Turk throughout the day and can take your task with the click of a button. You can run a 10-minute survey with 100 participants for $1 each and have all your data within the hour.
Data is reliable
Researchers have examined data quality on MTurk and found that, by and large, data are reliable, with participants performing tasks in ways similar to more traditional samples. MTurk also has a useful reputation mechanism: researchers can approve or reject each worker's performance on a given study, and each worker's reputation reflects how often their work has been approved or rejected. A standard practice is to use only data from workers with at least a 95% approval rating, further ensuring high-quality data collection.
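For researchers who post HITs directly through the MTurk API, this filter is a single qualification requirement. Below is a minimal sketch using Python and boto3; the HIT details are placeholders, and '000000000000000000L0' is MTurk's built-in PercentAssignmentsApproved qualification.

    import boto3

    # Connect to the MTurk requester endpoint (point this at the sandbox for testing).
    mturk = boto3.client('mturk', region_name='us-east-1')

    # Built-in system qualification: Worker_PercentAssignmentsApproved.
    APPROVAL_RATE_QUAL = '000000000000000000L0'

    mturk.create_hit(
        Title='10-minute survey',                    # placeholder HIT details
        Description='Answer a short questionnaire.',
        Keywords='survey, research',
        Reward='1.00',
        MaxAssignments=100,
        LifetimeInSeconds=24 * 60 * 60,
        AssignmentDurationInSeconds=30 * 60,
        Question=open('survey.xml').read(),          # assumed ExternalQuestion XML file
        QualificationRequirements=[{
            'QualificationTypeId': APPROVAL_RATE_QUAL,
            'Comparator': 'GreaterThanOrEqualTo',
            'IntegerValues': [95],                   # the common 95% threshold
        }],
    )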
Participant pool is more representative compared to traditional subject pools
Traditional subject pools used in social science research are often samples that are convenient for researchers to obtain, such as undergraduates at a local university. Mechanical Turk has been shown to be more diverse, with participants who are closer to the U.S. population in terms of gender, age, race, education, and employment.
Limitations
There are two kinds of potential limitations on MTurk: technical limitations and more fundamental limitations of the platform. Many of the technical limitations have been resolved through scripts written by researchers or through platforms such as TurkPrime, which help researchers do things they previously could not do on MTurk, including (see the sketch after this list):
  • Exclude participants from a study based on participation in a previous study
  • Conduct longitudinal research
  • Make sure larger studies do not stall out after the first 500 to 1,000 workers
  • Communicate with many workers at a time.
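One common workaround for the first two items, sketched below with Python and boto3, is to tag past participants with a custom qualification and then exclude (or, for longitudinal work, require) that qualification on the follow-up HIT. The qualification name and the past_worker_ids list are illustrative.

    import boto3

    mturk = boto3.client('mturk', region_name='us-east-1')

    # One-time setup: a custom qualification marking Study 1 participants.
    qual = mturk.create_qualification_type(
        Name='Study1Participant',          # illustrative name
        Description='Completed Study 1.',
        QualificationTypeStatus='Active',
    )
    qual_id = qual['QualificationType']['QualificationTypeId']

    # Tag each worker who completed the earlier study.
    past_worker_ids = ['A1EXAMPLE', 'A2EXAMPLE']  # stand-in worker IDs
    for worker_id in past_worker_ids:
        mturk.associate_qualification_with_worker(
            QualificationTypeId=qual_id,
            WorkerId=worker_id,
            IntegerValue=1,
            SendNotification=False,
        )

    # Pass this in QualificationRequirements when creating the follow-up HIT
    # to keep past participants out; use 'Exists' instead to invite them back.
    exclude_past = {
        'QualificationTypeId': qual_id,
        'Comparator': 'DoesNotExist',
    }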
There are however several more fundamental limitations to data collection on MTurk:
Small population
There are about 100,000 Mechanical Turk workers who participate in academic studies each year. In any one month, about 25,000 unique workers participate in online studies, completing close to 600,000 monthly assignments. The more active workers complete hundreds of studies each month. The natural consequence of a small worker population is that participants are continuously recycled across research labs, creating a problem of 'non-naivete': most participants on Mechanical Turk have been exposed to common experimental manipulations, and this can affect their performance. Although the effects of this exposure have not been fully examined, recent research indicates that it may be shrinking the effect sizes of experimental manipulations, compromising data quality.

Diversity

Although Mechanical Turk workers are significantly more diverse than the undergraduate subject pool, the MTurk population is significantly less diverse than the general US population: it is less politically diverse, more highly educated, younger, and less religious. This can complicate generalizing findings to the population level.

Limited selective recruitment

Mechanical Turk has basic mechanisms to selectively recruit workers who have already been profiled; to that end, it runs profiling HITs that are continuously available to workers. However, Mechanical Turk is structured in such a way that it is much more difficult to recruit people based on characteristics that have not been profiled. For this reason, while rudimentary selective recruitment mechanisms exist, there are significant limitations on the ability to recruit specific segments of workers.
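For attributes MTurk has already profiled, such as worker location, the restriction is a one-line qualification requirement; a minimal sketch is below ('00000000000000000071' is MTurk's built-in Locale qualification). Characteristics MTurk has not profiled have no such built-in ID and must be screened for with a separate profiling HIT.

    # Built-in Locale qualification: restrict a HIT to US-based workers.
    us_only = {
        'QualificationTypeId': '00000000000000000071',
        'Comparator': 'EqualTo',
        'LocaleValues': [{'Country': 'US'}],
    }
    # Include us_only in the QualificationRequirements list passed to create_hit.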


Solutions
TurkPrime offers researchers more specific selective recruitment options and has features in development to help researchers target participants who are less active, and therefore more naive to common experimental manipulations and survey measures. TurkPrime also offers access to PrimePanels, a pool of over 10 million participants who are more diverse and can be selectively recruited.


References:


Peer, E., Vosgerau, J., & Acquisti, A. (2014). Reputation as a sufficient condition for data quality on Amazon Mechanical Turk. Behavior Research Methods, 46(4), 1023-1031.

Friday, February 19, 2016

New Safety Feature: Assignment Rejections are not Automated and Require Manual User Action

Many researchers set up their studies to use the TurkPrime AutoApprove feature so that they do not need to manually approve worker assignments based on the secret code that workers enter. On occasion, a researcher may set up a study incorrectly, which results in many worker assignments being automatically rejected. This was a significant cause of distress for Mechanical Turk workers, whose rejected work went unpaid and whose MTurk approval ratings were lowered. It was also a sore point for TurkPrime researchers, who had to correspond with upset workers and often reverse the rejections.

To alleviate this issue, TurkPrime will give researchers greater control of the rejection process when AutoApprove is used: 

Only worker assignments with the correct secret codes will be auto-approved, while assignments with invalid secret codes will require manual rejection. (If a rejection is made in error by a researcher, it may still be reversed.) This manual rejection is made using the same interface as is available for non-AutoApproved workers, as shown below.

Researchers will see the secret codes the workers entered and whether they were correct. They will then have the option to approve, reject, or leave the assignment's status undecided.

The manual rejection process must be completed within the Amazon approval window (default of 7 days). Otherwise, Mechanical Turk will automatically approve any pending assignments that were not resolved by that time.
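For researchers who handle this step through the MTurk API rather than the TurkPrime interface, the approve/reject decision looks roughly like the sketch below. resolve_assignment, entered_code, and expected_code are illustrative names; approve_assignment and reject_assignment are standard boto3 MTurk calls.

    import boto3

    mturk = boto3.client('mturk', region_name='us-east-1')

    def resolve_assignment(assignment_id, entered_code, expected_code):
        # Approve assignments with the correct secret code; reject the rest.
        # Rejections must happen within the approval window, after which
        # MTurk auto-approves anything still pending.
        if entered_code == expected_code:
            mturk.approve_assignment(
                AssignmentId=assignment_id,
                RequesterFeedback='Correct completion code. Thank you!',
            )
        else:
            mturk.reject_assignment(
                AssignmentId=assignment_id,
                RequesterFeedback='The completion code did not match our records.',
            )

    # A rejection made in error can still be reversed:
    # mturk.approve_assignment(AssignmentId=..., OverrideRejection=True)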