Last week, the research
community grew concerned that “bots” were contaminating data
collection on Amazon’s Mechanical Turk (MTurk). We wrote about the issue
and conducted our own preliminary investigation into the problem using the
TurkPrime database. In this blog post, we introduce two new tools that TurkPrime is
launching to help researchers combat suspicious activity on MTurk and reiterate
some of the important takeaways from this conversation so far.
TurkPrime’s Tools to Deal with Suspicious Activity
As we announced last
week, we’ve created two new tools to help researchers fight fraud in their data
collection:
1. Block Suspicious Geolocations
2. Block Duplicate Geolocations
The Block Suspicious
Geolocations tool is a Free Feature that allows researchers to block
submissions from a list of suspicious geolocations. In our investigation last
week, we identified several geolocations that were responsible for a majority
of duplicate submissions. Our Block Suspicious Geolocations tool will prevent
any MTurk Worker from submitting a HIT from these locations. As mentioned in
last week’s blog post, once we removed these locations from our analyses, the
rate of duplicate submissions from the same geolocation across studies this
summer fell to 1.7%, a number well within the range of what we’ve identified as
normal across the life of our platform. The screenshot below shows our new
Block Suspicious Geolocations tool, found in Tab 6 “Worker Requirements” when
you design a study.
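For researchers who want to run a similar check on data they have already collected, the idea can be sketched in a few lines of Python. This is a minimal illustration and not TurkPrime's implementation: the blocklist coordinates are placeholders, the field names `lat` and `lon` are assumptions, and "duplicate rate" is defined here as the share of submissions whose rounded geolocation appears more than once, which may differ from the definition behind the 1.7% figure.

```python
from collections import Counter

# Hypothetical blocklist of rounded (latitude, longitude) pairs.
# Placeholder values only, not the locations TurkPrime identified.
SUSPICIOUS_GEOLOCATIONS = {(10.0, 20.0), (30.5, -40.25)}


def geo_key(lat, lon):
    """Round coordinates so nearby readings map to the same key."""
    return (round(lat, 2), round(lon, 2))


def is_blocked(lat, lon, blocklist=SUSPICIOUS_GEOLOCATIONS):
    """Return True if the coordinates are on the blocklist."""
    return geo_key(lat, lon) in blocklist


def duplicate_rate(submissions):
    """Share of submissions whose geolocation appears more than once."""
    if not submissions:
        return 0.0
    counts = Counter(geo_key(s["lat"], s["lon"]) for s in submissions)
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / len(submissions)


submissions = [
    {"lat": 10.0, "lon": 20.0},
    {"lat": 10.0, "lon": 20.0},
    {"lat": 1.0, "lon": 1.0},
    {"lat": 2.0, "lon": 2.0},
]
# Drop submissions from blocklisted locations, then recompute the rate.
kept = [s for s in submissions if not is_blocked(s["lat"], s["lon"])]
rate_before = duplicate_rate(submissions)
rate_after = duplicate_rate(kept)
```

Comparing `rate_before` with `rate_after` mirrors the comparison described above: the duplicate rate with and without the suspicious locations.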
Our second tool, the Block
Duplicate Geolocations tool, is a Pro Feature that allows
researchers to block multiple submissions from any geolocation. The
Block Duplicate Geolocations tool casts a much wider net than the Block
Suspicious Geolocations tool and should ensure that responses collected in any
one survey come from a more distributed set of locations. By restricting the
number of submissions from each geolocation, researchers can be more confident
that the responses they collect are coming from unique participants. When using
this tool, data collection may be a little slower, especially if the target
sample is concentrated in a small geographic area (e.g., one particular
state). The screenshot below shows our new Block Duplicate Geolocations tool,
found in Tab 8 “Pro Features” when you design a study.
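The logic of capping submissions per geolocation can be sketched as follows. This is an illustrative assumption about how such a cap might work, not a description of TurkPrime's code: the class name `GeolocationLimiter`, the `max_per_location` parameter, and the rounding of coordinates to two decimal places are all hypothetical.

```python
from collections import defaultdict


class GeolocationLimiter:
    """Accept at most `max_per_location` submissions per geolocation."""

    def __init__(self, max_per_location=1):
        self.max_per_location = max_per_location
        self._counts = defaultdict(int)

    def accept(self, lat, lon):
        """Return True and record the submission if the cap is not reached."""
        key = (round(lat, 2), round(lon, 2))
        if self._counts[key] >= self.max_per_location:
            return False
        self._counts[key] += 1
        return True


limiter = GeolocationLimiter(max_per_location=1)
first = limiter.accept(40.71, -74.0)     # first submission from this location
second = limiter.accept(40.71, -74.0)    # same location again
elsewhere = limiter.accept(34.05, -118.24)  # a different location
```

With a cap of one, the second submission from the same coordinates is rejected while a submission from a different location is still accepted. This also shows why collection can slow down when a sample is geographically concentrated.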
Moving Forward
Understanding what has
caused the recent increase in low-quality responses on MTurk and the
corresponding increase in submissions from the same geolocation is a matter of
ongoing research. As we learn more details, we will share them with the research
community and continue to develop tools that ensure the highest quality of
research data.
More immediately, we
have identified a list of worker IDs that have repeatedly been associated with
suspicious geolocations. In addition to the tools described above, we will
create an internal exclusion list based on the worker IDs of suspicious
accounts over the next several days. This exclusion list will create an
additional layer of protection on our system by blocking worker accounts that
have a high likelihood of being involved in fraud. We will write another blog post
to provide more detail about this issue in the coming days. In the meantime,
however, researchers already have two powerful tools for eliminating fraud in
their data collection. These tools should increase researchers’ confidence that
they are obtaining genuine responses from unique workers.