Milestone 2 TeamInnovation2

[This page is still in progress.]

Attend a Panel to Hear from Workers and Requesters

Observations From the Panel

  • Work hours on MTurk vary widely from worker to worker.
  • Work availability on MTurk also varies widely.
  • Some MTurk workers fit Turking into their existing schedules, while others build their schedules around MTurk. Those who fit Turking around an existing schedule may earn less, since they are not available as often to catch work as it is posted.
  • Client/requester work hours and worker work hours don't necessarily match up.
  • On both oDesk and MTurk, those requesting or posting work can invite specific individuals to do their work, but this seems easier on oDesk than MTurk.
  • On oDesk, workers need to convert their experience into skills that display well on the web site.
  • Beginning workers on oDesk may find it difficult to prove their skills to prospective employers.
  • Workers on oDesk who are invited to do work need to filter their invitations to decide which to accept.
  • Some workers on oDesk provide prospective clients with sample work when applying for a job.
  • On oDesk, beginning workers may have trouble getting work, because many jobs require previous experience within the oDesk system.
  • The amount of money workers make crowdsourcing is highly variable with the time of the year, the day of the week, etc. It is often influenced heavily by the academic schedule, as a lot of work is related to academics.
  • MTurk workers trying to find work they're interested in use tags to search. Good tags help workers find good tasks.
  • Rejecting work on MTurk is very harmful to workers because it can severely limit their access to work. This causes some requesters to avoid rejecting work. It also means workers are very concerned with their approval ratings, and workers spend time and effort trying to boost their ratings by contacting clients or requesters.
  • Requesters who give rejections on MTurk receive a lot of feedback from dissatisfied workers, which takes a lot of time to deal with.
  • A worker's approval rating is not broken down by task type. Requesters can't tell, for instance, whether a worker with a 99 percent approval rating earned it only on transcription tasks, or whether they have a 99 percent rating in categorization and a 0 percent rating in transcription.
  • There are many ways a requester can run attention and honesty checks, and many requesters include at least one such check in each HIT (for instance, asking a question with a previously verified answer; one way to screen submissions against such a question is sketched after this list).
  • Requesters don't always know who their target audience or worker is.
  • Researchers don't always know for sure that answers are truly coming from their target demographic.
  • Techniques for screening workers by demographic are not standardized across requesters.
  • Personalized tasks are difficult and complex to run on MTurk.
  • Requesters on MTurk use a wide variety of techniques to design their HITs effectively. Some of them use worker feedback on pilot tasks to guide the creation of the actual task.
  • Requesters prefer to find the right people up front and have them do a task well, rather than accepting work from anyone and only afterward having to determine whether it was done properly.
  • It takes valuable time to create a test task, but it can also help requesters learn about their own tasks.
  • Sometimes putting together a non-crowdsourced solution takes as much time and effort as designing a way to crowdsource a given piece of work.
  • MTurk requesters get higher quality results by breaking large tasks down into smaller tasks, but oDesk clients are often looking for a smaller number of workers to perform work that makes sense if done by one person (like writing a single paper).
  • Requesters on MTurk are discouraged from rejecting work, so high approval ratings are not necessarily indicative of high work quality.
  • There are many different types of qualifications and ratings requesters use to filter who can work on their tasks. Some filter the whole population, and some allow only hand-picked workers access to work. Some do help requesters get higher quality work, and others don't. Different requesters have different experiences with using qualifications to seek higher-quality work.
  • The Masters qualification on MTurk does not eliminate spam results, even though it costs more to use it.
  • One efficient way for MTurk requesters to get high quality results is to form a customized pool of qualified workers (one way to set up such a pool is sketched after this list).
  • Requesters find it difficult to reach a mutual understanding of tasks with workers, sometimes because the requester doesn't fully understand the task yet, and sometimes because it is difficult to communicate what the task is and how to do it.
  • Requesters find it difficult to find skilled, honest workers.
  • Some requesters are not skilled in rating workers accurately, which skews ratings.
  • Requesters often get work from workers they do not want work from (for instance, a researcher looking for female respondents may still get responses from men).
  • When a batch of HITs doesn't get completed, a requester might e-mail people or ask on one of the MTurk communities about pricing, then reprice and relaunch the task. Alternatively, if the instructions are unclear, they might need to be clarified. Sometimes a bug in MTurk keeps workers from seeing a task.
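
Several of the observations above mention attention and honesty checks. As an illustration only (not something described on the panel), the sketch below shows one way a requester might screen submitted assignments against a gold question with a previously verified answer, using Python and the boto3 MTurk client. The HIT ID, the question identifier, and the gold answer are hypothetical placeholders, and the approve-everything policy reflects the observation that rejections are costly for workers.

 # Hypothetical sketch: screen submissions with a gold question instead of rejecting.
 import xml.etree.ElementTree as ET
 import boto3

 # Sandbox endpoint so experiments don't spend real money.
 mturk = boto3.client(
     "mturk",
     region_name="us-east-1",
     endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
 )

 GOLD_QUESTION_ID = "gold_check"   # placeholder field name used in the HIT's form
 GOLD_ANSWER = "paris"             # placeholder previously verified answer

 def passed_gold_check(answer_xml):
     """Parse the QuestionFormAnswers XML and compare the gold field to the known answer."""
     answers, current_qid = {}, None
     for elem in ET.fromstring(answer_xml).iter():
         tag = elem.tag.split("}")[-1]          # drop the XML namespace
         if tag == "QuestionIdentifier":
             current_qid = (elem.text or "").strip()
         elif tag == "FreeText":
             answers[current_qid] = (elem.text or "").strip().lower()
     return answers.get(GOLD_QUESTION_ID) == GOLD_ANSWER

 resp = mturk.list_assignments_for_hit(HITId="3EXAMPLEHITID", AssignmentStatuses=["Submitted"])
 keep = []
 for assignment in resp["Assignments"]:
     # Approve everything (rejections hurt workers' approval ratings), but only
     # keep answers that passed the attention check in the analysis set.
     mturk.approve_assignment(AssignmentId=assignment["AssignmentId"])
     if passed_gold_check(assignment["Answer"]):
         keep.append(assignment)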
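
The panel also discussed qualifications and hand-picked worker pools. The sketch below, again purely illustrative and using Python with boto3 against the MTurk sandbox, shows how a requester might create a private qualification, grant it to hand-picked workers, and require it on a HIT so that only that customized pool can discover or accept the task. The worker IDs, task details, and question file name are placeholders, not anything the panelists described.

 # Hypothetical sketch: build a customized pool of qualified workers.
 import boto3

 mturk = boto3.client(
     "mturk",
     region_name="us-east-1",
     endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
 )

 # 1. Create a private qualification marking membership in the trusted pool.
 qual = mturk.create_qualification_type(
     Name="Trusted transcription pool (example)",
     Description="Hand-picked workers who did well on our pilot transcription task.",
     QualificationTypeStatus="Active",
 )
 qual_id = qual["QualificationType"]["QualificationTypeId"]

 # 2. Grant the qualification to hand-picked workers (placeholder worker IDs).
 for worker_id in ["A1EXAMPLEWORKER", "A2EXAMPLEWORKER"]:
     mturk.associate_qualification_with_worker(
         QualificationTypeId=qual_id,
         WorkerId=worker_id,
         IntegerValue=1,
         SendNotification=True,
     )

 # 3. Require the qualification on the real task, so only the customized pool
 #    can even discover, preview, or accept the HIT.
 with open("transcription_question.xml") as f:   # placeholder ExternalQuestion/HTMLQuestion XML
     question_xml = f.read()

 hit = mturk.create_hit(
     Title="Transcribe a 30-second audio clip (invited workers only)",
     Description="Only workers in our qualified pool can accept this HIT.",
     Keywords="transcription, audio",            # good tags help workers find tasks
     Reward="0.50",
     MaxAssignments=3,
     LifetimeInSeconds=3 * 24 * 3600,
     AssignmentDurationInSeconds=1800,
     Question=question_xml,
     QualificationRequirements=[{
         "QualificationTypeId": qual_id,
         "Comparator": "Exists",
         "ActionsGuarded": "DiscoverPreviewAndAccept",
     }],
 )
 print("HIT created:", hit["HIT"]["HITId"])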

Reading Others' Insights

Worker perspective: Being a Turker

1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.

Worker perspective: Turkopticon

1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.

Requester perspective: Crowdsourcing User Studies with Mechanical Turk

1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.

Requester perspective: The Need for Standardization in Crowdsourcing

1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.

Both perspectives: A Plea to Amazon: Fix Mechanical Turk

1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.

Do Needfinding by Browsing MTurk-related forums, blogs, Reddit, etc.

List out the observations you made while doing your fieldwork. Links to examples (posts / threads) would be extremely helpful.

Synthesize the Needs You Found

List out your most salient and interesting needs for workers, and for requesters. Please back up each one with evidence: at least one observation, and ideally an interpretation as well.

Worker Needs

A set of bullet points summarizing the needs of workers.

  • Example: Workers need to be respected by their employers. Evidence: Sanjay said in the worker panel that he wrote an angry email to a requester who mass-rejected his work. Interpretation: this wasn't actually about the money; it was about the disregard for Sanjay's work ethic.

Requester Needs

A set of bullet points summarizing the needs of requesters.

  • Example: Requesters need to trust the results they get from workers. Evidence: In this thread on Reddit (linked), a requester is struggling to know which results to use and which ones to reject or re-post for more data. Interpretation: it's actually quite difficult for requesters to know whether 1) a worker tried hard but the question was unclear, very difficult, or an edge case, or 2) a worker wasn't really putting in their best effort.