AltaMira Milestone 2 Submission:
1. Attend a Panel to Hear from Workers and Requesters
2. Reading Others' Insights
3. Do Needfinding by Browsing MTurk-related Forums, Blogs, Reddit, etc.
4. Synthesize the Needs You Found
Attend a Panel to Hear from Workers and Requesters
We attended Panel 2 at 6pm PST. Here are some of our observations:
- A lot of the work (apparently up to 50%) comes from academic sources.
- The group of workers we spoke with were involved in academic studies and chose tasks in the higher pay bracket (not the $0.01 HITs).
- Workers' hours and types of work vary: many HITs are posted between 7am and 4pm; some workers prefer to work during their off time, while others work regular hours.
- Workers do not treat MTurk or freelancing as their only source of income. In a 1:1 conversation, I was told you need strong financial skills to make freelancing a full-time job; if you have a mortgage, this is probably not the best route.
- Workers' earnings are highly variable: sometimes they make a lot of money, sometimes they don't, and they have to balance it out over weeks.
- Skills are not well measured on MTurk, and workers do not like the qualification system.
- Rejection percentages are a cause of concern and grief for workers, whether or not requesters actually use them in hiring.
- Requesters generally don't reject work, both out of fear of backlash on forums and because each rejection has a high impact on the worker. They are more likely to accept even bad work than face what happens if they reject.
- In the words of two long-term requesters, the rejection system on MTurk in particular is broken: it is not a system that requesters actually use, and it does not quantify what they expect from workers.
- Approval rate is one of the most commonly used qualifications for a HIT.
- Requesters like the idea of micro-tasks and using them to create a gold standard.
- They don't prefer the Masters qualification because it costs more, is hard to get, and doesn't always offer better results.
- Finding the right person for a task is not congruent with how MTurk's system works; the system does not have a good way to deliver the right workers.
- Regardless of the system, oDesk and MTurk both have very high user ratings; oDesk, which weighs rejection rates less severely, still has an average user rating of 4.5/5.
Reading Others' Insights
Worker perspective: Being a Turker, TurkOpticon, Plea to Amazon
1) What observations about workers can you draw from the readings? Include any that are strongly implied but not explicit.
- The majority of Turkers have low incomes (< $10k).
- A small number of the most active Turkers do most of the tasks (studies estimate 15,059 to 42,912 active workers, with roughly 3,011–8,582 doing most of the work, even though the reported number is 500k).
- Pay is a significant motivating factor for turkers
- (Implied) Pay rates are low on MTurk, so there may be reasons other than pay that people are on MTurk.
- Turkers use TurkerNation to find out about the best and worst requesters; this is also the most popular part of the site.
- Turkers primarily do HITs because they like them
- There is a significant difference between the motivations of US and Indian Turkers. US Turkers don't make much, about minimum wage at best; in India, however, this would be a good income.
- A lot of Turkers live hand to mouth, and MTurk is used as a source of "additional income" rather than primary income.
- Ethics matter to Turkers; they want to work on tasks where they are valued and that match their skills.
- Turkers use forums heavily to find the best requesters to work for (and to point out bad ones); these forums tend to be a problem for requesters.
- Amazon's payment methods are a cause for concern on MTurk; the payment cycle allows requesters 30 days to evaluate and pay for work.
- Generally MTurk is heavily biased towards requesters.
- The worker reputation system is broken (as mentioned above): it is too easy to game, and the penalties for low ratings are too severe.
- The UI needs a reboot: more ways to sort and categorize HITs, and a way of predicting completion times.
Requester perspective: Crowdsourcing User Studies with Mechanical Turk, The Need for Standardization in Crowdsourcing, Plea to Amazon
2) What observations about requesters can you draw from the readings? Include any that are strongly implied but not explicit.
- Amazon deliberately favors requesters over workers on MTurk.
- Requesters don't have a reputation system they can manage on Amazon. Their reputation is derived from forums like TurkerNation and tools like TurkOpticon.
- Requesters write positive reviews for themselves on forum sites.
- Requesters incur low costs using MTurk, compared to small lab user studies, which can cost much more.
- Not knowing the user base is both an advantage and a disadvantage on MTurk.
- There is widespread gaming of the system via uninformative responses; including verifiable parts in tasks makes the system less susceptible to gaming.
- Response rates are very fast on MTurk for tasks that don't include verification.
- UI: Posting tasks is too difficult; task interfaces have to be built from scratch, and sometimes requesters must hire developers just to get a task to show up correctly.
- The reputation system for workers is broken; everything that exists right now (completion %, qualifications, approval rate) is easy to game.
Many of the questions regarding a better crowdsourcing platform have already been posted by crowd researchers. These are some of the takeaways from the replies.
Reddit forum for requesters
- Requesters need an easy way to figure out how much their HITs will cost (MTurk only shows this on the final screen; surprisingly, this question popped up many times).
- They need an easy way to estimate how much work a given amount of money will buy, e.g.:
[–](redacted)[S] 1 point 6 months ago
You must pre-pay for your HITs. Could you estimate how many surveys $500 would get me?
[–](redacted) 8 points 6 months ago
It depends how long the survey is expected to take (have several friends/coworkers unfamiliar with the survey test it for you to figure this out) and how fairly you pay for the workers' time. For example, if you decide the survey can be reasonably expected to take no more than 10 minutes to complete, and you decide to pay $0.15 per minute, then you would pay Turkers $1.50 per completed HIT assignment, and $0.15 for the 10% fee to MTurk, for a total of $1.65 per completed HIT assignment. $500 divided by $1.65 equals about 303, so with that budget, you could afford to pay 300 people to take your survey in this example.
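The arithmetic in the reply above generalizes to a small budget estimator. A minimal sketch, where the helper name and the flat 10% fee are assumptions (MTurk's actual fee schedule has changed over time):

```python
def estimate_assignments(budget, minutes_per_hit, pay_per_minute, fee_rate=0.10):
    """Estimate how many completed HIT assignments a budget buys.

    Hypothetical helper mirroring the forum reply: worker pay per
    assignment plus a platform fee, divided into the budget.
    """
    worker_pay = minutes_per_hit * pay_per_minute   # 10 min * $0.15/min = $1.50
    total_cost = worker_pay * (1 + fee_rate)        # $1.50 + $0.15 fee = $1.65
    return int(budget // total_cost)                # floor: partial assignments don't count

print(estimate_assignments(500, 10, 0.15))  # prints 303
```

This matches the reply's worked example: $500 at $1.65 per completed assignment affords about 303 responses, so a requester would round down to a 300-assignment survey.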
- There is a general lack of understanding among new requesters about rejecting work and how to do it fairly.
- Requesters need to know how much to pay for a task and what return they will get from attaching a qualification (other than the higher price): http://rh.reddit.com/r/mturk/comments/2clggw/i_am_a_requester_on_mturk_what_suggestions_do/cjgnmu8
Reddit forum for Workers
- Workers are unclear about how rejection will hurt them and feel that browsing or going in and out of jobs carries a penalty. They need better rules and clearer explanations of MTurk. (I found this post interesting; it's from someone new, but it shows the system has no good way of teaching its rules: http://re.reddit.com/r/mturk/comments/2yorvw/guys_i_think_i_may_have_just_really_screwed/)
- Workers need a legitimate rejection-percentage system for MTurk requesters. This seems to be a fairly strong ask on the forums (reddit and TurkerNation alike). TurkOpticon seems to be the primary tool for getting more information about requesters.
- Classification of HITs from a worker's perspective is currently done on forums. There is a need to find good HITs: fun tasks that workers actually like doing.
- There is a need to make the interface clear so that workers don't do the same task twice (especially in the case of surveys).
- I found these needs from workers, directed at academic researchers, especially interesting (http://www.mturkgrind.com/threads/guideline-for-academic-requesters-on-mturk.26327/). The responses were strikingly similar to responses on reddit and other forums and follow the same guidelines: pay fairly, be clear with rules, avoid unfair rejections, identify yourself, learn about your workers, and provide fair time estimates.
Synthesize the Needs You Found
List out your most salient and interesting needs for workers, and for requesters. Please back up each one with evidence: at least one observation, and ideally an interpretation as well.
- Requester rejection rates and a better way to handle rejections - Workers use third-party tools and forums to get a better idea of requesters and their rejection rates. Evidence: posts on reddit and TurkerNation, and tools like TurkOpticon for rating and ranking requesters. Interpretation: The system lets requesters arbitrarily reject work and does nothing to consolidate worker complaints against them; workers have to go to third-party sites for requester information and rejection rates. This is about trust.
- A fair system that values workers and requesters equally - Workers need a system that values them as workers. Evidence: multiple complaints that Amazon values requesters more than workers, that workers are treated like "computational units", and that the system doesn't provide two-way feedback. Interpretation: This is about fairness and humanization; workers are real people and want to be treated as such. They want a place to speak up when something is unfair, and Amazon doesn't provide that right now.
- A better system for novices to get started - Bootstrapping new workers needs to be better; they won't have the qualifications and approval percentages that requesters want. Evidence: Denis said it was hard to get started on oDesk because there was no way to build the history of successful projects that requesters use to hire workers. Interpretation: The system is unfriendly to new workers because the complex rules used as filters to screen workers make it harder for newcomers to get started.
- Smarter qualifications - Workers' skills vary, and the system needs to account for that. Evidence: Nicole and other panelists said this; some prefer academic work, some do work they find interesting, and even though they may have higher skills, the current qualification system doesn't incentivize them properly. Interpretation: Pay based on skill, and don't put everyone on the same level, since some workers have skills that are not purely mechanical (i.e., beyond typing faster or doing more reviews). Pay right now is the same for all workers on a HIT, and while some HITs require custom qualifications, those are very easy to game.
- Getting started has to be better - MTurk needs a better interface and onboarding for requesters. Evidence: the sheer number of posts by requesters on "what not to do" on MTurk, and articles about best practices, pay fairness, and rejections. These topics, while at the core of MTurk, are not easily addressed within the system and require reliance on third-party forums that not everyone will know about. As stated in multiple forums, hiring developers to build surveys is not practical, and the user interface is difficult for beginners to get started with.
- A better rejection system; the current one is broken - Evidence: Dahl and the other requesters said many times that they don't reject tasks; this is common because the penalty to workers is too severe. Interpretation: There is no legitimate way to qualify work and reject bad work, and no good way to rate workers; as said in the "Plea to Amazon", a system in which workers cannot be rated is a system full of lemons. This is a basic need: people want to rate and rank properly without facing terrible repercussions in the system or on outside forums.
- Better skill matching - Requesters deal with standard qualification tests, which are easy to game, as mentioned in forums and supporting articles. Evidence: panelists, including Dahl, saying that it is hard to understand workers' abilities from qualifications; they simply cost more but don't offer better results. Interpretation: The current system of tests is pointless if it can be gamed, and it makes no sense that it requires answering the same questions in different formats. If someone has already done similar tasks, requesters should be able to select those workers; e.g., a programmer who understands HTML semantics probably understands XML semantics (similar language structure), so taking five different tests for five custom qualifications is not useful.