WinterMilestone 1 @anotherhuman

From crowdresearch

Template for your submission for Winter Milestone 1: @AnotherHuman.

Experience the life of a Worker on Mechanical Turk

This is Mechanical Turk from a worker's perspective: that of a worker aiming to earn $1.


Turk opens as a place where potential workers have many opportunities to earn money, and it is fairly upfront about it. The pitch is simple: work on a HIT (Human Intelligence Task) and earn money. It says nothing about the maze that awaits a worker with an earnings goal.


The workflow follows 3 steps, presented in such a way as to appear easy to complete: you find a HIT, do some work (not necessarily complete it), and get paid. Looking at the listings, one sees the reward, the time allotted, and the HITs available. At first I was not completely sure what "HITs available" meant. My goal also led me to ignore the HIT expiration date, since it did not seem to bear directly on earning my dollar. The example tasks MTurk is used for seemed easy enough.


But then I started to see the real thing, marked with visual alarms signaling problems that contradicted the steps presented in image 2. This was the moment I realized Turk is a hairball, as I learned the hard way. And why are the task boxes different colors?

Reflect on your experience as a worker on Mechanical Turk. What did you like? What did you dislike?

+SEEMINGLY SIMPLE EXECUTION: Turk is a very clearly articulated marketplace from Amazon's perspective.

-Deepening Commitments per HIT: Turk did quite well establishing what the marketplace does. However, it currently includes two to three forms of qualifications and requirements that are dizzying for the pennies offered per HIT and the time consumed going through it all.

-Different Paths to Press the Accept Button: Requesters are able to shape the real workflow that workers experience:

  • a required test
  • non-disclosure agreements (one NDA was also a terms-and-conditions contract of 2,413 words across 9 pages)
  • general guidelines to earn a penny (abbreviated here)
  • an additional optional test
  • underlined warnings discouraging certain behaviors, plus external links

-A HIT IS NOT JUST A HIT: This relates to requester qualification programs for preferences. One requester had a Bronze, Silver, and Gold system for HIT approval after HIT acceptance, on a sliding scale: 100 completed HITs at a 95% approval rating at the low end, and 2,000 completed HITs at 95% for Gold.
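
A hypothetical sketch of that requester's tier rule. Only two data points appear above (100 completed HITs at a 95% approval rating, and 2,000 at 95% for Gold); the Silver cutoff used here is an invented value for illustration:

```python
# Hypothetical reconstruction of one requester's Bronze/Silver/Gold rule.
# Only the 100-HIT and 2,000-HIT / 95% figures come from the page above;
# the Silver threshold (500) is an assumed value for illustration.
def worker_tier(completed_hits, approval_rate):
    """Return the tier a worker qualifies for, or None if below the bar."""
    if approval_rate < 0.95:
        return None              # below the 95% approval bar, no tier at all
    if completed_hits >= 2000:
        return "Gold"
    if completed_hits >= 500:    # assumed Silver threshold
        return "Silver"
    if completed_hits >= 100:
        return "Bronze"
    return None

print(worker_tier(150, 0.96))    # Bronze
print(worker_tier(2500, 0.97))   # Gold
```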


-TURK-REQUESTER INTEROPERABILITY: Once a worker accepts a HIT, they encounter the requester's system. Interoperability between Amazon and the requester appears to be a problem, since requesters can run their own external systems beyond Amazon.


+Social Improvement: Workers are able to help minority and disadvantaged groups, the kind of work people usually volunteer for.

+PRIVACY: Workers' privacy when doing surveys is strong because Amazon encodes the submitter's contact information before it reaches the research requester.

-TURK-REQUESTER INTEROPERABILITY: The privacy treatment of surveys is not consistent. Some requesters apply IRB-level protections, which is great, while others do not.


+HUMOR AS A CHECK: Requesters embed humor in surveys to check whether the respondent is still alive. These surveys killed me when they were too long; I really needed the joke.


-Requester Power: Turk limits functionality within its own system, but I had a requester whose HIT required installing software on my laptop in order to complete it.

-Evolved to Play MTurk: I found myself playing the system to discover how to make money quickly. All of the unplanned contracts and tests pushed me to avoid any of the sales "words".


+The Game: A drop-down menu lets you sort through the HITs by value.

+Task Deconstruction (in a way): Turk breaks down surveys so that one question and one response constitute a HIT.
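
The one-question-one-response model described above can be sketched as a simple decomposition step; the field names and reward value here are invented for illustration:

```python
# Split a multi-question survey into one-question HITs, one per question,
# following the decomposition model described above. Field names are
# illustrative, not Turk's actual schema.
def decompose_survey(survey_id, questions, reward_per_hit=0.01):
    return [
        {"hit_id": f"{survey_id}-q{i}", "question": q, "reward": reward_per_hit}
        for i, q in enumerate(questions, start=1)
    ]

hits = decompose_survey("sa1", ["Is this review positive?", "Is it sarcastic?"])
print(len(hits))  # 2: each question becomes its own HIT
```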

-HIT Manipulation: Requesters can introduce outside surveys to bypass Turk's HIT definition, which potentially results in less pay for the worker.

-The Single Requester Penalty: A worker's reputation can easily be damaged by a single requester, which can make it impossible to meet qualification requirements for tasks from other requesters.

Experience the life of a Requester on Mechanical Turk performing Sentiment Analysis

Reflect on your experience as a requester on Mechanical Turk. What did you like? What did you dislike? Also attach the CSV file generated when you download the HIT results.

+Qualification: there is a lot of functionality for qualifying workers

-Turk Requester Workarounds: I had to devise my own mechanism to verify results

+HIT Decomposition: HITs (individual questions, not the whole analysis; see worker attachment) are broken down and given to individual workers

-HIT Decomposition for multi-question surveys: the sentiment analysis was invalid because individual responses went to individual workers rather than the entire analysis going to one worker

+Rapid Response: the task was completed quickly

+Worker Codification: can verify off site workers with those inside Amazon

+Selective Worker Bonuses: can give bonuses to different workers

+Weeding out workers: it is easy to separate poor workers from good ones. This affordance becomes more valuable over time.

-Workers are machines: workers are identified only as numbers. I suppose that comes with the platform's ability to scale.

+Worker Quantity Scaling: ability to scale is massive

-Limited Requester Tool Shapeability: the requester's ability to shape the sentiment analysis tool is severely limited

-Workers are Machines, Qualifications still required: without qualifications, requesters don't know who did the work or how workers skew their data


+Ease of Output Files: you get a handy CSV printout

-Additional CSV Data Manipulation: the CSV still requires manual data manipulation before the results are usable
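
As a sketch of the kind of manipulation the downloaded CSV needs, assuming the usual MTurk results shape (HITId, WorkerId, and Answer.* columns); the rows below are invented, not the attached file's contents:

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a downloaded MTurk results file. Real files carry many
# more columns, but HITId / WorkerId / Answer.* is the relevant shape.
raw = """HITId,WorkerId,Answer.sentiment
H1,W1,positive
H1,W2,positive
H1,W3,negative
H2,W1,negative
"""

answers = defaultdict(list)                     # HITId -> list of answers
for row in csv.DictReader(io.StringIO(raw)):
    answers[row["HITId"]].append(row["Answer.sentiment"])

# Majority label per HIT: the kind of reduction Turk leaves to the requester.
majority = {hit: max(set(a), key=a.count) for hit, a in answers.items()}
print(majority)  # {'H1': 'positive', 'H2': 'negative'}
```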

-Inconsistent Workflow: the pay-first workflow advertised on Amazon's main page is broken; payment actually comes after the creation of the task.

-Late Estimates: the funding estimate comes after the deposit. How do I get my remaining money back?

Resulting CSV for Modified Sentiment Analysis: File:Stanford MechTurk SA.csv

Resulting CSV for Workers of the Sentiment Analysis: File:Stanford MechTurk Workers.csv

Explore alternative crowd-labor markets

Compare and contrast the crowd-labor market you just explored (TaskRabbit/oDesk/GalaxyZoo) to Mechanical Turk.

Turk vs. oDesk (now Upwork):

  • small tasks like surveys vs. large programming efforts
  • an acceptance system vs. a Connects-based system
  • workers have access to all jobs except those for which they are unqualified vs. workers can join a membership program to gain more Connects and access to work opportunities
  • a take-it-or-leave-it reward system vs. a min/bid/max reward bidding system
  • a general crowd vs. specialists
  • qualifications to select workers vs. interviews to select workers (a longer effort)
  • many anonymous workers vs. potentially continuing professional relationships
  • short-term projects that scale in participation vs. long-term projects that scale by professional skill
  • workers are monitored by time vs. workers are monitored at the laptop
  • HITs can be manipulated by requesters with off-site tasks vs. controlled requester interactions
  • survey takers vs. programmers
  • low-value projects vs. potentially large, high-value developments
  • seems more extrinsically driven vs. seems more intrinsically driven



  • What do you like about the system / what are its strengths?
    • it is able to scale across mobile contexts quickly
    • users seem to have great task exposure to work
    • it seems to have helped people in poverty by creating an additional revenue stream
    • the scale of the data entry allows for correspondence checking between user answers to the same input item
    • quality rating based on errors
    • value of work is presented in real time
    • minimalist interface and minimal accessibility barriers
  • What do you think can be improved about the system?
    • improvements, I think, may be limited to keyboard-input/screen-output tasks
    • the phones seen in the article were very low-end, which constrains options
    • the interface design might create visual strain for workers; responsive design that scales text and images to the screen might help
    • possibly introduce measures to address repetitive stress injuries
    • the quality rating reflects the user's interaction with the device; device performance might be something to consider beyond user rating systems
    • historical accuracy is also a function of user x device x daylight; data comparing users of the same device might help create a device penalty
    • maybe identifying outside factors that influence user scores and providing suggestions to improve work conditions would help worker output and requester results
    • maybe the system could link tasks to keep worker skill consistent
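
The correspondence checking and error-based quality rating noted in the list above could work roughly like this majority-vote sketch (all data here is invented):

```python
from collections import Counter

# Invented sample: (worker, item, answer) triples where several workers
# transcribe the same input item, enabling correspondence checking.
entries = [
    ("w1", "item1", "cat"), ("w2", "item1", "cat"), ("w3", "item1", "car"),
    ("w1", "item2", "dog"), ("w2", "item2", "dog"), ("w3", "item2", "dog"),
]

# Consensus per item = most common answer among workers for that item.
by_item = {}
for worker, item, answer in entries:
    by_item.setdefault(item, []).append(answer)
consensus = {item: Counter(a).most_common(1)[0][0] for item, a in by_item.items()}

# Error-based quality rating: share of a worker's answers matching consensus.
tallies = {}
for worker, item, answer in entries:
    ok, total = tallies.get(worker, (0, 0))
    tallies[worker] = (ok + (answer == consensus[item]), total + 1)
quality = {w: ok / total for w, (ok, total) in tallies.items()}
print(quality)  # w1 and w2 score 1.0; w3 scores 0.5
```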


  • What do you like about the system / what are its strengths?
    • Boomerang rating systems
    • tasks are broken down and shared across workers
    • workers provide feedback to other workers on the same project
    • requestors do not unintentionally sabotage the work quality of workers by using vague work requirements
    • prototype task workflow
    • attempts to create a shared mental model between all parties
    • work feedback options include "please revise"; Turk is accept/reject only
    • task feed ranking
    • seems to address the overstatement of qualifications
    • iteration of task design
  • What do you think can be improved about the system?
    • task feed ranking might include a way for requesters to give workers feedback on worthwhile comments
    • could there be a way to address those who understate their qualifications (i.e., build self-confidence)?
    • a way to attract uncommitted but relevant experts over the course of the task design
    • a partial-pay system for portions of the work that are good when the whole job is a little off
    • a marketplace for the marketplace: provide Daemo-originated incentives to elicit feedback participation

Flash Teams

  • What do you like about the system / what are its strengths?
    • able to coordinate project's tasks across experts by linking them
    • taps into highly skilled experts
    • taps into worker elasticity
    • affordances such as structured hand offs
    • directly responsible individuals as compared to workers
    • elastic growth
  • What do you think can be improved about the system?
    • minimize the learning curve of the system
    • assure the saliency of affordances and consistent work quality among similar worker products
    • minimize the knowledge distance between workers to reduce work-in-process inventory
    • implementing time buffers, a scheduling algorithm, or relevant worker-allocation algorithms might improve efficiency
    • assure experts share the same knowledge before accepting work, to minimize conceptual learning and keep the focus on issues of expert approach, improving work outputs

Milestone Contributors

Slack usernames of all who helped create this wiki page submission: @anotherhuman