Milestone 2 teamtrojan

From crowdresearch
Revision as of 20:07, 11 March 2015 by Rashmiputtur (Talk | contribs) (Crowdsourcing User Studies with Mechanical Turk)


Attend a Panel to Hear from Workers and Requesters

Attending the Hangouts session was a great experience; it helped us better understand the perspectives of both workers and requesters.

Some observations made during the meeting:

From the requester's perspective:

1. It is important to post tasks that have the potential to produce more responses.

2. Try to post new categories of tasks.

3. The obtained responses can be useful in social science research.

4. Requesters may leverage demographic variety.

5. They face problems related to validating responses.

6. Experience of the requester plays a vital role in describing a new task.

7. It is challenging for requesters to translate their tasks into micro-tasks appropriate for crowdsourcing platforms.

8. Importance must be given to the language used and to the description of tasks.

9. Some panelists felt it was appropriate to allow workers to post questions about the task description, thereby helping requesters frame better instructions.

10. It is important for requesters to “Get the right people to do the right tasks at the right time”. It is also important to allow workers to submit successive responses.

11. Requesters must discern if participants are paying attention or are honest in submitting their responses.

12. Make tasks granular and pay workers via milestones achieved.

13. Many panelists felt that workers' responses must not be rejected.

14. Requesters may provide incentives other than money, though many felt that posting tasks without pay would defeat the purpose of crowdsourcing.

15. It is difficult to rate a worker based on acceptance percentage alone, since requesters do not know how that percentage was accumulated.

16. When requesters do not receive enough responses, there are various ways to promote a task, including checking whether it is appropriately priced and whether the instructions are confusing.

From the worker's perspective:

1. An important factor in finding reasonable tasks is the time of day. Many panelists were of the opinion that early mornings or late nights were suitable times. It was also observed that weekends have fewer tasks than weekdays, but more workers. From the worker's perspective, “It is important to be there when tasks or HITs are posted”.

2. Communicating with the requester or asking questions can help find better and suitable tasks.

3. Common filters used to scan through tasks are: Time, Monetary reward, Interests and Task ethics.

4. Most workers may perform hourly tasks.

5. Workers may consider the correlation between the length of time spent on a task and the quality of the response.

6. Cold start is a problem on some crowdsourcing platforms, oDesk for example. There should be some sort of test that allows workers to showcase their skills.

7. It is important to convert practical experience into digital skills.

8. Concerns were raised about whether pay is highly variable or static. Panelists were of the unanimous opinion that the amount of money is highly variable.

9. Workers would appreciate it if it were easier to find work that suits their interests.

10. Informative tags on tasks and categorization of tasks help workers find appropriate work faster.

Reading Others' Insights

Worker perspective: Being a Turker

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.


From the worker's perspective :

A. Amazon Mechanical Turk

1. Limited/No Authority: In Amazon Mechanical Turk (AMT), workers have limited options for dissent if their work gets rejected. They have no legal recourse against employers who reject their work and then go on to reuse it.

2. Wage Theft: In AMT, workers are sometimes victims of wage theft when employers deliberately reject their work.

3. Interchangeable Treatment: Due to surplus labor, workers are treated as interchangeable, i.e., no effort is made to stop a worker from leaving the platform.

4. Difficulty in maintaining approval rating: When a worker’s task is rejected, his/her approval rating falls, and the system then withholds high-priority tasks from the worker, hindering his/her chances of improving the approval rating.

5. Communication Issues: Both the Amazon system and requesters frequently fail to respond to workers’ concerns.

6. No Minimum Wage Policy: There is no fixed minimum wage per HIT or per hour, making workers' income highly variable.

7. Reason for working: The purpose of working on the platform varies from worker to worker, from fun or passing time to paying electricity bills.

B. Turkopticon

1. Workers’ mutual aid: Workers enter reviews, ratings, and comments on requesters, which helps them avoid spammers.

2. Search Functionality: Workers can search tasks using keywords such as the requester’s name or ID.

3. Anonymity: Workers are protected from retribution by obfuscating their email addresses while posting reviews and comments for requesters.

From the requester's perspective :

A. Amazon Mechanical Turk

1. Flexibility: Requesters have the flexibility of accepting or rejecting work submitted by workers.

2. Dealing with Communication Issues: The cost and effort of justifying each rejection often exceeds the cost invested in the HITs themselves.

B. Turkopticon

1. Avoid Spam Requesters: Review system makes it difficult for spam requesters to fool the workers.

Crowdsourcing User Studies with Mechanical Turk

From the requester's perspective :

Requesters may view Mechanical Turk as a low-cost, time-saving alternative for obtaining user input on extensive or intensive tasks. They may aim to access a wide pool of users, obtain input on a significantly large scale, and benefit from the geographic diversity of participants.

Two experiments were conducted to determine the usability of Mechanical Turk for user studies. The empirical study asked users to rate a set of fourteen Wikipedia articles, matching user ratings against Wikipedia administrator ratings. The articles were chosen at random, covering a range of expert ratings.

Experiment 1: Users were asked to evaluate an article on a seven-point scale based on factors such as factual correctness, structure, neutrality, and overall quality, for a reward of $0.05. They could also provide optional feedback on improvements to the article through a text box. The optional feedback was intended to help determine the veracity of the ratings users provided.

Experiment 2: This experiment built upon the previous one, trying to reduce the number of invalid responses and malicious uses of the system. It helped users provide better subjective responses by including verifiable questions.
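The comparison at the heart of these experiments — averaging the crowd's 7-point ratings per article and checking how well they track expert scores — can be sketched in a few lines of Python. The data and function names below are illustrative assumptions, not figures from the actual study:

```python
# Hypothetical sketch: comparing aggregated crowd ratings with expert
# ratings, in the spirit of the Wikipedia-article experiments above.
# All numbers are made up for illustration.

from statistics import mean

def aggregate(ratings_per_article):
    """Average the 7-point crowd ratings for each article."""
    return [mean(r) for r in ratings_per_article]

def pearson(xs, ys):
    """Pearson correlation between crowd averages and expert scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Example: three articles, each rated by several workers on a 1-7 scale,
# compared against (made-up) expert scores.
crowd = aggregate([[6, 7, 5], [3, 2, 4], [5, 5, 6]])
experts = [7, 2, 5]
print(round(pearson(crowd, experts), 2))
```

A high correlation under this kind of comparison is what would support using crowd ratings as a proxy for expert judgment; a low one signals the validation problems the paper discusses.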

The above experiments produce important insights about requesters:

1. Experiment 1 led to the following observation: to realize the benefits of Mechanical Turk, the formulation of tasks needs special attention; otherwise the results may defeat the task's purpose.

2. The increasingly subjective nature of tasks makes it difficult for the requester to validate answers, as observed in Experiment 1.

3. Lack of knowledge about participants' experience, difficulty in approaching them, and limited demographic information raise concerns over the accuracy and correctness of responses.

4. Results of experiment 1 were not in favor of using crowdsourcing platforms for research purposes. This is an important factor for a requester.

5. The differing results of the two experiments reflect the importance of design considerations for researchers seeking to realize the advantages of these platforms.

From the worker's perspective :

1. As seen from the results of Experiment 1, workers' response times vary drastically. When users are not required to provide verifiable answers, they may submit insignificant responses.

2. A lack of objective tasks allows workers to game the system.

3. Workers' responses to the same task vary depending on the task description and on the subjective nature of tasks.

The Need for Standardization in Crowdsourcing

From the worker's perspective :

1. Adaptation Problems: Since workers are not provided any kind of training in the current market, they must work hard to learn the intricacies of each employer's interface and adapt to each employer's requirements.

2. Variable Income: The amount of money a worker makes by working an equal number of hours on two different days can differ greatly, making this source of income highly variable.

3. Insecurity: With no proper training and no skill-set specification, a worker is always uncertain about the likely outcome, i.e., whether his or her work will be accepted by the requester.

4. Skills mismatch: Workers have difficulty finding the jobs they want, which often leads to a mismatch of skills and talent.

5. Flexibility: Workers have the freedom to choose the kind of work they wish to perform from all the available tasks, and can easily experiment with different types of tasks.

From the requester's perspective :

1. Determining Pay: Requesters have trouble determining the appropriate monetary incentive for a task. Sometimes requesters pay different prices for similar tasks.

2. Predicting Completion times: Requesters are, in many cases, unable to determine the correct estimated time for a job.

3. Difficulty in getting HITs completed: There is no provision for requesters to mark a task as high priority. Thus, if a high-priority task does not receive the required number of responses in a given time frame, requesters are forced to modify the price and repost the task.

4. Since there is no standardization, employers/requesters learn best practices (e.g., user interface design) through experience.

5. Determining workers’ capabilities: In the current market scenario, it is difficult for requesters to decide if a particular worker is capable of completing the task or not.

Both perspectives: A Plea to Amazon: Fix Mechanical Turk

From the requester's perspective :

1. Need a better interface: MTurk fails to reduce overhead, friction, transaction costs, and search costs. The interface must be made easy for requesters to post HITs.

2. Monetary loss : The requester may have to hire a full time developer to deal with the platform's complexities in order to access the workforce for micro-tasks.

3. Requesters may be unable to allot the time and money required to build a quality assurance system from scratch, ensure proper allocation of qualifications, break tasks appropriately into the workspace, stratify workers according to quality, etc.

4. The above barriers make it difficult for small requesters to grow.

5. Requesters find it difficult to post tasks because of the existing user interface and the lack of necessary tools.

6. Requesters may find it difficult to build their own interfaces and own workflow systems from scratch.

7. The ease with which “Number of completed HITs” and “Approval rate” can be gamed raises concerns over the system's vulnerability to such issues.

8. Requesters need a mechanism to differentiate “Good Workers” from “Bad Workers”. Failing to distinguish them may lead requesters to assume every worker is bad. Hence, good workers may end up getting the same pay as bad workers and, as a result, may leave the platform.

9. On certain occasions, the task may have to be repeated multiple times to ensure quality.

10. It is difficult for new requesters to get a significant number of HITs completed. A large share of the workers willing to work on tasks of new and unproven requesters will be spammers and inexperienced workers. This may lead to low-quality results and requester disappointment.

From the worker's perspective :

1. Workers need MTurk to guarantee the trustworthiness of requesters. Just having a subjective reputation system for requesters does not suffice.

2. Workers may not complete HITs of new requesters unless guaranteed that the requester is legitimate, pays promptly, and does not reject work unfairly.

3. Workers expect to see a set of objective characteristics of requesters in order to decide whether to choose a particular posted task.

4. Workers should be shown how fast a requester releases payment, how frequently the requester reports workers' submissions as spam, and the requester's appeal rate.

5. Workers must have the right to appeal against a requester's rejection. This enables workers to take on work from new requesters.

6. It is important for workers to consider the volume of posted work and lifetime of requesters in the market. These characteristics help workers decide if it is right to invest time in completing tasks of requesters.

7. Finding relevant tasks must be made easy. The user interface must be worker friendly. It should enable workers to navigate and browse through available tasks.

Do Needfinding by Browsing MTurk-related forums, blogs, Reddit, etc


Task Finding: Workers need a way to easily find a good task that fits their requirements.


Qualification System: Workers need to reduce the number of rejected submissions/HITs. A better qualification system needs to be implemented that lets requesters control the quality of work according to their own standards while also preventing wage theft.


Approval Rate: Workers need to increase their approval rate in order to view high rated tasks and earn more money. Quoting a worker from a Reddit post: “I'm at 966 submitted and have an approved rate of 93.6% I was at about 97% about a week ago but I fucked up and got 40 rejections. really sucks because I was averaging about 10$ a day and I am now struggling to barely make 5$ a day. Any hits out there that I can bang out with my stats and try to get back to normal? ” (link)
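The arithmetic behind such a drop is easy to reconstruct. A minimal sketch, assuming the approval rate is simply approved HITs over submitted HITs; the counts below are inferred from the quoted figures and are illustrative, not official data:

```python
# Illustrative sketch of how a worker's approval rate shifts with rejections.
# Counts are inferred from the quoted Reddit post, not taken from MTurk.

def approval_rate(approved, submitted):
    """Approval rate as a percentage of submitted HITs."""
    return 100.0 * approved / submitted

# Roughly matching the quoted worker: ~904 approved out of 966 submitted.
after = approval_rate(904, 966)    # ~93.6%
# Before the 40 extra rejections: same approvals, 40 fewer submissions.
before = approval_rate(904, 926)   # ~97.6%
print(round(before, 1), round(after, 1))
```

The asymmetry is the worker's complaint in a nutshell: a handful of rejections moves the rate several points, while recovering requires many more approvals than the number rejected.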

Communication Platform: Workers need a communication platform to interact with requesters efficiently. This may help them understand requirements and thus improve the chances of task acceptance.

Support Team: Workers need a support team that actually investigates and acts on flagged/reported requesters in a reliable and timely manner.


Modifying Post: Requesters need a way to modify a post (pay, time duration) after the task has been posted, to increase the chances of attracting workers without having to repost.

It would be useful to have a demographic page that could simply be incorporated into any task, so workers don't have to constantly fill in the same demographic page each time: that wastes the time of both the requester (who has to ask for it) and the worker (who has to complete it). Furthermore, this limits gaming of the system: workers cannot change their age, race, sex, education, and/or number of children to fit what they think the requester is looking for.

Learning User Interface: Requesters need standardization in order to easily learn the user interface to attract more workers to complete their tasks.(Link to the associated post)

Deciding Pay per Task : Requesters need a standardized way to decide what must be paid to the workers for a particular task and how much time should be allotted.(Link to the associated post)

Priority Tasks : Requesters need a way to promote priority tasks among the workers.


Synthesize the Needs You Found

List out your most salient and interesting needs for workers, and for requesters. Please back up each one with evidence: at least one observation, and ideally an interpretation as well.

Worker Needs

A set of bullet points summarizing the needs of workers.

  • Example: Workers need to be respected by their employers. Evidence: Sanjay said in the worker panel that he wrote an angry email to a requester who mass-rejected his work. Interpretation: this wasn't actually about the money; it was about the disregard for Sanjay's work ethic.

Requester Needs

A set of bullet points summarizing the needs of requesters.

  • Example: requesters need to trust the results they get from workers. Evidence: In this thread on Reddit (linked), a requester is struggling to know which results to use and which ones to reject or re-post for more data. Interpretation: it's actually quite difficult for requesters to know whether 1) a worker tried hard but the question was unclear or very difficult or an edge case, or 2) a worker wasn't really putting in a best effort.