Milestone 2 TestSet


This is a submission from team TestSet using the Milestone 2 template provided.

Attend a Panel to Hear from Workers and Requesters

Deliverable: Panel 1 and Panel 2

Report on some of the observations you gathered during the panel.

  • Requesters tend to get better results as they gain experience with tasks. Some requesters believe results are optimal when task completion times are around 10 minutes.
  • Requesters tend to perform better once they have understood the system and can, with experience, understand more about the demographics, psychology of workers, and so on.
  • Requesters cannot rely completely on MTurk: some tasks may not be completed, and it can be difficult to validate responses. It is also sometimes difficult to judge whether a response was honest.
  • Requesters tend to use MTurk especially for studies that require demographic variation, a large number of responses, and quick access to results.
  • It is a challenge for requesters to frame tasks correctly, set milestones, and write very clear task descriptions in order to get the most effective results.
  • Some requesters would like to be able to select a certain group of workers based on previous experience or on performance on certain tasks.
  • Workers want some certainty about jobs, tasks, payment amount, and payment time.
  • Workers should have some say regarding tasks, instructions, and when their responses may or may not be rejected.
  • Workers would prefer better ways of searching for tasks based on task quality, wage, and who the requester is.
  • Workers and requesters should be able to communicate with each other for better task efficiency.

Reading Others' Insights

Worker perspective: Being a Turker

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Many workers Turk only part time and see it as a way to make a quick buck, yet there is no system that actually makes Turking accessible in that form. For example, an ideal moment to work would be on the way home from a day job: you look at your phone and decide to make some money on AMT, but there isn't an app for it! Just having one would increase responses considerably and improve acceptance among both requesters and Turkers.
  • Better sorting systems are essential. Part of the problem may be insufficient information being collected from requesters; with more information, better sorting would be possible, which is crucial for both Turkers and requesters since it ensures that people who want to do specific kinds of work can find that work.
  • Better legal recourse is needed. Much of what these systems do is not well managed legally, and therein lies a problem. The entire system needs a more cohesive legal backing so that both requesters and Turkers can depend on legal recourse. In the words of one Turker, "it's like the Wild West" - AMT needs to avoid duelling as the solution to problems between Turkers and requesters.

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Many requests are for surveys that simply collect data from a large pool in order to establish a proof of concept, but this makes the system easy to game.
  • There is a clear mismatch between task difficulty and the amount of money offered - the ideas of prospect theory strongly suggest this - and there needs to be a way to manage it better, for example software-based checkpoints that ensure the pay-to-difficulty distribution is maintained.

Worker perspective: Turkopticon

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Workers have varied reasons for working: mostly people do it for the money, some do it out of boredom or for additional income, and some simply like the concept.
  • Workers praise requesters when HITs are approved and disapprove of them when HITs are rejected. This is a serious concern, since dispute resolution is sometimes hard and requesters do not necessarily explain why certain work was rejected.
  • Since the wage per worker is very low, one of the key problems is that workers are not treated with much respect. This is a particular concern for groups of people who rely on Turking as a source of income.
  • Because there are far more workers than requesters, respect for workers is difficult to enforce, and workers are at the mercy of requesters with regard to acceptance/rejection of HITs and late payment.
  • Some workers do it only for the money, which can reduce the quality of task performance; that is, workers want to get as many tasks done as possible within a limited time.
  • Workers sometimes suffer because, if they refuse tasks with low wages, workers in developing countries will take the tasks instead. Thus they have no say in the amount of money they get per task.

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Requesters sometimes find it difficult to explain to workers why certain work has been rejected. Worker outrage on Reddit and other Turking forums can therefore damage a requester's reputation over time, especially for requesters who rely on crowdsourcing often.
  • Requesters have too much power, given the oversupply of workers.
  • Requesters sometimes do not respond to individual worker concerns, pay late, or even pay less when they are not satisfied with the work.
  • Requesters have the unfair advantage of being able to reduce wages massively for developing countries, where a dollar goes further.

Requester perspective: Crowdsourcing User Studies with Mechanical Turk

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Workers come with varied experiences and from varied age groups, which helps in reaching a large and diverse demographic.
  • Workers can get used to certain types of studies. Especially in the social sciences, they may be quite familiar with the expected responses, which can skew results since these workers may have participated in such studies before. This is the same problem that exists when graduate students are the only population sampled for experiments.
  • Workers try to accomplish as many micro-tasks as possible in a short amount of time. Some believe that even if some tasks get rejected, they will still earn thanks to the sheer volume of tasks they perform.
  • Only worker experience and approval rate matter, and these do not necessarily identify the best workers. Many workers have found ways to game the system.

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Requesters can misuse this readily available human intelligence by not paying enough.
  • Requesters may have to deal with workers who have learnt to game the system, and this is very difficult to address.
  • Requesters can get a huge number of responses very quickly, since a large pool of workers is readily available.
  • Requesters have the advantage of reaching a huge population with very little monetary expenditure.

Requester perspective: The Need for Standardization in Crowdsourcing

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Workers cannot really expect to find tasks matched to their intelligence level.
  • Workers can never determine how much money a task is truly worth, and thus there is no strong incentive for quality beyond the very small monetary gain.
  • Workers are often very low-skilled, and most tasks would probably be performed better if the workers' education level were higher.
  • Workers have no direct way of finding out which requesters are the best to work for, and some requesters may have a bad reputation on the forums due to previous HIT acceptance/rejection results.

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Requesters only learn how to conduct effective crowd studies through experience.
  • Requesters do not have the option to choose a specific subset of workers for a task based on education level or other advanced measurements.
  • Requesters have no methodical way to judge the appropriate monetary reward for a task.
  • Requesters can get a huge number of responses, but may need to validate the responses individually.

Both perspectives: A Plea to Amazon: Fix Mechanical Turk

1) What observations about workers can you draw from the readings? Include any that may be strongly implied but not explicit.


  • Workers need to be respected; the idea proposed is to rate workers and build a reputation system for them.
  • There should be tasks that assess a worker's ability and intelligence level, and these measurements should also be used when judging a worker.
  • Worker ratings by requesters can ensure a better-quality pool of workers.
  • All of this should be accessible via APIs so that a suitable set of workers can easily be selected (a sketch of what this might look like follows this list).
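The reading argues that such an API should exist but does not specify one, so the following is a minimal sketch in Python of what selecting workers by rating and qualification score might look like. The base URL, endpoint, parameter names, and response schema are all hypothetical.

 import requests
 
 # Hypothetical base URL - the reading only argues such an API should exist.
 API_BASE = "https://api.example-crowd-platform.com/v1"
 
 def find_qualified_workers(min_rating=4.0, qualification="image_labeling",
                            min_score=80):
     """Return IDs of workers whose (hypothetical) reputation rating and
     qualification-test score meet a requester's bar."""
     resp = requests.get(f"{API_BASE}/workers", params={
         "min_rating": min_rating,              # requester-given rating
         "qualification": qualification,        # ability test the worker took
         "min_qualification_score": min_score,  # score on that test
     })
     resp.raise_for_status()
     return [w["worker_id"] for w in resp.json()["workers"]]

A requester could then restrict a task to the returned worker IDs, which is exactly the "select a certain group of workers" need voiced in the panel.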

2) What observations about requesters can you draw from the readings? Include any that may be strongly implied but not explicit.

  • Requesters should be easy to trust, based on previous tasks, time of payment for the last task, or other forms of reputation mechanism.
  • Each requester's rejection rate, volume of work posted, etc. should be visible; this gives workers some power to choose whom to work for.
  • Requesters should not have the ability to reject work that is not spam. Some workers may have worked seriously, and requesters should not have this unfair advantage.
  • All of this should be accessible via APIs so that requesters' track records can easily be queried (see the sketch after this list).
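As above, this is a sketch against a hypothetical API; the endpoint and field names are invented, since the reading only asks that rejection rate, posting volume, and payment timeliness be visible per requester.

 import requests
 
 API_BASE = "https://api.example-crowd-platform.com/v1"  # hypothetical
 
 def requester_is_trustworthy(requester_id, max_rejection_rate=0.05,
                              max_payment_days=7):
     """Return True if a requester's public stats pass a worker's bar."""
     resp = requests.get(f"{API_BASE}/requesters/{requester_id}/stats")
     resp.raise_for_status()
     stats = resp.json()
     return (stats["rejection_rate"] <= max_rejection_rate
             and stats["median_days_to_payment"] <= max_payment_days)

A worker (or a review tool like Turkopticon) could run such a check before accepting a HIT.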

Do Needfinding by Browsing MTurk-related forums, blogs, Reddit, etc

List out the observations you made while doing your fieldwork. Links to examples (posts / threads) would be extremely helpful.

  • Better sorting of tasks is needed (the very existence of this page suggests it's important: r/HITSWorthTurkingFor).
  • (Task difficulty + task worthwhileness) / money offered should stay near a constant k. Feedback about this from Turkers would ensure the ratio is maintained without Turkers simply gaming the system (a sketch of such a check follows this list).
  • An important need for both the requester and the Turker is a third-party review system. At its core, science itself is a crowdsourced engine, and a case study of science in that light could yield a lot of useful ideas for actual crowdsourcing engines.
  • Long-term workers could be the third-party reviewers who decide whether submissions are good or bad. This would fit well with their intrinsic motivation to move "up" through AMT. Many forums follow a similar process (the many threads about incorrect rejections or bad work from workers suggest this is important).
  • There are a lot of monotonous, dry tasks, many of which contain the same questions as before. A way around this is to create monetized data sets (though a way to control how much to charge for a data set is important - maybe some percentage of (amount paid to workers) × (number of Turkers who worked on it)).
  • Both money and time should count together towards reputation. This ensures people can't game the system by forgoing payment to inflate their reputations cheaply.
  • Kaggle, a data science company that crowdsources big-data problems, ended up getting a lot of solutions that didn't generalize. This is a crucial problem for tasks involving tough skill sets: the solutions requesters get might not generalize well to their problems, and there needs to be a way to manage this.
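A minimal sketch of the fairness check and data-set pricing ideas above. The 1-10 difficulty/worthwhileness scales (e.g. gathered from Turker feedback), the target value of k, the tolerance band, and the 10% pricing margin are all illustrative assumptions, not values from any source.

 # Fairness check: (difficulty + worthwhileness) / reward should stay near k.
 TARGET_K = 4.0     # assumed platform-wide target ratio
 TOLERANCE = 0.25   # flag tasks more than 25% away from the target
 
 def fairness_ratio(difficulty, worthwhileness, reward_usd):
     """(task difficulty + task worthwhileness) / money offered."""
     return (difficulty + worthwhileness) / reward_usd
 
 def flag_unfair_tasks(tasks):
     """Yield (task id, k) for tasks whose pay is out of line with effort."""
     for t in tasks:
         k = fairness_ratio(t["difficulty"], t["worthwhileness"], t["reward"])
         if abs(k - TARGET_K) / TARGET_K > TOLERANCE:
             yield t["id"], round(k, 2)
 
 def dataset_price(total_paid_usd, num_turkers, margin=0.10):
     """Monetized data set price: a share of (amount paid) x (worker count)."""
     return margin * total_paid_usd * num_turkers
 
 # A $2.00 task rated difficulty 7 and worthwhileness 5 gives k = 12 / 2 = 6,
 # which is 50% above TARGET_K, so it is flagged as underpaid for its effort.
 tasks = [
     {"id": "HIT-1", "difficulty": 7, "worthwhileness": 5, "reward": 2.00},
     {"id": "HIT-2", "difficulty": 4, "worthwhileness": 4, "reward": 2.00},
 ]
 print(list(flag_unfair_tasks(tasks)))   # [('HIT-1', 6.0)]
 print(dataset_price(500.00, 250))       # 12500.0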

Synthesize the Needs You Found

Worker Needs

A set of bullet points summarizing the needs of workers.

  • Example: Workers need to be respected by their employers.
    • Evidence: Sanjay said in the worker panel that he wrote an angry email to a requester who mass-rejected his work.
    • Interpretation: this wasn't actually about the money; it was about the disregard for Sanjay's work ethic.

Requester Needs

A set of bullet points summarizing the needs of requesters.

  • Example: requesters need to trust the results they get from workers.
    • Evidence: In this thread on Reddit (linked), a requester is struggling to know which results to use and which ones to reject or re-post for more data.
    • Interpretation: it's actually quite difficult for requesters to know whether 1) a worker tried hard but the question was unclear, very difficult, or an edge case, or 2) a worker wasn't really putting in their best effort.