WinterMilestone 1 CarpeNoctem

Experience the life of a Worker on Mechanical Turk

[Image: My MTurk Sandbox Dashboard]
  • So, I decided to work on the MTurk platform as both a worker and a requester to understand how things actually work.

First bummer: MTurk was not available in my country. This puts requesters at a disadvantage, since they cannot reach certain demographics. In my case, I'm from India, which has a huge population and is still a developing country, so that means losing access to a lot of potential workers. I signed up on the Sandbox version instead and started figuring out the HIT-based system.

  • One of us finally got approved by MTurk two days later, so here are some thoughts on the experience of using both MTurk and the MTurk Sandbox.

Likes

  • A large number of HITs available
  • Tasks were pretty straightforward
  • A decent earning amount for every HIT
  • Any beginner could complete the HITs and make some money in a very small amount of time
  • Approval rate was quite good.
  • Ability to sort HITs on many factors

Dislikes

  • Stringent Sign-Up process
  • Extremely bad visual design. Needs a lot of improvement
  • Low demographic reach, due to unavailability in some countries
  • Some tasks were formatted very poorly
  • Some qualification tests take too long to complete. If an answer is marked incorrect after the test, it can be hard for workers to figure out what went wrong.
  • Wages are still low considering the total time workers spend on MTurk.

Suggestions

  • HITs should be categorized and easily filtered by workers
    • For example, categorization by level of experience: Beginner, Advanced, ...
    • Categorization by task type: Data Collection, Survey, A/V Transcription, ...
  • An interactive tutorial page could be shown before the qualification test to help workers better understand the process
    • For example, an interactive tutorial that gives immediate feedback (correct/incorrect) to workers
  • Tags could be attached to HITs so that workers can identify tasks quickly. Also, a reward title could be given to workers after they complete a certain number of similar tasks.
    • For example, "Image Tagger", "Translator", etc.
  • A recommendation system could be built to help workers find their desired tasks faster and more easily.
    • For example, "Similar Tasks", "People that completed this task also completed", ...

Experience the life of a Requester on Mechanical Turk

Likes

  • Pre-built templates for making a HIT
  • Option to edit a HIT after creating it, to make small modifications
  • Form-based, step-by-step procedure for building your own HIT

Dislikes

  • Again, bad User Interface
  • Pushes requesters toward a few pre-built templates
  • Requesters have to devise their own mechanisms to ensure work quality
  • Hard for a first-time user to edit the design layout

Research Engineering (Test Flight)

[Image: Daemo linux.png]

Readings

MobileWorks

The concept

  • A mobile-phone-based crowdsourcing platform intended to provide employment to developing-world users.
  • Provides human optical character recognition (OCR) tasks that can be completed by workers on low-end mobile phones through a web browser.
  • To address the limited screen resolution available on low-end phones, MobileWorks divides documents into many small pieces and sends each piece to a different worker.
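
The reading does not include MobileWorks' actual code; the following is a minimal sketch of the splitting idea, assuming a Pillow-style image object and using hypothetical helper names (split_into_tiles, assign_tiles):

```python
# Minimal sketch (not MobileWorks' implementation): cut a scanned document
# into small tiles so each piece fits a low-end phone screen, then assign
# each tile to a different worker.
from PIL import Image  # assumes the Pillow library is available


def split_into_tiles(path, tile_w=240, tile_h=120):
    """Yield (box, tile_image) pairs that together cover the whole page."""
    doc = Image.open(path)
    width, height = doc.size
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            box = (left, top, min(left + tile_w, width), min(top + tile_h, height))
            yield box, doc.crop(box)


def assign_tiles(path, workers):
    """Round-robin the tiles over the available workers."""
    return [
        (workers[i % len(workers)], box, tile)
        for i, (box, tile) in enumerate(split_into_tiles(path))
    ]
```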

Reasons for developing MobileWorks

  • Amazon’s Mechanical Turk (AMT) has about 200,000 workers who reside primarily in the US and India. About 56% of these Mechanical Turk workers (“Turkers”) are from the United States and 36% from India.
  • Among the Indian population, Turkers are often more educated, earn higher wages, and have a higher standard of living than the average Indian. Yet many Indians have limited English literacy and lack access to a desktop computer.

Model

  • Uses the historical accuracy of a worker to model the future payment for the tasks she is assigned
  • The a priori payment is a function of the task and the quality of her work.
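
The reading describes the model only at this high level; as a concrete illustration, here is a minimal sketch that scales a task's base rate by a worker's historical accuracy. The linear scaling and the neutral prior for new workers are assumptions, not the paper's formula:

```python
# Minimal sketch, not the paper's actual formula: payment is a function of
# the task (its base rate) and the worker's quality (historical accuracy).

def historical_accuracy(correct_entries, total_entries):
    """Fraction of a worker's past entries that matched the accepted answer."""
    if total_entries == 0:
        return 0.5  # neutral prior for a brand-new worker (assumption)
    return correct_entries / total_entries


def task_payment(base_rate_cents, accuracy):
    """A priori payment: scale the base rate linearly with accuracy
    (0.5x at 0% accuracy up to 1.5x at 100%). The scaling is assumed."""
    return round(base_rate_cents * (0.5 + accuracy), 2)


# Example: a 2-cent OCR task for a worker with 89% historical accuracy.
print(task_payment(2.0, historical_accuracy(89, 100)))  # -> 2.78
```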

Results from the Model

  • Participants were able to complete 120 tasks per hour; the tasks included typing text from 19th-century newspaper stock data and from scanned documents
  • Overall accuracy of the workers, without considering multiple entry error detection, was about 89%.
    • Dual entry accuracy: 98.79%
    • Triple entry accuracy: 99.89%
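
The reading does not show how multiple-entry error detection works in code; here is a minimal sketch of the idea with a hypothetical resolve_entries helper: accept a transcription only when enough independent entries agree, otherwise request another entry.

```python
# Minimal sketch of multiple-entry error detection (not the paper's code):
# the same piece is sent to several workers, and the transcription is
# accepted only when enough independent entries agree.
from collections import Counter


def resolve_entries(entries, required_agreement=2):
    """Return the accepted text if `required_agreement` normalised entries
    match; return None to trigger another (dual -> triple) entry."""
    normalised = [" ".join(e.split()).lower() for e in entries]
    text, count = Counter(normalised).most_common(1)[0]
    return text if count >= required_agreement else None


print(resolve_entries(["AT&T  145.2", "at&t 145.2"]))  # agree -> "at&t 145.2"
print(resolve_entries(["AT&T 145.2", "AT&T 146.2"]))   # disagree -> None
```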

Other platforms similar to MobileWorks

TxtEagle
  • Deployed in Kenya
  • Used SMS text messages to provide tasks like audio transcription, local-language translation, and market research
  • Disadvantage:
    • Because it relies on SMS as the medium of communication and messages are restricted in length, only a limited range of tasks can be accomplished through simple text messages.
SamaSource
  • Does not use mobile crowdsourcing
  • Establishes outsourcing centers in developing regions where workers are actively managed

Deliverables

  • What do you like about the system / what are its strengths?
    • Ability to utilize the workforce in less developed countries
      • According to the research, Amazon’s Mechanical Turk consists of about 56% workers from the United States and 36% from India; a mobile platform could tap into and expand the workforce in less developed countries for these tasks.
    • Flexibility of Mobile
      • Many Indians lack access to a desktop computer. Having the platform on mobile allows them to complete tasks anywhere, anytime, expanding the available workforce.
      • Such a mobile platform can also support a wider range of tasks than the SMS-based tasks offered by other platforms such as TxtEagle
    • Accuracy
      • The overall accuracy of the workers is very acceptable. Without multiple-entry error detection, accuracy was about 89%; dual-entry accuracy was 98.79% and triple-entry accuracy 99.89%. With multiple entries, MobileWorks offers an accuracy level that compares well with desktop platforms.
  • What do you think can be improved about the system?
    • Expanding the type of tasks
      • Apart from image recognition, there are other types of low-level tasks, such as those that require users to visit certain websites to gather information or fill out surveys. MobileWorks could consider supporting these tasks as well. For example, for tasks that require users to visit specific websites, MobileWorks could preload the text from those websites so that users do not have to open the URLs directly.

Daemo

Problem behind traditional crowdsourcing platforms

  • Workers and requesters are often unable to trust each others’ quality, and their mental models of tasks are misaligned.
  • Reasons include:
    • Flawed reputation systems which do not accurately reflect worker and requester quality
    • Poorly designed tasks.

Daemo at a glance

  • Boomerang (a reputation system)
    • A reputation system that incentivizes alignment between opinion and ratings by increasing the likelihood that the rater will work in the future with any users they rate highly.
    • Worker ratings determine the ranking of the tasks available to them (a minimal sketch of this ranking appears after this list).
  • Prototype tasks
    • Require that all new tasks go through a prototype feedback iteration phase with a small number of workers so that requesters can revise their instructions and task design before launch
    • Prototype tasks pay 3–5 workers to complete a small percentage of the overall work and provide feedback on how to improve the task interface or clarify it.
  • Incentive-compatible designs
    • Incentivize workers and requesters to act in ways that produce the behavior that the system designer desires.
  • Other features
    • Recent ratings are weighted more heavily than older ratings, allowing for improvement to manifest.
    • All new members of the platform start with a good average score that is the equivalent of several good ratings.
  • The research ran three studies:
    • The first two studies measured the impact of the Boomerang reputation system on
      • Requesters’ rating behavior and
      • Workers’ rating behavior.
    • The final study measured
      • The impact of prototype tasks on task quality.
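
The paper's exact scoring and ranking formulas are not reproduced in the reading; the sketch below only illustrates the Boomerang ideas summarized above, with assumed numbers (rating scale, prior, decay factor):

```python
# Minimal sketch of the Boomerang ideas above (assumed formulas, not Daemo's
# implementation): recent ratings weigh more than old ones, new members start
# from a good prior, and a worker's task feed is ranked by how highly that
# worker rated each requester.

PRIOR_RATINGS = [3.0, 3.0, 3.0]  # "several good ratings" for new members (assumed 1-3 scale)


def reputation(ratings, decay=0.8):
    """Recency-weighted average; the most recent rating carries the most weight."""
    history = PRIOR_RATINGS + list(ratings)
    weights = [decay ** age for age in range(len(history) - 1, -1, -1)]
    return sum(r * w for r, w in zip(history, weights)) / sum(weights)


def rank_task_feed(tasks, worker_ratings, default=2.0):
    """Order tasks so that requesters this worker rated highly appear first."""
    return sorted(tasks, key=lambda t: worker_ratings.get(t["requester"], default),
                  reverse=True)


feed = rank_task_feed(
    [{"id": 1, "requester": "acme"}, {"id": 2, "requester": "labsco"}],
    worker_ratings={"acme": 1.5, "labsco": 3.0},
)
print([t["id"] for t in feed])                # -> [2, 1]
print(round(reputation([1.0, 2.0, 3.0]), 2))  # the recent 3.0 pulls the score up
```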

Deliverables

  • What do you like about the system / what are its strengths?
    • Ability to share responsibility on the quality of tasks
      • On traditional crowdsourcing platforms, the worker bears most of the responsibility for the quality of the completed tasks. However, in many cases the tasks may not have been well designed or clear to the workers. By sharing that responsibility through prototype tasks, this system is a lot fairer.
    • Aligns incentives between worker and requester
      • The Boomerang mechanism ensures that workers who have been rated highly by a requester have a higher chance of working with the same requester in the future. This reduces rating inflation and gives requesters an incentive to rate workers fairly.
    • Rating systems that are more aligned to the actual performance of worker
    • Increased task quality
      • With better alignment­incentive systems and the existence of prototype tasks, the tasks are better designed and workers have greater incentives to perform the tasks better.
  • What do you think can be improved about the system?
    • May not have enough incentives for workers and requesters to rate
      • The platform could, for example, ask requesters to rate at least 10% of their workers
    • Not every user will be incentivized to rate carefully with Boomerang.
      • Requesters may feel that rating workers adds to their workload, and those who only have one task to post may care much less about rating.
    • Feedback mechanism
      • The current Daemo feedback system is static. Further experiments with synchronous chat could let requesters iterate continuously and speed up the prototype design process.
    • Additional operational costs for the requester
      • When requesters are required to iterate, there may be additional time cost as well as the monetary cost from the premium given to prototype feedback workers.
      • However, these costs may be offset by better task quality, and increasing payment for prototype tasks can reduce wait and iteration time.
    • Workers may not be incentivized to do prototype tasks
      • Workers often look for many small tasks to maximize their returns. The platform could pay even more for prototype tasks or offer other privileges to workers who complete them, e.g. faster access to certain tasks.

Flash Teams

Key Concepts

  • A vision of expert crowd work that accomplishes complex, interdependent goals such as engineering and design.
  • Consist of sequences of linked modular tasks (Lego “blocks” of tasks) and handoffs that can be computationally managed.
  • Characteristics:
    • Flash teams can be recombined to form larger organizations and authored automatically in response to a user’s request.
    • Can also hire more people elastically in reaction to task needs, and pipeline intermediate output to accelerate completion times.
    • Work can be pipelined: when in-progress results are enough for downstream tasks to begin, the system passes those results along to accelerate completion time (see the sketch after this list).
  • Tasks that can be managed:
    • Design prototyping, course development, and film animation, in half the work time of traditional self-managed teams.
  • Tasks that need flash teams.
    • Such tasks require deep domain knowledge that is difficult to decompose into independent micro tasks anyone can complete.
  • An additional DRI (directly responsible individual) is recruited for each task.
  • Workers were typically paid $25–$30 per hour, and costs ranged from roughly $750 to $1270 for up to five participants.
  • Example: from Napkin Sketch Design to final web app
    • The blocks:
      • Low-fidelity prototype
      • Heuristic evaluation
      • Revised low-fidelity prototype
      • Software prototype
      • User study
      • Revised software prototype
  • Foundry (name of platform the research group created that uses flash teams)
    • An end-user authoring platform and runtime manager.
    • Allows users to author modular tasks, then manages teams through handoffs of intermediate work.
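
Foundry's actual data model and API are not reproduced in the reading; the following is a minimal sketch of the pipelining/handoff idea with hypothetical block names and durations:

```python
# Minimal sketch of pipelined handoffs in a flash team (hypothetical names,
# not Foundry's API): a downstream block starts as soon as the upstream block
# hands off an in-progress draft instead of waiting for the finished result.
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    hours: float            # total time the block takes
    draft_after: float      # hours until an in-progress draft can be handed off
    downstream: list = field(default_factory=list)


def schedule(block, start=0.0, plan=None):
    """Return {block name: (start hour, end hour)} using draft-time handoffs."""
    plan = {} if plan is None else plan
    plan[block.name] = (start, start + block.hours)
    for nxt in block.downstream:
        schedule(nxt, start + block.draft_after, plan)
    return plan


study = Block("user study", hours=4, draft_after=4)
proto = Block("software prototype", hours=10, draft_after=6, downstream=[study])
lofi = Block("low-fidelity prototype", hours=5, draft_after=3, downstream=[proto])
print(schedule(lofi))
# The prototype starts at hour 3 (draft handoff) and the user study at hour 9,
# instead of hours 5 and 15 under a strictly sequential handoff.
```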

Deliverables

  • What do you like about the system / what are its strengths?
    • Ability to handle complex tasks
      • The flash teams system allows requesters to request complex tasks that require expert-level or deep domain knowledge spanning different areas. On other available platforms, such tasks are difficult to decompose into independent micro-tasks that anyone can complete.
    • Saves half the time compared with simply assigning teams
      • According to the research, the flash teams took approximately half as many work hours as the traditional teams. Measured by active work time, even the slowest flash team completed the task faster than the fastest team in the traditional control condition. On average, the control teams spent 2.4x the hours on design, 1.9x the hours on UX research, and 1.4x the hours on development, resulting in an extra 10 hr 44 min of cumulative work.
    • Better coordination and work experience
      • According to the results of the research, the teams under the flash structure followed the design process more closely and required less coordination.
    • Better at emergency handling
      • When workers on either type of team disappeared due to personal reasons or other commitments, the flash teams were quick to rearrange and could reach out to the crowd for a replacement. The traditional control teams usually grew frustrated with the experience, and many other members left for this reason.
    • Can take advantage of timezone differences
      • Flash teams can make use of differences in timezone that may allow them to carry on tasks uninterruptedly for days or even weeks.
  • What do you think can be improved about the system?
    • Less of a camaraderie experience
      • According to the research, some flash team members said they wanted a better experience of building camaraderie, which may be more difficult to achieve in flash teams since tasks were divided and some members would simply finish their task and leave.
      • Solution:
        • The flash teams system could consider making slightly longer blocks, or grouping several blocks into larger “processes” or categories, so that members feel more welcome or obliged to engage and provide feedback across a process’s building blocks, even when they lack the expertise for some of those blocks.
    • Can consider providing review mechanisms so that members who previously worked with each other can rate their members to optimize team formation
      • The flash teams system could ask members to rate the people they worked with, so that when a new team is formed, people who enjoyed working with each other are placed on the same team and those who did not are kept on separate teams. Even with the same expertise, people have different personality traits, and some combinations of traits work better together. More organizational-behavior research could be done in this area.

Milestone Contributors

  • Michelle Chan : @michellechan
  • Manoj Pandey : @manojpandey
  • Mengnan Wang : @mengnan