Milestone 5: Improving Task Authoring with a Project Manager by Team1


Task Authorship

Study introduction

Requesters influence the outcomes of workers, so our hypothesis is that poor task authorship leads to lower outcome quality, because ambiguous tasks are easily misunderstood. In Study 1 we propose a one-part experiment, using a crowdsourcing platform as the experimental environment, in which we test the Project Manager (PM) role and compare the results before and after in the form of analysis of variance (ANOVA) results.

Taskauthoring-project-manager.png

By using the PM in our experiment, we show how the quality of the delivered tasks changes, analyzing the rejection rate and the input data.

Study method

Study 1 and all subsequent experiments reported in this paper were conducted using a proprietary microtasking platform that outsources crowd work to workers on the Amazon Mechanical Turk (MTurk) microtask market. We restrict participation to workers who do not hold the "Masters Qualification" in order to reach the more general skilled workers. A follow-up survey will give us more insight into the demographics of the workers.
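
As an illustration only, the sketch below shows how such a restriction could be expressed with the boto3 MTurk client by excluding workers who hold the Masters qualification; the qualification type ID, reward, and question XML are placeholders rather than our actual study configuration.

  # Illustrative sketch (not the platform's actual code): publish a HIT on
  # Amazon Mechanical Turk via boto3 while excluding Masters-qualified workers.
  import boto3

  # Placeholder: look up the real Masters qualification type ID in the MTurk docs.
  MASTERS_QUALIFICATION_TYPE_ID = "REPLACE_WITH_MASTERS_QUALIFICATION_ID"
  QUESTION_XML = "<HTMLQuestion>...</HTMLQuestion>"  # placeholder question definition

  mturk = boto3.client("mturk", region_name="us-east-1")

  response = mturk.create_hit(
      Title="Label the image: bike or motorbike?",
      Description="Binary image labeling task",
      Reward="0.05",                      # placeholder reward in USD
      MaxAssignments=5,
      AssignmentDurationInSeconds=600,
      LifetimeInSeconds=86400,
      Question=QUESTION_XML,
      QualificationRequirements=[{
          "QualificationTypeId": MASTERS_QUALIFICATION_TYPE_ID,
          "Comparator": "DoesNotExist",   # exclude workers who hold the qualification
          "ActionsGuarded": "DiscoverPreviewAndAccept",
      }],
  )
  print("Created HIT:", response["HIT"]["HITId"])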

Project-flow.png

We introduce a new concept for crowdsourcing called the Project Manager (PM): a person hired by the requester to supervise a group of workers, help create better tasks, ensure the required task is finished with high quality, and rate the workers. First, the requester provides the task they want done and hires workers to perform it; the requester may then choose the best of the hired workers to act as PM, or may hire a PM separately. After that, the PM provides the workers with details and describes the required job whenever the task description provided by the requester is not clear enough. While the task is being performed, the PM monitors the quality of the work, delivers the task to the requester on time, and rates the workers. The responsibilities of the PM are listed below (a schematic sketch of this workflow follows the list):


  • rating workers
  • task setup (task description, configuration of patterns)
  • managing task fixes (checking the delivered work)
  • providing a template of the requested task or a similar task
  • writing a task feed list for the worker
  • checking the implementation against the task feed list
  • providing information to the requester about the implementation process, if needed
  • checking the quality of the final work (implementation)
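
As a rough illustration, the following minimal Python sketch models this workflow; all class and field names (ProjectManager, ManagedTask, feed_list, and so on) are our own hypothetical choices, not a platform API.

  # Minimal, hypothetical sketch of the PM workflow described above.
  from dataclasses import dataclass, field
  from typing import Dict, List

  @dataclass
  class ManagedTask:
      description: str                  # task description written/refined by the PM
      template: str                     # template of the requested (or similar) task
      feed_list: List[str] = field(default_factory=list)         # steps for the workers
      submissions: Dict[str, str] = field(default_factory=dict)  # worker_id -> delivered work

  @dataclass
  class ProjectManager:
      worker_ratings: Dict[str, int] = field(default_factory=dict)

      def set_up_task(self, description: str, template: str, feed_list: List[str]) -> ManagedTask:
          """Task setup: description, template, and feed list for the workers."""
          return ManagedTask(description=description, template=template, feed_list=feed_list)

      def check_submission(self, task: ManagedTask, worker_id: str, work: str) -> bool:
          """Check delivered work against the feed list; a trivial placeholder check."""
          task.submissions[worker_id] = work
          return all(step.lower() in work.lower() for step in task.feed_list)

      def rate_worker(self, worker_id: str, rating: int) -> None:
          """Rate a worker after reviewing their work."""
          self.worker_ratings[worker_id] = rating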

Method specifics and details

To find out which task types are commonly used by requesters, we surveyed the literature and MTurk [1,2] and found that the task types below are the most common on MTurk:

Transcription, content rewriting, searching online for answers to questions, object classification, website feedback, image labeling, categorization, image description, binary labeling, and math.

Based on the most popular tasks on MTurk, we decided to run our experiment on the following task: image labeling of bikes and motorbikes.

Experimental Design for the study

We decided to run a between-subjects experiment with error input as the independent variable and rejection rate as the dependent variable. We recruited a group of four requesters and four PMs on MTurk; the four PMs are among the best-rated workers on the platform. In this experiment, we ask the requester group to author a binary survey using the templates. Every task has the same level of complexity (varying only within a small range), the same template, and the same language. The requester's workflow is the following (an illustrative task configuration is sketched after the list):

  • choosing a work profile/inset for the task
  • writing the task direction/definition and a sample
  • choosing the number of workers who may apply for the task
  • choosing the time specification
  • fixing the request processing time
  • choosing a PM or a group from those suggested by Boomerang or another ranking system
  • waiting during request processing/implementation
  • receiving the work from the PM
  • approving the work and paying
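
A minimal sketch of what such a requester-side configuration might look like is given below; the field names are hypothetical and are not parameters of any particular platform.

  # Hypothetical requester-side task configuration mirroring the steps above.
  requester_task_config = {
      "work_profile": "image_labeling",        # profile/inset chosen for the task
      "direction": "Label each image as 'bike' or 'motorbike'.",
      "sample": "example_bike.jpg -> bike",
      "num_workers": 35,                       # workers who may apply for the task
      "time_specification_hours": 24,
      "request_processing_time_hours": 2,
      "pm_selection": "boomerang_top_ranked",  # PM chosen via a ranking system
  }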

The PMs additionally prototype these tasks, thereby doubling the number of tasks. The two types of tasks are shuffled and randomized, and the workers perform the given tasks, which are distributed equally in random order (see the assignment sketch after the list below). The PMs review the completed tasks, decide about rejections, and report progress to the requesters. The PM's workflow is the following:

  • writing the task direction/definition in a more detailed way, with sample materials attached
  • choosing the right worker for a particular task, or a few workers for a project
  • time specification / request processing
  • choosing the work pattern (1/2/3 mini-deadlines, checking of the implementation process) and writing a task feed list for the worker
  • checking the worker's implementation process against the task feed list, with detailed explanations if needed
  • pending time / request processing / implementation checking
  • receiving feedback
  • receiving final work approval from the requester
  • payment
  • ranking the worker
  • ranking the requester
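
The assignment sketch referenced above is shown here; it is one hedged way the original and PM-prototyped tasks could be shuffled and distributed equally at random to workers, with illustrative function names and placeholder counts.

  # Illustrative sketch: shuffle original and PM-prototyped tasks, then
  # distribute them to workers in roughly equal shares and random order.
  import random

  def build_task_pool(original_tasks):
      """Pair each original task with a PM-prototyped copy, doubling the pool."""
      pool = []
      for task in original_tasks:
          pool.append({"task": task, "condition": "No-PM"})
          pool.append({"task": task, "condition": "PM"})  # PM-prototyped version
      random.shuffle(pool)
      return pool

  def assign_round_robin(pool, worker_ids):
      """Distribute the shuffled tasks across workers as evenly as possible."""
      assignments = {w: [] for w in worker_ids}
      for i, item in enumerate(pool):
          assignments[worker_ids[i % len(worker_ids)]].append(item)
      return assignments

  # Example with placeholder counts: 144 original tasks doubled to 288 HITs.
  pool = build_task_pool([f"image_{i}.jpg" for i in range(144)])
  assignments = assign_round_robin(pool, [f"worker_{w}" for w in range(35)])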

Measures from the study

We logged all task performance data from all workers to measure the rejection rate as well as correct and wrong answers. Each task that involved a Project Manager is segmented as PM in the data table, and all tasks without a Project Manager are segmented as No-PM. We found that 30% of all wrong answers on tasks carried out under a PM were not rejected; the reason is that the PMs communicated with the workers and gave them suggestions and guidance. The rejection rate for wrong answers on tasks without a PM was nearly 50%, whereas the rejection rate for wrong answers on tasks with a PM was only 28%. A survey measuring the demographics of, and collecting feedback from, the workers who participated in our experiment showed that they liked how the PMs delivered the project description and provided guidance throughout the project.
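
A minimal sketch of how the per-segment rejection rate of wrong answers could be computed from such a log is shown below; the column names and the inline placeholder rows are our own assumption about the log structure, not the study's data.

  # Hypothetical analysis sketch: rejection rate of wrong answers per segment.
  import pandas as pd

  # Placeholder rows illustrating an assumed log structure (not the study's data).
  log = pd.DataFrame([
      {"condition": "PM",    "answer_correct": False, "rejected": True},
      {"condition": "PM",    "answer_correct": False, "rejected": False},
      {"condition": "No-PM", "answer_correct": False, "rejected": True},
      {"condition": "No-PM", "answer_correct": True,  "rejected": False},
  ])

  wrong = log[~log["answer_correct"]]                        # keep only wrong answers
  rejection_rate = wrong.groupby("condition")["rejected"].mean()
  print(rejection_rate)   # the study reports roughly 0.50 for No-PM and 0.28 for PM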

Statistical significance is evaluated with an ANOVA and a t-test to measure the difference in outcomes.


Anova.png

where SSE is the sum of squares due to error, S is the standard deviation of the samples, and N is the total number of observations.
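
The formula itself appears only in the figure above; for reference, the standard textbook form of these one-way ANOVA quantities, which matches the symbols listed here but may differ in detail from the figure, is:

  % Standard one-way ANOVA error terms for k groups, group sizes n_i,
  % sample standard deviations s_i, and N total observations.
  \[
    \mathrm{SSE} = \sum_{i=1}^{k} (n_i - 1)\, s_i^{2},
    \qquad
    \mathrm{MSE} = \frac{\mathrm{SSE}}{N - k}
  \]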

What do we want to analyze?

In this study, 4 requesters, 4 Project Managers, and 35 workers participated, performing 288 tasks. We averaged our dependent measures across all 35 workers and compared the average numbers of wrong and rejected HITs for tasks in which a Project Manager was involved with those in which no PM was involved (a sketch of this comparison follows below). We wanted to analyze the influence of requesters and intermediaries on workers' outcomes.
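
A minimal sketch of this comparison, assuming the per-worker averages are already available in two lists; the variable names and values are placeholders, not the study's data.

  # Hypothetical sketch of the statistical comparison between conditions,
  # using per-worker averages of wrong/rejected HITs (placeholder values only).
  from scipy import stats

  pm_avgs = [0.20, 0.25, 0.31, 0.28, 0.24]      # placeholder per-worker averages, PM
  no_pm_avgs = [0.45, 0.52, 0.48, 0.50, 0.46]   # placeholder per-worker averages, No-PM

  f_stat, f_p = stats.f_oneway(pm_avgs, no_pm_avgs)    # one-way ANOVA
  t_stat, t_p = stats.ttest_ind(pm_avgs, no_pm_avgs)   # independent-samples t-test
  print(f"ANOVA: F={f_stat:.2f}, p={f_p:.3f}; t-test: t={t_stat:.2f}, p={t_p:.3f}")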

References

[1] Cheng, J., Teevan, J., and Bernstein, M. S. Measuring Crowdsourcing Effort with Error-Time Curves. CHI (2015).
[2] Ipeirotis, P. G. Analyzing the Amazon Mechanical Turk Marketplace. XRDS (2010).


Contributions

@seko - Sekandar Matin, @purynova - Victoria Purynova, @ahmednasser - Ahmed Nasser, @kamila - Kamila Mananova