Milestone 5 Improving Task Authoring with a Project Manager by Team1

Task Authorship

Study introduction

Requesters influence the outcomes of workers, so our hypothesis is that poor task authorship leads to lower outcome quality, because ambiguous tasks are easily misunderstood. In STUDY 1 we propose a one-part experiment, using a crowdsourcing platform as the experimental environment, in which we test the Project Manager (PM) role and compare the results before and after its introduction using an analysis of variance (ANOVA).

By using the PM in our experiment, we show how the quality of delivered tasks changes, analyzing the rejection rate and the input data.

Study method

Study 1 and all subsequent experiments reported in this paper were conducted using a proprietary microtasking platform that outsources crowd work to workers on the Amazon Mechanical Turk (MTurk) microtask market. We will restrict workers to those who do not hold the "Masters Qualification" in order to reach the more general pool of skilled workers. A follow-up survey will give us more insight into the demographics of the workers.
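
A minimal sketch, assuming Python with boto3 and MTurk API access, of how such a restriction could be expressed; the Masters qualification type ID is a placeholder that must be looked up in the MTurk documentation, and the create_hit call is only indicated, not spelled out:

  import boto3

  # Placeholder: replace with the current Masters qualification type ID from the MTurk docs.
  MASTERS_QUALIFICATION_TYPE_ID = "REPLACE_WITH_MASTERS_QUALIFICATION_TYPE_ID"

  # Workers must NOT hold the Masters qualification to discover, preview, or accept the HIT.
  no_masters_requirement = {
      "QualificationTypeId": MASTERS_QUALIFICATION_TYPE_ID,
      "Comparator": "DoesNotExist",
      "ActionsGuarded": "DiscoverPreviewAndAccept",
  }

  mturk = boto3.client("mturk", region_name="us-east-1")
  # The requirement is passed alongside the rest of the task definition, e.g.:
  # mturk.create_hit(..., QualificationRequirements=[no_masters_requirement])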

We introduce a new concept for crowdsourcing called the "PM" (Project Manager): a person hired by the requester to supervise a group of workers, help create better tasks, deliver the required work at high quality, and rate the workers. First, the requester provides the task to be done and hires workers to perform it; the requester may then promote the best of the hired workers to PM or hire a PM separately. The PM must then provide the workers with details and describe the required job if the task description provided by the requester is not clear enough. While the task is being performed, the PM monitors the quality of the work, delivers the task to the requester on time, and rates the workers. The PM's responsibilities are:

  • rating workers
  • task setup (task description, configuration of patterns)
  • managing task fixes (checking delivered work)
  • providing a template of the requested task or a similar task
  • writing a task feed list for the worker
  • checking the implementation against the task feed list
  • providing information to the requester about the implementation process, if needed
  • checking the quality of the final work (implementation)
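
As a purely illustrative data-structure sketch (the record and field names are our own assumptions, not part of any platform), the PM's bookkeeping for a single delivered task could look like this:

  from dataclasses import dataclass, field
  from typing import List, Optional

  @dataclass
  class TaskFeedItem:
      description: str       # one step the worker is expected to complete
      done: bool = False     # ticked off by the PM during implementation checking

  @dataclass
  class PMReview:
      task_id: str
      worker_id: str
      feed_list: List[TaskFeedItem] = field(default_factory=list)
      quality_ok: Optional[bool] = None    # PM's final quality check of the delivered work
      accepted: Optional[bool] = None      # rejection decision reported to the requester
      worker_rating: Optional[int] = None  # e.g. a 1-5 rating assigned by the PM
      notes: str = ""                      # information passed back to the requester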

Method specifics and details

To find out which task types are commonly used by requesters, we reviewed the literature and MTurk [1, 2] and found that the task types below are the most common on MTurk:

transcription, content rewriting, searching for answers to questions online, object classification, website feedback, image labeling, categorization, image description, binary labeling, and math.

Based on the most popular tasks on MTurk, we decided to use the following task in our experiment: image labeling of bikes and motorbikes.

[1] Cheng, J., Teevan, J., and Bernstein, M. S. Measuring Crowdsourcing Effort with Error-Time Curves. CHI (2015).
[2] Ipeirotis, P. G. Analyzing the Amazon Mechanical Turk marketplace. XRDS (2010).
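
A minimal sketch of what the binary image-labeling task definition could look like in Python; the field names, example image URL, and validation helper are illustrative assumptions, not the actual template used in the study:

  # Illustrative task template for the bike-vs-motorbike binary labeling task.
  binary_labeling_task = {
      "title": "Is the vehicle in this image a bike or a motorbike?",
      "instructions": "Look at the image and select the label that best describes it.",
      "image_url": "https://example.com/images/vehicle_001.jpg",  # placeholder URL
      "options": ["Bike", "Motorbike"],
      "assignments_per_image": 3,  # how many workers label each image
  }

  def is_valid_answer(answer: str) -> bool:
      """Accept only one of the two allowed labels for this binary task."""
      return answer in binary_labeling_task["options"]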


Experimental Design for the study

We decided to run a between-subjects experiment with error in the task input as the independent variable and rejection rate as the dependent variable. We recruit a group of four requesters and four PMs on MTurk; the four PMs are among the best-rated workers on the platform. In this experiment, we ask the requester group to author a binary survey using the templates. Every task has the same level of complexity (varying only within a small range), the same template, and the same language. The requester's workflow is as follows:

  • choosing a profile of work insets for the task
  • writing the task direction / definition sample
  • choosing the number of workers who may apply for the task
  • choosing the time specification
  • fixing the request processing time
  • choosing a PM or group of PMs among those suggested by Boomerang or another ranking system (see the sketch after this list)
  • pending time / request processing / implementation
  • receiving the work from the PM
  • work approval and payment
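
As referenced in the list above, a minimal sketch of how a requester might pick a PM from a Boomerang-style ranking; the candidate IDs and scores are made-up placeholders, not output of the actual ranking system:

  # Made-up candidate list; a real ranking would come from Boomerang or a similar system.
  pm_candidates = [
      {"worker_id": "PM_A", "score": 4.8},
      {"worker_id": "PM_B", "score": 4.6},
      {"worker_id": "PM_C", "score": 4.9},
      {"worker_id": "PM_D", "score": 4.7},
  ]

  # Pick the single highest-ranked PM, or the top two for a small PM group.
  best_pm = max(pm_candidates, key=lambda pm: pm["score"])
  top_two = sorted(pm_candidates, key=lambda pm: pm["score"], reverse=True)[:2]
  print(best_pm["worker_id"], [pm["worker_id"] for pm in top_two])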

The PMs additionally prototype these tasks, thereby doubling the number of tasks. The two types of tasks are shuffled together and distributed to workers in equal proportions and in random order, and the workers then perform the tasks they are given; a sketch of this randomization and of the planned rejection-rate comparison follows the list below. The PMs review the completed tasks, decide about rejections, and report progress to the requesters, so the PM's workflow is as follows:

  • writing the task direction/definition in more detail, with sample materials attached
  • choosing the right worker for a particular task, or a few workers for a project
  • time specification / request processing
  • choosing a pattern of work (one, two, or three mini-deadlines; implementation-process checking) and writing a task feed list for the worker
  • checking the worker's implementation process against the task feed list, with detailed explanations if needed
  • pending time / request processing / implementation checking
  • receiving feedback
  • receiving final work approval from the requester
  • payment
  • ranking the worker
  • ranking the requester
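
As mentioned above, a minimal sketch in Python (using SciPy) of the task shuffling and of the planned one-way ANOVA comparison of rejection rates between the two conditions; the task counts and rejection values below are placeholders to make the sketch runnable, not study data:

  import random
  from scipy import stats

  # Two conditions: tasks authored by requesters and the same tasks prototyped by PMs.
  requester_tasks = [{"condition": "requester", "task_id": i} for i in range(20)]
  pm_tasks = [{"condition": "pm", "task_id": i} for i in range(20)]

  # Shuffle both task types together so workers receive them in random order.
  all_tasks = requester_tasks + pm_tasks
  random.shuffle(all_tasks)

  # After the study: per-task rejection indicators (1 = rejected, 0 = accepted).
  rejections_requester = [1, 0, 1, 1, 0, 1, 0, 1]  # placeholder values
  rejections_pm = [0, 0, 1, 0, 0, 0, 1, 0]         # placeholder values

  f_stat, p_value = stats.f_oneway(rejections_requester, rejections_pm)
  print(f"F = {f_stat:.3f}, p = {p_value:.3f}")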


Measures from the study

What do we want to analyze?

Contributions

@seko - Sekandar Matin, @purynova - Victoria Purynova, @ahmednasser - Ahmed Nasser, @kamila - Kamila Mananova