Winter Milestone 5 vaastav Task Authorship
This page contains the concrete proposal for a study I am suggesting for Task Authorship in Winter Milestone 5.
Study 1: Variance in Requester Quality vs Variance in Worker Quality. We begin by comparing the effect of variance in requester task authorship on the overall result of a task with the effect of variance in worker quality. After describing the experimental setup, which is designed to generate the data required for such a comparison, we show what effect, if any, task authorship has on the overall results of the task. We then compare it with the effect of variance in worker quality to see which factor has the stronger effect on the overall results.
Method: Study 1 and all further experiments reported in this paper were carried out using a microtasking platform that outsources crowd work to workers on the Mechanical Turk platform. Workers and requesters were restricted to the United States. The study was completed with 10 unique requesters and 30 unique workers.
Method Specifics and Details
We began by populating our evaluation tasks with common crowdsourcing task types, or primitives, that frequently appear as microtasks or as parts of microtasks. We found 10 primitive task types that were most common in crowdsourcing workflows (Figure 1).
Experimental Design for the Study
Each of the 10 requesters was asked to author 1 task of each primitive type, so each requester authored 10 tasks. Each worker was asked to complete all of the authored tasks, so each worker completed 100 tasks: 10 of each primitive type and 10 from each requester. The requesters were then asked to evaluate each worker submission. The tasks were presented in a random order. Workers were compensated $3.00 and repeat participation was disallowed. A single task was presented on each page, allowing us to record how long workers took to submit a response. The identities of the requesters and workers were not revealed, to remove any personal bias from the judgement criteria. An example task is shown in Figure 2.
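The design above can be sketched in a few lines of Python. This is only an illustration of the 10 x 10 task grid and the per-worker random ordering. The integer IDs and the seeding scheme are assumptions made for the example, not details of the actual platform.

```python
import random
from itertools import product

N_REQUESTERS = 10
N_PRIMITIVES = 10
N_WORKERS = 30

# Every (requester, primitive) pair is one authored task: 10 x 10 = 100 tasks.
tasks = list(product(range(N_REQUESTERS), range(N_PRIMITIVES)))

def task_order_for(worker_id, seed=0):
    """Return an independently shuffled copy of all tasks for one worker."""
    # Hypothetical seeding scheme so that each worker's order is reproducible.
    rng = random.Random(seed * 1000 + worker_id)
    order = tasks[:]
    rng.shuffle(order)
    return order

# One randomized sequence of 100 tasks per worker.
orders = {worker: task_order_for(worker) for worker in range(N_WORKERS)}
```

Every worker sees the same 100 tasks, so the design is fully crossed: 30 workers x 100 tasks gives 3000 responses in total.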
Measures from the Study
The data we collected was as follows:
- The tasks presented were objective, so each worker response was treated as either correct or incorrect. This gave us the overall percentage of correct answers for each task, which in turn gave us the total number of correct answers for each primitive type and for the tasks of each author.
- From the results, for each worker we also had the number of correct answers and incorrect answers for each primitive type.
- As the time spent on every task was also measured, we calculated the average completion time for each author's tasks and for each worker. In addition, we calculated the average completion time for each primitive type of task.
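The measures listed above all come down to grouping the same response records by requester, by worker, or by primitive type. A minimal sketch follows, assuming each response is stored as a tuple of (worker, requester, primitive, correct, seconds). The record layout and the four sample rows are made up for illustration and are not study data.

```python
from collections import defaultdict
from statistics import mean

# Made-up illustrative records, not study data:
# (worker, requester, primitive, correct, seconds)
responses = [
    ("w1", "r1", "p1", True, 12.0),
    ("w1", "r2", "p1", False, 20.0),
    ("w2", "r1", "p1", True, 8.0),
    ("w2", "r2", "p1", True, 16.0),
]

# Position of each grouping field inside a record (assumed layout).
FIELDS = {"worker": 0, "requester": 1, "primitive": 2}

def summarize(records, by):
    """Group records by one field; return {key: (accuracy, mean_seconds)}."""
    idx = FIELDS[by]
    groups = defaultdict(list)
    for rec in records:
        groups[rec[idx]].append(rec)
    return {
        key: (
            mean(1.0 if r[3] else 0.0 for r in recs),  # share of correct answers
            mean(r[4] for r in recs),                  # average completion time
        )
        for key, recs in groups.items()
    }

by_requester = summarize(responses, "requester")
by_worker = summarize(responses, "worker")
by_primitive = summarize(responses, "primitive")
```

Using one grouping function for all three views keeps the per-requester, per-worker, and per-primitive numbers directly comparable, since they are computed the same way from the same records.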
What do we want to analyze?
Task Authorship Quality for a task of any kind is defined as a function of the overall success rate of the task and the average time taken to complete it. We could then run a simple regression with the average Task Authorship Quality of each requester as the predictor variable and the average completion rate of their tasks as the predicted variable, and see whether any trends emerge. A similar analysis can be done with the worker data, with each worker's average success rate as the predictor and the average success rate of the task as the predicted variable.
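A simple regression of this kind can be sketched with the closed-form least squares formulas. The exact form of the Task Authorship Quality function is not specified here, so the sketch takes the quality scores as given. The numbers are made up for illustration, and the function name `simple_regression` is hypothetical.

```python
from statistics import mean

def simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2: share of the variance in y explained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Made-up illustrative numbers, not study data: one point per requester.
quality = [0.2, 0.4, 0.6, 0.8]   # predictor: average authorship quality
outcome = [0.5, 0.6, 0.7, 0.8]   # predicted: average task outcome
slope, intercept, r_squared = simple_regression(quality, outcome)
```

The same function applies unchanged to the worker analysis, with one point per worker instead of one per requester.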
A multiple regression could then follow, taking into account both worker and requester quality.
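One way to sketch such a multiple regression is to solve the least squares normal equations directly. The example below assumes one observation per response, with the requester's and the worker's quality scores as the two predictors. The data are synthetic, generated from known coefficients so that the fit can be checked, and the helper names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def multiple_regression(rows, y):
    """Least squares fit of y on the predictor rows; the intercept comes first."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Synthetic illustrative data, not study data:
# (requester_quality, worker_quality) per response.
predictors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
outcome = [0.1 + 0.3 * rq + 0.5 * wq for rq, wq in predictors]
b0, b1, b2 = multiple_regression(predictors, outcome)
```

Comparing the fitted coefficients for requester quality and worker quality, ideally after standardizing both predictors, gives a rough indication of which factor is more strongly associated with the outcome.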