Milestone 4 Analysing Failure with "FeedbackMe" system by Team1


Introduction

Analysing Failure Idea: a "FeedbackMe" system

Crowdsourcing is the process of obtaining needed services, ideas, or content from a large group of people. A wide variety of tasks can be crowdsourced, but in any crowd people have different skills and backgrounds, which means that some workers are better than others at certain tasks and worse at others. Tasks range from easy ones with binary questions (simple image analysis, e.g. questions like "Is there a human face in the image?") to complex ones such as translation or any other task whose output is intricate. Because proper communication is often lacking, it is hard to predict what problems will arise during the creation and execution of a task. Language barriers, ambiguous questions, and the relationship between the amount of work and the available time create a high risk of task failure. How can quality be improved? Studies have shown that monetary remuneration loses its motivating value after some time, but that the quality of task results can be substantially improved by applying spam filters, gathering data from a set of workers, and choosing an adequate task design (Huang et al., 2010). Both requesters and workers are interested in good quality work, and therefore in receiving feedback that helps them improve and lowers the risk of task failure.

The design of a system

This section presents the design and evaluation of "FeedbackMe", a software feature designed to analyse task failure and find its causes. At the moment, most platforms do not provide any feedback. A worker can still self-evaluate, but when giving feedback it is often difficult to scale measurable and unmeasurable qualities, and observable and unobservable ones. We therefore propose the following design of the system.

The system uses two types of questions and has the following design:

  • Multiple-choice and open questions. We propose a series of binary questions (for example, "Did the worker respect the deadline for every task?" or "Did the worker get paid on time?", which a requester can answer only with "yes" or "no") and questions rated on a 0-5 scale, to be answered by the requester and the worker after a task is completed (and the reward received), whether successfully or not. The open question is more specific and is not publicly disclosed.
  • After the cases are checked in a visible feedback window with a series of generated questions and the open question, the checkbox results are recorded (the open question is viewable only by the requester and worker involved) and stored on the platform in both the requester's and the worker's dashboard.
  • Feedback can also be given after a certain amount of work is completed (macro tasks split into smaller tasks), so that the worker can fix problems right away, which benefits both parties.
  • If a worker or requester repeatedly gets the same "bad" evaluation of one of the qualities, "FeedbackMe" sends an automatic message to the evaluated person with proposals for further improvement.
  • For example, if a requester does not use proper vocabulary or enough detail when describing a task, an auto-generated message is sent proposing that they improve their vocabulary or explain tasks more clearly. The same applies to workers.
  • The feedback becomes available shortly after the task is completed (after payment, if the task was completed correctly) and is delivered to the worker and the requester at the same time.

Overall, our system benefits requesters (better, clearer tasks are posted) and workers (higher quality of delivered work).
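To make the design concrete, here is a minimal Python sketch of the feedback flow under a simple in-memory model. All names (FeedbackForm, FeedbackMe, BAD_RATING, REPEAT_LIMIT) and the thresholds are illustrative assumptions of ours, not part of any existing crowdsourcing platform; the sketch only shows how binary and 0-5 scale answers could be stored on both dashboards and how a repeated "bad" evaluation could trigger an automatic improvement message.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BinaryQuestion:
    text: str                      # e.g. "Did the worker respect the deadline for every task?"
    answer: Optional[bool] = None  # True = "yes", False = "no"


@dataclass
class ScaleQuestion:
    text: str                      # quality rated on a 0-5 scale
    answer: Optional[int] = None   # 0 (worst) .. 5 (best)


@dataclass
class FeedbackForm:
    task_id: str
    author: str                    # who gives the feedback
    recipient: str                 # who receives it
    binary: list = field(default_factory=list)   # list of BinaryQuestion
    scale: list = field(default_factory=list)    # list of ScaleQuestion
    open_answer: str = ""          # visible only to the two parties involved


BAD_RATING = 2     # assumed cut-off: a 0-2 rating counts as a "bad" evaluation
REPEAT_LIMIT = 2   # assumed: the same bad evaluation twice triggers a proposal


class FeedbackMe:
    """Stores feedback on both dashboards and sends improvement proposals."""

    def __init__(self):
        self.dashboards = {}   # account name -> list of FeedbackForm
        self.bad_counts = {}   # (recipient, quality text) -> number of bad marks

    def submit(self, form):
        """Record a form and return any auto-generated improvement messages."""
        # Store the feedback in both the author's and the recipient's dashboard.
        self.dashboards.setdefault(form.author, []).append(form)
        self.dashboards.setdefault(form.recipient, []).append(form)

        # Collect the qualities that received a "bad" evaluation this time.
        bad_qualities = [q.text for q in form.binary if q.answer is False]
        bad_qualities += [q.text for q in form.scale
                          if q.answer is not None and q.answer <= BAD_RATING]

        messages = []
        for quality in bad_qualities:
            key = (form.recipient, quality)
            self.bad_counts[key] = self.bad_counts.get(key, 0) + 1
            if self.bad_counts[key] >= REPEAT_LIMIT:
                messages.append(
                    f"To {form.recipient}: repeated low rating on '{quality}'. "
                    "Please consider improving this aspect, e.g. clearer task "
                    "wording or respecting deadlines."
                )
        return messages

For instance, submitting two forms that both answer "no" to the hypothetical deadline question for the same worker would make the second submit() call return an automatic improvement proposal, while the open answers stay visible only to the two accounts whose dashboards hold the form.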

Experiment

To demonstrate the performance of the system, we can run a test with 10 workers and 15 requesters. In stage 1, we split the workers into two groups: one that gives and receives feedback to and from requesters (group S) and one that does not (group P). We assume both groups have the same capacities and skills. The requesters are likewise split into groups F, N, and T (5 people each). We assume that requesters in all three groups post a mix of easy and complex tasks. Group F gives feedback (after full completion) using "FeedbackMe", while groups T and N do not.

Figure: Feedbackme.png


  • In stage 2, workers complete the requesters' tasks. Groups S and F receive complete feedback after each task; groups P and N do not.
  • In stage 3, both worker groups complete the tasks of the last requester group (group T).
  • In stage 4, the workers are evaluated: group T compares the results of both worker groups to verify whether the feedback was useful.
  • In stage 5, workers switch to the other requester group they were previously working with.
  • In stage 6, requesters are evaluated by their workers, so that both parties have received feedback.
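Purely as an illustration of this protocol (not an existing tool), the following Python sketch assigns the 10 workers and 15 requesters to the groups above and enumerates the six stages; the names assign_groups and STAGES, the fixed random seed, and the printed output are all our own assumptions.

import random


def assign_groups(workers, requesters, seed=0):
    """Split 10 workers into groups S/P and 15 requesters into F/N/T (5 each)."""
    rng = random.Random(seed)
    workers = list(workers)
    requesters = list(requesters)
    rng.shuffle(workers)
    rng.shuffle(requesters)
    return {
        "S": workers[:5],        # give and receive feedback via "FeedbackMe"
        "P": workers[5:],        # control group: no feedback
        "F": requesters[:5],     # exchange feedback after task completion
        "N": requesters[5:10],   # no feedback
        "T": requesters[10:15],  # evaluate both worker groups in stage 4
    }


STAGES = [
    "1: split workers into S/P and requesters into F/N/T",
    "2: workers complete tasks; S and F exchange feedback, P and N do not",
    "3: both worker groups complete the tasks of group T requesters",
    "4: group T compares the results of the two worker groups",
    "5: workers switch to the other requester group",
    "6: requesters are evaluated by their workers, so both parties get feedback",
]

if __name__ == "__main__":
    groups = assign_groups([f"w{i}" for i in range(1, 11)],
                           [f"r{i}" for i in range(1, 16)])
    for name, members in groups.items():
        print(name, members)
    for stage in STAGES:
        print("Stage", stage)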

After the experiment, we expect the following results:

- All participants will have feedback stored in the system (during the worker evaluation stage all workers receive a final feedback, and during the requester evaluation stage all requesters receive a final feedback).

- Group S workers and group F requesters will answer a set of questions (multiple-choice and open) after completing each task, and those who get a "bad" evaluation will receive an automatic message.

- Group S workers will perform better than group P workers in the final tasks (group T requesters’ tasks).

- Group F requesters will create clearer and more detailed tasks (because of mutual feedback).

References

Schulze, T., Seedorf, S., Geiger, D., Kaufmann, N., and Schader, M. "Exploring Task Properties in Crowdsourcing: An Empirical Study on Mechanical Turk." ECIS 2011 Proceedings, Paper 122, 2011.

Dow, S. P., Kulkarni, A., Klemmer, S. R., and Hartmann, B. "Shepherding the Crowd Yields Better Work."

Heer, J. and Bostock, M. "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design." Proceedings of the ACM Conference on Human Factors in Computing Systems, 2010, 203-212.

Slivkins, A. and Vaughan, J. W. "Online Decision Making in Crowdsourcing Markets: Theoretical Challenges." November 2013.


Contributions:

Team1: @seko - Sekandar Matin & @purynova - Victoria Purynova & @kamila - Kamila Mananova