Milestone 6 Taskforce
This is the template for the Milestone 6 Research Proposal. Our research proposal is as follows:
Give the Crowd What They Understand Best
Improving Clarity of Crowdsourced Microtasks Automatically
Crowdsourcing is widely adopted by researchers and practitioners alike to solve problems that are beyond the capabilities of machines. At the same time, thousands of people around the world are joining the crowd workforce as a means of making a living. With the growing volume and diversity of the crowd, some crowd workers are naturally more adept than others at understanding what a given microtask requires of them. This gap has several causes, such as task design, familiarity with the task, language proficiency, background and, most importantly, task clarity. We believe that all crowd workers who genuinely intend to complete microtasks successfully should be given an adequate opportunity to do so. To overcome these challenges, we propose an automatic method to improve the clarity of crowdsourced microtasks. Our approach also takes into account subtle differences in workers' cultural backgrounds, which can be used to further improve their understanding of the objectives of a given microtask. Through extensive experiments, we aim to show that automatically improving task clarity leads to significant improvements in the performance of crowd workers.
What is the problem that you are solving, and why is it important?
- Crowdsourcing brings together diverse people from all over the world, who have different background knowledge.
- Tasks are created following a “one size fits all” approach: requesters and platform owners pay no attention to these differences, taking for granted that every potential worker will understand and interpret the information provided in the tasks (e.g., instructions, descriptions of data, forms) in the same way.
- We identify two major problems with this approach:
- First, information and interfaces cannot be delivered in the same way to people with different cultural backgrounds.
- Second, even within the same cultural background, different people have different information needs; some workers might need more detailed explanations.
It is important to give crowd workers information they can understand and process. Otherwise, they will not perform tasks correctly and will not be satisfied. They will lose trust in the platform, and requesters in turn will lose trust in crowdsourcing.
What are the existing attempts to solve this problem that have been attempted in prior research papers and real-world systems? Why are their solutions unsatisfactory?
Differences in cognition and culture may affect performance
- Cognitive Load theory (Sweller et al. 2011) and Cognitive Dissonance theory (Aronson 1969) support the need to adapt tasks to users’ needs and backgrounds.
- Feasibility of conducting online performance evaluations of user interfaces with anonymous, unsupervised, paid participants recruited via MTurk ref
- Rating and ranking of task properties (T. Schulze, ECIS 2011)
- Main UI dimensions: Information Density, Navigation, Accessibility of Functions, Guidance, Structure, Colorfulness, Saturation, Support (summarised in Reinecke et al. 2013)
- The Set of Cultural Variables and Aspects that Impact UI Preferences, Modeled in a Cultural User Model Ontology (Reinecke et al. 2013)
Who are the workers?
- Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. Who are the crowdworkers?: shifting demographics in mechanical turk. In CHI '10 Extended Abstracts on Human Factors in Computing Systems (CHI EA '10). ACM, New York, NY, USA, 2863-2872. DOI=10.1145/1753846.1753873 http://doi.acm.org/10.1145/1753846.1753873
- Panos Ipeirotis. March 19, 2008. "Mechanical Turk: The Demographics" (blog post): countries, gender, age, education, income, motivations
Our approach compared to other alternatives
While involving workers and requesters in the process of task design might help requesters learn to design tasks in a worker-friendly manner, it also gives rise to several limitations, including the following:
- Time Overhead. Requesters need to wait until their task has been screened by workers before it can be deployed to gather responses from the crowd. Workers, in turn, need to wait longer before a potentially interesting task becomes available to them.
- Cost Overhead. Crowd workers or mediators who are responsible for screening tasks need to be compensated in some way. Assuming that compensation is monetary, as is typical, this leads to additional expenditure.
This section should lay out the foundational idea(s). These big ideas are the things that you'll be known for, and what other platforms would want to replicate. Explain: Why/how are they novel and better than anything that has been attempted in the past?
- Cultural adaptivity. Culturally adaptive tasks will allow requesters to benefit fully not only from crowd workers who belong to the requester's cultural circle but also from workers all around the world. By restructuring tasks to fit the cultural context of workers, requesters will gain broader access to crowd markets belonging to different cultures without having to worry about how clearly and properly their tasks are understood.
- Customized information delivery. Information is delivered so that it aligns with workers' beliefs, shared knowledge, and social and cultural conventions. This, in turn, will reduce misunderstanding and confusion regarding the expected output.
- Cognitive load and misalignment control. Cultural dissonance and confusing task interfaces increase work stress. This decreases productivity and calls the quality of work deliverables into question. Controlling both aims to prevent these effects.
The insight above should explain the high-level idea (e.g., "All workers are paid in chocolate"). Here, you explain how it works in specifics.
To overcome the aforementioned limitations, we propose an automatic approach to improve the clarity of crowdsourced microtasks. The main components of our proposed method are: (i) a task description simplifier, (ii) a task validator, (iii) a task classifier, and (iv) culture-based personalization.
(i) Automatic Task Description Simplifier:
Since many crowdsourced tasks are not restricted to particular geographic regions, we argue that crowd workers' proficiency in English (or other languages) varies greatly. As a result, some crowd workers may not interpret tasks accurately despite reasonably clear instructions. To avoid the potential bias caused by misinterpretations of a given task, we propose a Task Description Simplifier (TDS). The TDS relies on dictionaries such as WordNet to replace difficult words with more comprehensible synonyms. Difficult words are identified through NLP methods and heuristics. To ensure that the reformulated task description retains its fluency, and thereby the message it conveys, we automatically check it for grammatical errors. If the task description contains no difficult words, no changes are made.
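A minimal Python sketch of the TDS idea follows. It makes several loud assumptions. A small hand-written synonym table (`SYNONYMS`) stands in for WordNet. A toy difficulty heuristic stands in for the NLP methods mentioned above: a word counts as difficult if it has seven or more letters and is missing from a hypothetical `COMMON_WORDS` list. The grammar check is omitted entirely. All names are illustrative and are not part of an existing system.

```python
import re

# Hypothetical stand-in for WordNet: difficult word -> simpler synonyms.
SYNONYMS = {
    "ascertain": ["find out", "check"],
    "commence": ["start", "begin"],
    "utilize": ["use"],
    "subsequently": ["then", "later"],
}

# Hypothetical list of words assumed to be familiar to most workers.
COMMON_WORDS = {"please", "image", "check", "start", "begin", "then", "later"}


def is_difficult(word):
    """Toy heuristic standing in for NLP-based difficulty detection:
    a word is 'difficult' if it has 7+ letters and is not in COMMON_WORDS."""
    w = word.lower()
    return len(w) >= 7 and w not in COMMON_WORDS


def simplify(description):
    """Replace difficult words with their shortest known synonym.

    Words without a known synonym are left unchanged, so a description
    with no difficult words comes back exactly as it went in.
    """
    def replace(match):
        word = match.group(0)
        if not is_difficult(word):
            return word
        candidates = SYNONYMS.get(word.lower())
        if not candidates:
            return word
        simpler = min(candidates, key=len)  # shortest synonym as a crude proxy
        if word[0].isupper():
            simpler = simpler[0].upper() + simpler[1:]
        return simpler

    return re.sub(r"[A-Za-z]+", replace, description)


print(simplify("Please ascertain whether the image contains a cat."))
# Please check whether the image contains a cat.
```

Picking the shortest synonym is only a rough proxy for simplicity. A word-frequency list, plus a check that the synonym fits the word's sense in context, would likely serve better.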
(ii) Automatic Task Validator:
In the next step, the reformulated task descriptions are validated to ensure that the task design is not malformed and is fair to workers, for example with respect to the time allocated for task completion.
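As an illustration, the validator could be a set of rule-based checks. The sketch below uses a hypothetical `Task` record and hypothetical constants to estimate whether the allotted time is fair: a reading speed of 200 words per minute, 30 seconds per question, and a safety factor of 2. The proposal does not fix any of these values, and a real validator would need more checks than the three shown here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical constants used to estimate how long a task should take.
WORDS_PER_MINUTE = 200        # assumed reading speed
MINUTES_PER_QUESTION = 0.5    # assumed time to answer one question
TIME_SAFETY_FACTOR = 2.0      # allotted time must be at least 2x the estimate


@dataclass
class Task:
    description: str
    questions: List[str] = field(default_factory=list)
    allotted_minutes: float = 0.0


def estimated_minutes(task: Task) -> float:
    """Rough completion-time estimate: reading time plus answering time."""
    reading = len(task.description.split()) / WORDS_PER_MINUTE
    answering = len(task.questions) * MINUTES_PER_QUESTION
    return reading + answering


def validate(task: Task) -> List[str]:
    """Return a list of problems; an empty list means the task passes."""
    problems = []
    if not task.description.strip():
        problems.append("empty description")
    if not task.questions:
        problems.append("no questions or input fields")
    if task.allotted_minutes < TIME_SAFETY_FACTOR * estimated_minutes(task):
        problems.append("allotted time too short")
    return problems


ok_task = Task("Label the animal in each image.", ["Which animal?", "How many?"], 5.0)
print(validate(ok_task))  # []
```

Returning a list of problems rather than a single boolean lets the platform explain to the requester why a task was rejected.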
(iii) Automatic Task Classifier:
The automatic task classifier is a supervised model that classifies a given task, based on its description and settings, into one of a set of predetermined task classes. Annotating each task with its class makes it easy for workers to quickly find tasks that interest them. This matters because workers who cannot find tasks they prefer to work on, or are good at, nevertheless attempt and complete other microtasks. This prevents better-qualified workers from working on those tasks, which can increase the time required for task completion and/or lower the quality of the results produced.
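The proposal does not name a model for the classifier. As one simple supervised stand-in, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing over the words of the task description. It ignores task settings. The training examples and the class names (`transcription`, `image_labeling`, `sentiment`) are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTaskClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, descriptions, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(descriptions, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, description):
        """Return the class with the highest log-posterior for the description."""
        tokens = tokenize(description)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / self.total_docs)  # log prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled task descriptions.
TRAIN = [
    ("Transcribe the text shown in the scanned receipt", "transcription"),
    ("Type the words you hear in the audio clip", "transcription"),
    ("Label the objects that appear in the image", "image_labeling"),
    ("Draw a box around every car in the photo", "image_labeling"),
    ("Is this product review positive or negative", "sentiment"),
    ("Rate the sentiment of the tweet", "sentiment"),
]

classifier = NaiveBayesTaskClassifier().fit(
    [desc for desc, _ in TRAIN], [label for _, label in TRAIN]
)
print(classifier.predict("Label every car in the image"))  # image_labeling
```

Naive Bayes is chosen here only because it is small and dependency-free. A real deployment would likely use a stronger text classifier trained on many labelled tasks.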
(iv) Culture-based Personalization:
Given considerable evidence that cultural aspects are central to effective content representation, we propose to adapt crowd tasks to the cultural background of crowd workers. In line with previous research (Dieterich et al. 1993), we propose four different options for introducing adaptations: (1) the crowd worker explicitly requests adaptations and then actively chooses or rejects them, (2) he or she explicitly requests adaptations that are then triggered automatically, (3) the system automatically recommends adaptations but lets the crowd worker decide whether to accept or reject them, or (4) the system triggers adaptations automatically. By processing personal information about crowd workers, such as geolocation, spoken languages, education, past performance, and personal preferences regarding adaptations, our system will modify the user interface and the content delivered to each crowd worker. The user interface will be adapted according to insights gained from the literature (e.g., Reinecke et al. 2013). Content adaptation will be crowdsourced to culturally similar crowd workers in order to verify the quality of the output. Outsourcing content adaptation will require estimating its costs in order to decide whether the process is economically sound.
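The four adaptation options above differ in who initiates an adaptation and who decides whether it is applied. The sketch below shows one way to encode that decision logic, using hypothetical names (`AdaptationMode`, `should_apply`). It does not cover the actual UI and content changes, nor how a particular adaptation is selected for a given culture.

```python
from enum import Enum
from typing import Optional


class AdaptationMode(Enum):
    USER_REQUESTS_USER_CHOOSES = 1      # (1) worker requests, then picks or rejects
    USER_REQUESTS_SYSTEM_APPLIES = 2    # (2) worker requests, system triggers
    SYSTEM_RECOMMENDS_USER_CHOOSES = 3  # (3) system recommends, worker decides
    SYSTEM_APPLIES = 4                  # (4) system triggers automatically


def should_apply(mode: AdaptationMode,
                 worker_requested: bool,
                 worker_accepts: Optional[bool]) -> bool:
    """Decide whether a proposed adaptation is applied.

    worker_accepts is the worker's answer when asked, or None if not asked.
    """
    if mode is AdaptationMode.SYSTEM_APPLIES:
        return True
    if mode is AdaptationMode.SYSTEM_RECOMMENDS_USER_CHOOSES:
        return bool(worker_accepts)
    # Modes (1) and (2) only act when the worker has explicitly asked.
    if not worker_requested:
        return False
    if mode is AdaptationMode.USER_REQUESTS_SYSTEM_APPLIES:
        return True
    return bool(worker_accepts)  # mode (1): the worker makes the final choice


print(should_apply(AdaptationMode.SYSTEM_RECOMMENDS_USER_CHOOSES, False, True))  # True
```

Storing the mode as a per-worker preference would fit the proposal's use of "personal preference regarding introducing adaptations" as an input.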
Once the platform you propose has been implemented, how will you determine whether your system actually solves the problem you wanted to solve? What are the results you hope you can realistically achieve? Why do these results show that you have solved the problem?
Research questions to answer:
- To what extent does improving clarity automatically influence the performance of crowd workers?
- To what extent does adapting the task UI and information to different cultures influence the performance of crowd workers?
Improving task clarity
- Definition of types of information to enrich automatically
- Types of user profiles to consider: some workers might need such clarifications to a greater extent than others
- Types of tasks we would like to enhance automatically
Adapting tasks to different cultural backgrounds
- Definition of cultural dimensions to consider
- List of cultures to include in the evaluation
- Types of tasks we would like to adapt automatically
Two types of evaluations
- Experimental design: running several tasks in a controlled environment in which workers work on tasks with and without the two types of adaptation we consider (additional information for clarification, and cultural background adaptation). We would measure performance (i.e., accuracy with respect to a gold standard, and time invested) as well as work engagement (i.e., the number of tasks the workers completed).
- User-based evaluation: defining a survey in which we ask workers to assess their satisfaction with our methods. We could consider criteria that other researchers have used for user-based evaluations (e.g. in the context of recommender systems http://ucersti.ieis.tue.nl/files/papers/2.pdf).
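For the experimental design, the measures named above could be computed per condition as in the following sketch: accuracy against a gold standard, time invested, and number of tasks per worker. The `Response` record, the condition labels `"baseline"` and `"adapted"`, and the sample data are assumptions for illustration. A real analysis would also need significance testing between conditions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Response:
    worker_id: str
    condition: str   # hypothetical labels, e.g. "baseline" or "adapted"
    task_id: str
    answer: str
    seconds: float


def summarize(responses, gold):
    """Per-condition accuracy vs. gold, mean time, and mean tasks per worker."""
    by_condition = defaultdict(list)
    for r in responses:
        by_condition[r.condition].append(r)
    summary = {}
    for condition, rs in by_condition.items():
        correct = sum(1 for r in rs if r.answer == gold[r.task_id])
        tasks_per_worker = defaultdict(int)
        for r in rs:
            tasks_per_worker[r.worker_id] += 1
        summary[condition] = {
            "accuracy": correct / len(rs),
            "mean_seconds": mean(r.seconds for r in rs),
            "tasks_per_worker": mean(tasks_per_worker.values()),
        }
    return summary


# Hypothetical gold standard and responses.
GOLD = {"t1": "cat", "t2": "dog"}
RESPONSES = [
    Response("w1", "baseline", "t1", "cat", 30.0),
    Response("w1", "baseline", "t2", "cat", 50.0),
    Response("w2", "adapted", "t1", "cat", 20.0),
    Response("w2", "adapted", "t2", "dog", 20.0),
    Response("w3", "adapted", "t1", "cat", 40.0),
]
summary = summarize(RESPONSES, GOLD)
print(summary["baseline"]["accuracy"], summary["adapted"]["accuracy"])  # 0.5 1.0
```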
The reference section is where you cite prior work that you build upon. If you are aware of existing related research papers, list them here. We also encourage you to borrow ideas from the past submissions (see the meteor links above). Please list the links of the ideas you used to create this proposal (there's no restriction in terms of number of ideas or whether it's yours or others').
- [Foundation Idea] UI cultural adaptations were extensively researched by Reinecke and colleagues (see related work).
- [Feature Idea] Other teams also pointed out related ideas like http://crowdresearch.stanford.edu/w/index.php?title=Milestone_5_ABC_Storyboard:_Third_Party_Interface