Milestone 7 taskforce
This is the template for Milestone 7. It is the same format as last week's research proposal.
You will propose your platform in the form of an introduction to a mock research paper. Essentially, imagine you have built your system, incorporating all the ideas that you wanted to have in it, have run your user studies and evaluations, and everything has gone as planned. How do you convince other researchers that you have built a platform that is novel and more effective at addressing these problems than any existing ideas that have been attempted in the past?
An introduction of a research paper summarizes the main contributions of the research. It is generally about 1 page (roughly 1,000 words) and consists of the following components:
Give the Crowd What they Understand Best
Improving Clarity of Crowdsourced Microtasks Automatically
Crowdsourcing is being widely adopted by researchers and practitioners alike to solve problems that go beyond the capability of machines. At the same time, thousands of people around the world are turning to crowdsourcing, joining the crowd workforce as a means to make a living. With the growing volume and diversity of the crowd, a natural consequence is that some crowd workers are more adept than others at understanding what is required of them in a given microtask, due to factors such as task design, familiarity with the task, language proficiency, background and, most importantly, task clarity. We believe that all crowd workers with genuine intentions to complete microtasks successfully should be given an adequate opportunity to do so. To overcome the aforementioned challenges, we propose an automatic method to improve the clarity of crowdsourced microtasks. In addition, our approach takes workers' subtle cultural backgrounds into account in order to further enhance their understanding of the objectives of a given microtask. Through extensive experiments, we show that our automatic method to improve task clarity leads to significant improvements in the performance of crowd workers.
What is the problem that you are solving, and why is it important?
- crowdsourcing aggregates diverse people from various places in the world, who have different background knowledge.
- tasks are created following the “one size fits all” approach: requesters and platform owners pay no attention to these differences, taking for granted that every potential worker will understand and interpret the information provided in the tasks (e.g. instructions, descriptions of data, forms) in the same way.
- we identify two major problems with this approach:
- first, information and interfaces cannot be delivered in a single form that serves people with different cultural backgrounds equally well.
- second, even in the same cultural background, different people have different information needs. Some workers might need to see more detailed explanations.
It is important that we give crowd workers information they can understand and process; otherwise they will not perform correctly, they will be dissatisfied, they will lose their trust in the platform, and requesters will in turn lose their trust in crowdsourcing.
What are the existing attempts to solve this problem that have been attempted in prior research papers and real-world systems? Why are their solutions unsatisfactory?
Differences in cognition and culture may affect performance
- Cognitive Load (Sweller et al. 2011) and Cognitive Dissonance (Aronson 1969) theories support the need to adapt tasks to users' needs and backgrounds.
- Feasibility of conducting online performance evaluations of user interfaces with anonymous, unsupervised, paid participants recruited via MTurk ref
- Rating and ranking of task properties (T Schulze ECIS, 2011)
- Main UI dimensions: Information Density, Navigation, Accessibility of Functions, Guidance, Structure, Colorfulness, Saturation, Support (summarised at Reinecke et al. 2013)
- The Set of Cultural Variables and Aspects that Impact UI Preferences, Modeled in a Cultural User Model Ontology (Reinecke et al. 2013)
Simplifying complexity of textual information automatically
- Nunes, B. P., Kawase, R., Siehndel, P., Casanova, M. A., & Dietze, S. (2013, January). As simple as it gets-a sentence simplifier for different learning levels and contexts. In Advanced Learning Technologies (ICALT), 2013 IEEE 13th International Conference on (pp. 128-132). https://ricardokawase.files.wordpress.com/2013/07/nunes-icalt2013.pdf
- W. Coster and D. Kauchak. Learning to simplify sentences using wikipedia. In Proc. of the Workshop on Monolingual Text-To-Text Generation, pages 1–9, Portland, Oregon, June 2011. Association for Computational Linguistics.
Who are the workers?
- Joel Ross, Lilly Irani, M. Six Silberman, Andrew Zaldivar, and Bill Tomlinson. 2010. Who are the crowdworkers?: shifting demographics in mechanical turk. In CHI '10 Extended Abstracts on Human Factors in Computing Systems (CHI EA '10). ACM, New York, NY, USA, 2863-2872. DOI=10.1145/1753846.1753873 http://doi.acm.org/10.1145/1753846.1753873
- Mar 19, 2008 - Panos Ipeirotis blog post: "Mechanical Turk: The Demographics" - countries, gender, age, education, income, motivations
Our approach compared to other alternatives
While involving workers and requesters in the process of task design might prove fruitful in ensuring that requesters learn to design tasks in a worker-friendly manner, it gives rise to several limitations. Some of these are listed below:
- Time Overhead. Requesters need to wait until their task has been screened by workers, before the task can be deployed for gathering responses from the crowd. Workers need to wait before a potentially interesting task is made accessible for their participation.
- Cost Overhead. Crowd workers or mediators who are responsible for screening tasks need to be compensated in some way. Assuming compensation is monetary, this leads to additional expenditure.
This section should lay out the foundational idea(s). These big ideas are the things that you'll be known for, and what other platforms would want to replicate. Explain: Why/how are they novel and better than anything that has been attempted in the past?
- Cultural adaptivity. Culturally adaptive tasks allow requesters to fully benefit not only from crowd workers belonging to their own cultural circle, but also from workers all around the world. By restructuring tasks so that they fit the cultural context of workers, requesters gain broader access to crowd markets of different cultures without being concerned about how clearly their tasks are understood.
- Customized information delivery. Information is delivered in a way that aligns with workers' beliefs, shared knowledge, and social and cultural conventions. This, in turn, reduces misunderstanding and confusion regarding the expected output.
- Cognitive load and misalignment control. Cultural dissonance and a confusing task interface increase work stress, which in turn decreases productivity and compromises the quality of work deliverables.
Before designing our approach in detail, we would perform a study to analyse the microtasks published on crowdsourcing platforms, answering questions such as:
- to what extent do workers find microtasks difficult?
- would workers find it useful to get additional explanations?
- what kinds of elements are most relevant for improving clarity?
- how would crowd workers like to receive such explanations?
- to what extent do workers differ (in experience, background, etc.)?
- in how many languages are current microtasks published? is there a difference depending on the type of task (e.g. image annotation, text translation)?
- how many and which types of tasks are currently published for different cultures? (check whether tasks of the same requester and similar structure are published in different languages)
- do crowd workers think that there is a need for cultural adaptation for all kinds of tasks (e.g. instructions, UI)?
- which kind of elements do they find more relevant to be adapted culturally?
- which cultural dimensions are more relevant for crowd workers (e.g. language, taboos, preferences/personalization)?
- how many cultural backgrounds are there in crowdsourcing platforms?
For this study we would use Ipeirotis's MTurk tracker, which currently shows live demographics of MTurk (http://demographics.mturk-tracker.com/).
Our goal is to design a hybrid system able to perform an active learning process: task translation is first done by automatic translation and is then refined by crowd workers. The resulting tasks will be well formulated and adapted to the needs of the target audience. Examples of potential audiences are the extensive labor markets in China or India. If crowd tasks were translated and adapted to the cultural and educational backgrounds of those markets, participation in crowd work would increase dramatically.
The insight above should explain the high level idea (e.g., "All workers are paid in chocolate"). Here, you explain how it works in specifics.

In order to overcome the aforementioned limitations, we propose an automatic approach to improve the clarity of crowdsourced microtasks. The main components of our method are: (i) a task description simplifier, (ii) a task validator, (iii) a task classifier, and (iv) culture-based personalization.

(i) Automatic Task Description Simplifier: Given that many crowdsourced tasks are not restricted to particular geographic boundaries, we argue that crowd workers' proficiency in English (or other languages) varies greatly. As a result, crowd workers may not interpret tasks accurately despite the presence of reasonably clear instructions. To avoid the potential bias generated by misinterpretations of a given task, we propose the Task Description Simplifier (TDS). The TDS relies on dictionaries such as WordNet to simplify task descriptions, replacing difficult words with more comprehensible synonyms. Difficult words are identified through NLP methods and heuristics. We ensure that the reformulated task description does not lose its integrity in terms of fluency, and thereby the message conveyed, by automatically checking for grammatical breaches. In the absence of difficult words in the task description, no changes are made.
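The TDS pipeline can be sketched as follows. This is a minimal illustration: the synonym table is a hypothetical stand-in for a WordNet lookup combined with a word-frequency heuristic, not the actual simplifier.

```python
# Sketch of the Task Description Simplifier (TDS).
# SIMPLER_SYNONYMS is a hypothetical stand-in for a WordNet-based lookup;
# in a real system, difficult words would be detected via frequency heuristics.
import re

SIMPLER_SYNONYMS = {
    "utilize": "use",
    "annotate": "label",
    "erroneous": "wrong",
    "subsequently": "then",
}

def simplify_description(text: str) -> str:
    """Replace difficult words with simpler synonyms, preserving capitalization."""
    def replace(match):
        word = match.group(0)
        simple = SIMPLER_SYNONYMS.get(word.lower())
        if simple is None:
            return word  # not a difficult word: leave unchanged
        return simple.capitalize() if word[0].isupper() else simple
    return re.sub(r"[A-Za-z]+", replace, text)

print(simplify_description("Utilize the tool to annotate erroneous labels."))
# → "Use the tool to label wrong labels."
```

If the description contains no difficult words, the function returns it unchanged, matching the behavior described above.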
(ii) Automatic Task Validator: In the next step, the reformulated task descriptions are validated in order to ensure that the task design is not malformed and is fair with respect to the workers, i.e., with respect to the time allocated for task completion, and so forth.
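One possible shape for the validator is sketched below. The reading-speed constant and the specific fairness rule are illustrative assumptions, not part of the proposal itself.

```python
# Sketch of the Automatic Task Validator. The 200 words-per-minute reading
# speed and the checks below are illustrative assumptions.
WORDS_PER_MINUTE = 200  # assumed average reading speed

def validate_task(description, allotted_minutes, required_fields, form_fields):
    """Return a list of problems; an empty list means the task passes validation."""
    problems = []
    # Fairness check: workers need at least enough time to read the description.
    reading_minutes = len(description.split()) / WORDS_PER_MINUTE
    if allotted_minutes < reading_minutes:
        problems.append("allotted time is shorter than the estimated reading time")
    # Malformedness check: every required field must exist in the task form.
    for field in required_fields:
        if field not in form_fields:
            problems.append(f"required field '{field}' is missing from the form")
    return problems
```

A task that passes returns an empty list; otherwise each problem can be reported back to the requester before deployment.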
(iii) Automatic Task Classifier: The automatic task classifier is a supervised model that classifies a given task, based on its description and settings, into one of a set of pre-determined task classes. By annotating the task with its class, workers can easily and quickly find tasks that match their interests. This is important for the following reason: workers who do not find tasks that they prefer or are good at nevertheless attempt and complete other microtasks. This prevents better-qualified workers from working on those tasks, resulting in a potential loss with respect to the time required for task completion and/or the quality of the results produced.
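The classifier could look roughly like this. A production system would train a supervised model on labelled task descriptions; the keyword sets below are a hypothetical stand-in for learned per-class features.

```python
# Sketch of the Automatic Task Classifier. The class names match the task
# types listed in this proposal; the keyword sets are illustrative stand-ins
# for features a supervised model would learn from labelled data.
TASK_CLASSES = {
    "image classification": {"image", "photo", "picture", "classify"},
    "text translation": {"translate", "translation", "language"},
    "sentiment analysis": {"sentiment", "opinion", "positive", "negative"},
    "data validation": {"verify", "validate", "check", "correct"},
}

def classify_task(description):
    """Assign the class whose keywords overlap most with the description."""
    words = set(description.lower().split())
    scores = {cls: len(words & kws) for cls, kws in TASK_CLASSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

print(classify_task("Translate this sentence into the target language"))
# → "text translation"
```

The predicted class is then attached to the task as an annotation so workers can filter for tasks of interest.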
(iv) Culture-based Personalization: In light of considerable evidence that cultural aspects are central to effective content presentation, we propose to adjust crowd tasks to the cultural background of crowd workers. In line with previous research (Dieterich et al. 1993), we propose four different options for introducing adaptations: (1) the crowd worker explicitly requests adaptations and then actively chooses or rejects them, (2) he or she explicitly requests adaptations that are then automatically triggered, (3) the system automatically recommends adaptations but lets the crowd worker decide whether to accept or reject them, or (4) the system automatically triggers adaptations. By processing personal information of crowd workers such as geolocation, spoken languages, education, past performance, and personal preferences regarding adaptations, our system will modify the user interface and the content delivered to the crowd worker. While the user interface will be adapted according to insights gained from the literature (e.g., Reinecke et al. 2013), content adaptation will be performed by crowdsourcing the process to culturally similar crowd workers, who also verify the quality of the output. Outsourcing the content adaptation will require estimating the costs in order to decide whether the process is economically sound.
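The choice between the four adaptation options can be sketched as a small decision function. The two preference flags in the worker profile are illustrative assumptions about what the profile would store.

```python
# Sketch mapping a worker's stated preferences to the four adaptation modes
# from Dieterich et al. (1993) described above. The profile flags
# (requests_adaptations, wants_control) are hypothetical fields.
from enum import Enum

class AdaptationMode(Enum):
    WORKER_CHOOSES = 1     # (1) worker requests, then accepts/rejects each adaptation
    WORKER_TRIGGERED = 2   # (2) worker requests; system applies automatically
    SYSTEM_RECOMMENDS = 3  # (3) system suggests; worker accepts/rejects
    SYSTEM_TRIGGERED = 4   # (4) system applies adaptations automatically

def select_mode(requests_adaptations, wants_control):
    """Pick one of the four modes from two preference flags in the worker profile."""
    if requests_adaptations:
        return AdaptationMode.WORKER_CHOOSES if wants_control else AdaptationMode.WORKER_TRIGGERED
    return AdaptationMode.SYSTEM_RECOMMENDS if wants_control else AdaptationMode.SYSTEM_TRIGGERED
```

Keeping this choice per-worker preserves control for workers who want it while still enabling fully automatic adaptation at scale.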
Research questions to answer:
- To what extent does improving clarity automatically influence the performance of crowd workers?
- To what extent does adapting the task UI and information to different cultures influence the performance of crowd workers?
Improving task clarity
- Definition of types of information to enrich automatically
- summary of task
- descriptions of the problem statements (e.g. the text to be analyzed in a task about identifying the sentiment related to a text)
- questions and labels of form of microtask
- Types of user profiles to consider (some workers may need such clarifications more than others)
- by expertise in crowdsourcing
- platform- (new, junior, senior crowd worker) or task-level (no, some or much experience in this *type of task*)
- performance: trust / performance score
- by background (we would need access to his/her profile)
- requirements-based (matches, does not match requirements)
- language knowledge
- Types of tasks we would like to enhance automatically
- text translation
- text annotation
- image classification
- data validation
- sentiment analysis
Adapting tasks to different cultural backgrounds
- Definition of cultural dimensions to consider
- Elements in tasks to adapt culturally
- text: title, instructions, data in problem statement, form
- style of UI (e.g. colourful with movement etc.)
- pictures: remove or partially hide content that is sensitive to a particular culture
- List of cultures to include in the evaluation
- By Country: US, India, Spain, Israel, Germany
- Types of tasks we would like to adapt automatically
- text translation
- text annotation
- image classification
- data validation
- sentiment analysis
Two types of evaluations
- Experimental design: running several tasks in a controlled environment in which workers work on tasks with and without the two types of adaptation we consider (additional information for clarification, and cultural background adaptation). We would measure performance (i.e. accuracy with respect to a gold standard, and invested time) as well as work engagement (i.e. the number of tasks the workers accomplished).
- User-based evaluation: designing a survey in which we ask workers to assess their satisfaction with and without our methods. We could consider criteria that other researchers have used for user-based evaluations (e.g. in the context of recommender systems http://ucersti.ieis.tue.nl/files/papers/2.pdf). We would design a survey asking (with Likert-scale and open-ended questions) about different factors such as:
- understanding of task
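The per-condition performance measurements from the experimental design above can be computed straightforwardly. The field names and data shapes below are illustrative assumptions.

```python
# Sketch of the experimental-design measurements: accuracy against a gold
# standard and average invested time per condition. Data shapes are
# illustrative; a real study would log these per worker and per condition.
def accuracy(answers, gold):
    """Fraction of gold-standard items answered correctly."""
    correct = sum(1 for item, label in gold.items() if answers.get(item) == label)
    return correct / len(gold)

def mean_time(seconds_per_task):
    """Average invested time per task, in seconds."""
    return sum(seconds_per_task) / len(seconds_per_task)

# Comparing an adapted vs. an unadapted condition:
gold = {"t1": "pos", "t2": "neg", "t3": "pos"}
adapted_answers = {"t1": "pos", "t2": "neg", "t3": "neg"}
print(accuracy(adapted_answers, gold), mean_time([42.0, 35.5, 51.0]))
```

Engagement (number of tasks accomplished) is a simple count per worker and condition, so it needs no separate computation.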
The reference section is where you cite prior work that you build upon. If you are aware of existing related research papers, list them here. We also encourage you to borrow ideas from past submissions (see the meteor links above). Please list the links of the ideas you used to create this proposal (there is no restriction in terms of the number of ideas or whether they are yours or others').
- [Foundation Idea] UI cultural adaptations were extensively researched by Reinecke and colleagues (see related work).
- [Feature Idea] Other teams also pointed out related ideas, e.g. http://crowdresearch.stanford.edu/w/index.php?title=Milestone_5_ABC_Storyboard:_Third_Party_Interface
Human & Machine translation:
- CrowdLang: http://dl.acm.org/citation.cfm?id=2380745
- The Language Demographics of Amazon Mechanical Turk: http://mt-archive.info/ACL-2011-Zaidan.pdf