Difference between revisions of "Winter Milestone 5 Templates"

From crowdresearch

Revision as of 23:58, 8 February 2016

Please use the following template to write up your introduction section this week.


We're going to borrow the introduction section from this paper as an example: Vaish R, Wyngarden K, Chen J, et al. Twitch crowdsourcing: crowd contributions in short bursts of time. Proceedings of the 32nd annual ACM conference on Human factors in computing systems. ACM, 2014: 3645-3654. Please note how this section is divided into different parts, and follow the same template.


Specific problem being solved (not just crowdsourcing, but getting into specifics)

Mobilizing participation is a central challenge for every crowdsourcing campaign. Campaigns that cannot motivate enough participants will fail. Unfortunately, many interested contributors simply cannot find enough time: lack of time is the top reason that subject experts do not contribute to Wikipedia. Those who do participate in crowdsourcing campaigns often drop out when life becomes busy. Even seemingly small time requirements can dissuade users: psychologists define channel factors as the small but critical barriers to action that have a disproportionate effect on whether people complete a goal.


Despite this constraint, many crowdsourcing campaigns assume that participants will work for minutes or hours at once, leading to a granularity problem where task size is poorly matched to contributors’ opportunities. We speculate that a great number of crowdsourcing campaigns will struggle to succeed as long as potential contributors are deterred by the time commitment.

Introducing the concept, the high level insight

To engage a wider set of crowdsourcing contributors, we introduce twitch crowdsourcing: interfaces that encourage contributions of a few seconds at a time. Taking advantage of the common habit of turning to mobile phones in spare moments, we replace the mobile phone unlock screen with a brief crowdsourcing task, allowing each user to make small, compounded volunteer contributions over time. In contrast, existing mobile crowdsourcing platforms (e.g., [12,16,22]) tend to assume long, focused runs of work. Our design challenge is thus to create crowdsourcing tasks that operate in very short time periods and at low cognitive load.

The system

To demonstrate the opportunities of twitch crowdsourcing, we present Twitch, a crowdsourcing platform for Android devices that augments the unlock screen with 1–3 second volunteer crowdsourcing tasks (Figure 1). Rather than a typical slide-to-unlock mechanism, the user unlocks their phone by completing a brief crowdsourcing task. Twitch is publicly deployed and has collected over eleven thousand volunteer contributions to date. The system sits aside any existing security passcodes on the phone.

System details

Twitch crowdsourcing allows designers to tap into local and topical expertise from mobile users. Twitch supports three unlock applications:

1) Census envisions a realtime people-centered world census: where people are, what they are doing, and how they are doing it. For example, how busy is the corner café at 2pm on Fridays? Census answers these questions by asking users to share information about their surroundings as they navigate the physical world, for example the size of the crowd or current activities (Figure 1).

2) Photo Ranking captures users’ opinion between two photographs. In formative work with product designers, we found that they require stock photos for mockups, but stock photo sites have sparse ratings. Likewise, computer vision needs more data to identify high-quality images from the web. Photo Ranking (Figure 1) asks users to swipe to choose the better of two stock photos on a theme, or contribute their own through their cell phone camera.

3) Structuring the Web helps transform the written web into a format that computers can understand. Users specify an area of expertise (HCI, the Doctor Who television series, or anything else of interest on Wikipedia) and help verify web extractions relevant to that topic. Each unlock involves confirming or rejecting a short extraction. In doing so, users could power a fact-oriented search engine that would directly answer queries like “heuristic evaluation creator”.

After making a selection, Twitch users can see whether their peers agreed with their selection. In addition, they can see how their contribution is contributing to the larger whole, for example aggregate responses on a map (Figure 2) or in a fact database (Figure 5).

Evaluation method and results

We deployed Twitch publicly on the web and attracted 82 users to install Twitch on their primary phones. Over three weeks, the average user unlocked their phone using Twitch 19 times per day. Users contributed over 11,000 items to our crowdsourced database, covering several cities with local census information. The median Census task unlock took 1.6 seconds, compared to 1.4 seconds for a standard slide-to-unlock gesture. Secondary task studies demonstrated that Twitch unlocks added minimal cognitive load to the user. Our work indicates that it may be possible to engage a broad set of new participants in crowdsourcing campaigns as they go about their day or have a few spare moments. In the following sections, we introduce twitch crowdsourcing in more detail and report on our public deployment and field experiments.


We're going to borrow the introduction section from this paper as an example: Cheng, J., Teevan, J. & Bernstein, M.S. (2015). Measuring Crowdsourcing Effort with Error-Time Curves. CHI 2015. Please note how this section is divided into different parts, and follow the same template.


Phenomenon you're interested in

Imagine that a requester wants to use Amazon Mechanical Turk to label 10,000 images with a fixed set of tags. How much should workers be paid to label each image? Would labeling an image with twice as many tags result in a task that is twice as much effort? Should the tags be provided in a drop down list or with radio buttons? Answering these questions requires a fine-grained understanding of the amount of effort the task requires. This process today involves trial and error: requesters observe the wait time and quality on test tasks, guess what might have been causing any problems, tweak the task, and repeat. An accurate measure of the effort required to complete a crowdsourced task would enable requesters to compare different approaches to their tasks, iterate toward a better design, and price their tasks objectively. It could also help workers decide whether to accept a task, or even allow systems to offer tasks based on difficulty or time availability.

However, despite its potential value, task effort is challenging to estimate. Workers face cognitive biases in assessing difficulty [21], while requesters cannot easily observe the process and, as experts, categorically underestimate completion times [12]. These limits suggest the need for a behavioral approach to measure effort. One approach might be to let the market identify hard tasks by reacting to the posted price [30].

The puzzle (observations we can't account for yet)

However, prices cannot easily make fine distinctions in an inelastic market such as Mechanical Turk [14]. Another approach might be to use task duration as a signal of difficulty, but this is unreliable because workers regularly accept multiple tasks simultaneously and interleave work [29]. Measures such as reaction time [32] are not easy to apply to typical crowd tasks: reaction time metrics tend to use simplistic tasks (e.g., shape or color recognition), while others may be too involved for crowd work (e.g., [9]).

Experimental design

In this paper, we propose a data-driven behavioral measure of effort that can be easily and cheaply calculated using the crowd. Our metric, the error time area (ETA), draws on cognitive psychology literature on speed-accuracy tradeoff curves [32], and represents the effort required for a worker to accurately complete a task. To create it, we first recruit workers to complete the task under different time limits. Next, we fit a curve to the collected data relating the error rate and time limit (Figure 1). Last, we compute ETA by taking the area under this error-time curve. Because ETA is calculated using a data-driven approach, task difficulty can be determined with minimal effort and without analytical modeling. Rather than measuring average duration independent of work quality, ETA computes quality as a function of duration and thus can be used to estimate a wage for a task. ETA also allows requesters to compare multiple task designs; for example, we find that tagging an image with an open textbox is less effort than choosing between a fixed list of 16 options, but more effort than choosing between a fixed list of 8 options.
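The three steps above (collect error rates under several time limits, relate error rate to time limit, take the area under that curve) can be illustrated with a minimal sketch. It rests on assumptions that are not from the paper: the function name `error_time_area` is hypothetical, the data points are invented, and straight-line interpolation between the measured points stands in for whatever curve the authors actually fit.

```python
from typing import List, Tuple


def error_time_area(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear error-time curve (trapezoidal rule).

    points: (time_limit_seconds, error_rate) pairs, one per time-limit
    condition. ASSUMPTION: linear interpolation between measured points
    replaces the fitted curve described in the paper.
    """
    if len(points) < 2:
        raise ValueError("need at least two time-limit conditions")
    pts = sorted(points)  # order the conditions by time limit
    area = 0.0
    for (t0, e0), (t1, e1) in zip(pts, pts[1:]):
        # Trapezoid between two neighbouring time limits
        area += (t1 - t0) * (e0 + e1) / 2.0
    return area


# Hypothetical measurements, invented for illustration only
easy_task = [(1.0, 0.50), (2.0, 0.20), (4.0, 0.05), (8.0, 0.05)]
hard_task = [(1.0, 0.90), (2.0, 0.60), (4.0, 0.30), (8.0, 0.10)]

eta_easy = error_time_area(easy_task)
eta_hard = error_time_area(hard_task)
print(f"easy: {eta_easy:.2f}, hard: {eta_hard:.2f}")
```

With these made-up numbers the harder task has the larger area (about 2.45 versus 0.80), because its error rate stays high for longer as the time limit grows. That matches the intuition that a larger area means more effort.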

Evaluation methods

After describing ETA, we explore the metric via four studies:
– Study 1: ETA vs. other measures of effort. For ten common microtasking primitives (e.g., multiple choice questions, long-form text entry), we show that the ETA metric represents effort better than existing measures.
– Study 2: ETA vs. market price. We then compare ETA as well as other measures to the market prices of these primitives on a crowdsourcing platform.
– Study 3: Modeling perceptual costs. By augmenting ETA with measures of perceptual effort, we find we can better model a worker’s perceived difficulty of a task.
– Study 4: Tasks without ground truth. In order to capture how well people do a task, ETA requires ground truth. We extend the metric to also work for subjective tasks.

Result (what you'd imagine would happen)

We then demonstrate how ETA can be used for rapidly prototyping tasks. ETA makes it possible to characterize tasks in terms of their monetary cost and human effort, and paves the way for better task design, payment, and allocation.