Customized Boomerang Team biubiubiu



Brief introduction to the system

Customized Boomerang is a reputation system that extends the original Boomerang into a broader, worker-configurable version. Its core idea is to rank the tasks in a worker's feed according to each task's Total Points (TP):

TP = \sum_{i=1}^{n} w_i (1 + A_i), where


w (the weight) reflects how important a worker considers an aspect of the task feed, A measures how much the worker likes or dislikes the task's actual property in that aspect, and n is the number of aspects the worker evaluates.

A is determined by two factors:

  • The explicit signals in the like/dislike lists given by a worker.
  • The implicit signals in the ratings of tasks given by a worker.

w is

  • determined by how highly the worker ranks this aspect/area (most important, second most important, etc.).
  • (slightly) adjusted by the system after analyzing related user data.
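
To make the scoring concrete, here is a minimal sketch of the TP computation in Python. The aspect names and the specific w and A values below are hypothetical illustrations; the source only specifies the general form TP = sum of w * (1 + A) over the evaluated aspects.

```python
def total_points(weights, affinities):
    """Compute Total Points (TP) for one task.

    weights:    dict mapping aspect name -> w (how important the worker
                considers that aspect)
    affinities: dict mapping aspect name -> A (how much the worker
                likes/dislikes the task's actual property in that aspect)
    """
    # TP = sum over evaluated aspects of w * (1 + A); an aspect with no
    # recorded A is assumed neutral here and contributes w * 1.
    return sum(w * (1 + affinities.get(aspect, 0.0))
               for aspect, w in weights.items())


# Hypothetical worker who evaluates two aspects.
weights = {"payment": 5, "task type": 3}
affinities = {"payment": 0.2, "task type": -0.1}
print(total_points(weights, affinities))  # 5*1.2 + 3*0.9 = 8.7
```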

How the system solves critical problems

  • Current mainstream crowdsourcing platforms each provide their own task feed system for workers. These systems give workers some benefits, but none of them accounts for the varied incentives workers have, so workers still have trouble finding the tasks they want. We propose Customized Boomerang to meet these diverse requirements. In this system, workers customize their own Boomerang to find desired tasks easily. A worker can keep several customized Boomerangs in one account but can only have one active Boomerang at a time. To help new workers get over the common "cold start" problem, a new worker can preset like/dislike lists for the aspects that matter to him/her; in this way, even a new worker can get a satisfying task feed after spending some time customizing an individual Boomerang. Since a worker may not be able to decide clearly what he/she likes or dislikes, or how strongly, every time workers rate a task they can choose whether to allow the system to improve their like/dislike lists using data extracted from that rating; whether to rate at all is entirely up to the worker.
  • We also expect the system to support multiple modes for workers: for example, a worker can choose to focus on learning or on earning money by using the appropriate customized Boomerang. Additionally, the recommender systems used on mainstream crowdsourcing platforms can feed workers piles of similar tasks, which quickly bores them, and nothing can be done to change that. Customized Boomerang solves this problem by providing multiple modes, giving workers the ability to change the types of tasks they are recommended.
  • Current task feed systems are not flexible enough to be improved, since their recommendation mechanisms are fixed from the start. Customized Boomerang is always open to new data that can improve workers' task feeds. This is reflected in the unlimited set of aspects offered to workers for evaluation, which gives the system unlimited potential to get better. Once the platform exposes a new kind of data that some workers consider useful for refining their task feed, they can simply add it to the algorithm by deciding how important that data is to them, and the system will then take the new data into account when calculating a task's TP.

Introducing modules of the system

Below, we introduce the three main steps for customizing and refining a personal Boomerang.

Step 1: Customization - what are your main concerns?

Problem/Limitations

The core idea of Daemo's Boomerang reputation system is to use ratings to directly influence the work quality you get (if you are a requester) and the ease of finding quality work (if you are a worker), so that requesters and workers provide more genuine reviews, which helps reduce reputation inflation on online crowdsourcing platforms. However, the task feed in Daemo has some drawbacks: 1) the only sorting criterion is workers' ratings of the requester, so the feed cannot satisfy workers who also care about criteria such as task type, payment, or rejection rate; 2) it is hard for new workers to get a satisfying task feed; 3) over time, more and more requesters will end up with the same cumulative rating score. Workers will then find their task feed chaotic, with many requesters sitting at the same level regardless of the workers' past ratings of them. As a result, workers will gradually lose trust in the platform, because the only distinguishing feature Daemo has (or had) is gone and they no longer get a clearly differentiated task feed.

How to implement it?

In the first step, we attempt to acquire workers' preferences. First, we have workers choose and rank the aspects that matter to them from the aspect pool. For workers, ranking boils down to two things: finding good requesters (on the basis of payment, rejection rate, etc.) and finding good tasks (interesting, matching their abilities, teaching them something, etc.). We therefore provide two main categories of aspects: Requester, which contains payment, rejection rate, accept time, etc., and Task, which contains task type, estimated working time, estimated hourly wage, etc. Workers can choose and rank subcategories of these two main categories.

There is no limit on the number of aspects chosen: workers can choose only one aspect or as many as they want. We then use the workers' ranked aspects to determine the value of w (the weight).

Here we give an example of how to determine the value of w from the ranking of properties. Lily chooses three properties that matter most to her and sorts them: payment, requester's reputation, and rejection rate. We allocate a different weight to each of these three properties: w(payment) = 8, w(requester's reputation) = 5, w(rejection rate) = 3.

Based on the properties workers have chosen, we then have them set a criterion for each property; comparing a task against these criteria determines the value of A. Here, we give a simple table explaining different properties and their corresponding criteria.

Juechi-table.png (table: example properties and their corresponding criteria)


For example, regarding payment, Lily needs to set an expected payment, such as $6/h. If the payment of a certain requester's task is above that standard, we set A(payment) = 0.2; otherwise, A(payment) = -0.2. Regarding rejection rate, Lily sets 3%; since this requester has a rejection rate of 5%, A(rejection rate) = -0.2. Similarly, regarding requester's reputation, the requester who posted this task is on Lily's whitelist, so A(requester's reputation) = 0.2.

Finally, we compute the total points of this task: TP = (1 + 0.2)*8 + (1 + 0.2)*5 + (1 - 0.2)*3 = 18. As a result, we sort Lily's task feed by the TP of each task, from high to low.
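
Below is a minimal sketch, in Python, of the whole pipeline for Lily's example. The weights 8/5/3, the ±0.2 values for A, the $6/h and 3% criteria, and the whitelist check follow the example above; the function names, the requester name, and the task record are hypothetical.

```python
# Weights derived from Lily's ranking of her three chosen properties.
WEIGHTS = {"payment": 8, "requester_reputation": 5, "rejection_rate": 3}

# Lily's criteria for each property.
EXPECTED_HOURLY_PAY = 6.0            # dollars per hour
MAX_REJECTION_RATE = 0.03            # 3%
WHITELIST = {"GoodRequesterInc"}     # hypothetical whitelisted requester


def affinities(task):
    """Map a task's actual properties to A values (+0.2 like, -0.2 dislike)."""
    return {
        "payment": 0.2 if task["hourly_pay"] >= EXPECTED_HOURLY_PAY else -0.2,
        "requester_reputation": 0.2 if task["requester"] in WHITELIST else -0.2,
        "rejection_rate": 0.2 if task["rejection_rate"] <= MAX_REJECTION_RATE else -0.2,
    }


def total_points(task):
    a = affinities(task)
    return sum(w * (1 + a[prop]) for prop, w in WEIGHTS.items())


# The task from the example: pays above $6/h, posted by a whitelisted
# requester, but with a 5% rejection rate (above Lily's 3% threshold).
task = {"hourly_pay": 7.0, "requester": "GoodRequesterInc", "rejection_rate": 0.05}
print(total_points(task))  # (1+0.2)*8 + (1+0.2)*5 + (1-0.2)*3 = 18.0
```

Sorting Lily's feed then amounts to computing TP for every candidate task and ordering the tasks from highest to lowest TP.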

Module 2: Photo Ranking

Problem/Limitations

Beyond harnessing local observations via Census, we wanted to demonstrate that twitch crowdsourcing could support traditional crowdsourcing tasks such as image ranking (e.g., Matchin [17]). Needfinding interviews and prototyping sessions with ten product design students at Stanford University indicated that product designers not only need photographs for their design mockups, but they also enjoy looking at the photographs. Twitch harnesses this interest to help rank photos and encourage contribution of new photos.

Module details

Photo Ranking crowdsources a ranking of stock photos for themes from a Creative Commons-licensed image library. The Twitch task displays two images related to a theme (e.g., Nature Panorama) per unlock and asks the user to slide to select the one they prefer (Figure 1). Pairwise ranking is considered faster and more accurate than rating [17]. The application regularly updates with new photos. Users can optionally contribute new photos to the database by taking a photo instead of rating one. Contributed photos must be relevant to the day’s photo theme, such as Nature Panorama, Soccer, or Beautiful Trash. Contributing a photo takes longer than the average Twitch task, but provides an opportunity for motivated individuals to enter the competition and get their photos rated. Like with Census, users receive instant feedback through a popup message to display how many other users agreed with their selection. We envision a web interface where all uploaded images can be browsed, downloaded and ranked. This data can also connect to computer vision research by providing high-quality images of object categories and scenes to create better classifiers.
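
The module above relies on turning many pairwise "which photo do you prefer?" swipes into a global ranking. The source does not specify the aggregation method, so the sketch below is only an illustration, ranking photos by their fraction of pairwise wins (a simple stand-in for models such as Bradley-Terry or Elo).

```python
from collections import defaultdict


def rank_photos(votes):
    """Rank photos from pairwise preference votes.

    votes: iterable of (winner_id, loser_id) pairs, one per user swipe.
    Returns photo ids sorted by their fraction of comparisons won.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return sorted(appearances, key=lambda p: wins[p] / appearances[p], reverse=True)


# Hypothetical swipes for a "Nature Panorama" theme.
votes = [("sunset.jpg", "forest.jpg"),
         ("sunset.jpg", "lake.jpg"),
         ("lake.jpg", "forest.jpg")]
print(rank_photos(votes))  # ['sunset.jpg', 'lake.jpg', 'forest.jpg']
```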

Module 3: Structuring the Web

Problem/Limitations

Search engines no longer only return documents — they now aim to return direct answers [6,9]. However, despite massive undertakings such as the Google Knowledge Graph [36], Bing Satori [37] and Freebase [7], much of the knowledge on the web remains unstructured and unavailable for interactive applications. For example, searching for ‘Weird Al Yankovic born’ in a search engine such as Google returns a direct result ‘1959’ drawn from the knowledge base; however, searching for the equally relevant ‘Weird Al Yankovic first song’, ‘Weird Al Yankovic band members’, or ‘Weird Al Yankovic bestselling album’ returns a long string of documents but no direct answer, even though the answers are readily available on the performer’s Wikipedia page.

Module preview

To enable direct answers, we need structured data that is computer-readable. While crowdsourced undertakings such as Freebase and dbPedia have captured much structured data, they tend to only acquire high-level information and do not have enough contributors to achieve significant depth on any single entity. Likewise, while information extraction systems such as ReVerb [14] automatically draw such information from the text of the Wikipedia page, their error rates are currently too high to trust. Crowdsourcing can help such systems identify errors to improve future accuracy [18]. Therefore, we apply twitch crowdsourcing to produce both structured data for interactive applications and training data for information extraction systems.

Module details

Contributors to online efforts are drawn to goals that allow them to exhibit their unique expertise [2]. Thus, we allow users to help create structured data for topics of interest. The user can specify any topic on Wikipedia that they are interested in or want to learn about, for example HCI, the Godfather films, or their local city. To do so within a one-to-two second time limit, we draw on mixed-initiative information extraction systems (e.g., [18]) and ask users to help vet automatic extractions. When a user unlocks his or her phone, Structuring the Web displays a high-confidence extraction generated using ReVerb, and its source statement from the selected Wikipedia page (Figure 1). The user indicates with one swipe whether the extraction is correct with respect to the statement. ReVerb produces an extraction in Subject-Relationship-Object format: for example, if the source statement is “Stanford University was founded in 1885 by Leland Stanford as a memorial to their son”, ReVerb returns {Stanford University}, {was founded in}, {1885} and Twitch displays this structure. To minimize cognitive load and time requirements, the application filters extractions to include only those drawn from short source sentences and uses color coding to match extractions with the source text. In Structuring the Web, the instant feedback upon accepting an extraction shows the user their progress growing a knowledge tree of verified facts (Figure 5). Rejecting an extraction instead scrolls the user down the article as far as their most recent extraction source, demonstrating the user’s progress in processing the article. In the future, we envision that search engines can utilize this data to answer a wider range of factual queries.
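
To illustrate the kind of data this module produces, here is a minimal sketch of how a verified Subject-Relationship-Object extraction could be represented. The class and field names are hypothetical and are not taken from Twitch or ReVerb; only the example triple and its source sentence come from the text above.

```python
from dataclasses import dataclass


@dataclass
class Extraction:
    """A Subject-Relationship-Object triple proposed by an extractor
    (e.g., ReVerb) from a single source sentence."""
    subject: str
    relationship: str
    obj: str
    source_sentence: str


@dataclass
class Verification:
    """One user's swipe verdict on a displayed extraction."""
    extraction: Extraction
    correct: bool   # True if the user accepted the extraction
    user_id: str


# The example from the text above.
ex = Extraction(
    subject="Stanford University",
    relationship="was founded in",
    obj="1885",
    source_sentence=("Stanford University was founded in 1885 by "
                     "Leland Stanford as a memorial to their son"),
)
vote = Verification(extraction=ex, correct=True, user_id="u42")
print(vote.correct)  # accepted extractions grow the knowledge tree of verified facts
```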