Customized Boomerang Team biubiubiu

Brief introduction of the system

Customized Boomerang is a reputation system that extends the original Boomerang into a broader version. Its core idea is to rank tasks in a worker's feed according to the Total Points (TP) of each task:

TP = w_1*(1 + A_1) + w_2*(1 + A_2) + ... + w_n*(1 + A_n), where


w (weight) reflects how important a worker considers a given aspect of the task feed, A indicates how much the worker likes or dislikes the task's actual property in that aspect, and n is the number of aspects the worker evaluates.

A is determined by two factors:

  • The explicit signals in the like/dislike lists given by a worker.
  • The implicit signals in the ratings of tasks given by a worker.

w is

  • determined by how highly a worker values this aspect/area (most important, second most important, etc.)
  • (slightly) adjusted by the system after analyzing related user data.
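
To make the scoring concrete, the following is a minimal sketch of how TP could be computed from a worker's weights and preference signals. The function name and dictionary layout are illustrative assumptions, not part of the proposal; aspects the worker has not evaluated default to A = 0.

  # Minimal sketch of the Total Points computation: TP = sum over aspects of w_i * (1 + A_i).
  # The function name and data layout are assumptions made for illustration.
  def total_points(weights, preferences):
      """weights: aspect -> w (importance the worker assigns to the aspect).
      preferences: aspect -> A (how much the worker likes/dislikes the task's property)."""
      return sum(w * (1 + preferences.get(aspect, 0.0)) for aspect, w in weights.items())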

How the system solves critical problems

  • Current mainstream crowdsourcing platforms provide different task feed systems for workers. These systems satisfy workers to some extent, but none of them accounts for workers' varied incentives, so workers still have trouble finding the tasks they want. We propose Customized Boomerang to meet workers' diverse requirements. In this system, workers can customize their own Boomerang to find the tasks they want more easily. A worker can keep several different customized Boomerangs in one account but can have only one active Boomerang at a time. To help new workers get over the common "cold start", a new worker can preset his/her like/dislike lists of properties for the aspects that matter to him/her. In this way, even a new worker can get a satisfying task feed after spending some time customizing an individual Boomerang. Since a worker may not be able to decide clearly what he/she likes or dislikes, or how strongly, every time workers rate a task they can choose whether or not to allow the system to refine their like/dislike lists using data extracted from their ratings; whether to do so is entirely up to the worker.
  • We expect the system to support multiple modes for workers, e.g. a worker can choose to focus on learning or on making money with Customized Boomerang, according to his/her needs. Additionally, the recommender systems used on mainstream crowdsourcing platforms can feed workers tons of similar tasks, which quickly bores them, and nothing can be done to change that. Customized Boomerang solves this problem by providing multiple modes that give workers the ability to change the recommended task types.
  • Current task feed systems are not flexible enough to improve, since their recommendation mechanisms are fixed from the start. Customized Boomerang is always open to new data that can be used to improve workers' task feeds. This is reflected in the unlimited set of aspects offered to workers for evaluation, which gives the system unlimited potential to get better. Once the platform exposes a new kind of data that some workers consider useful for refining their task feed, they can simply add it to the algorithm by deciding how important that data is to them, so that the system takes the new data into account when calculating the TP of a task (a small sketch follows below).
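
As a rough illustration of this extensibility, adding a newly exposed kind of data only requires giving it a weight in the worker's customization; the aspect name below is hypothetical.

  # Hypothetical example: the platform starts exposing average approval time, and a worker
  # who finds it useful simply adds it to his/her weights with a chosen importance.
  weights = {"hourly_wage": 8, "evaluation": 5, "rejection_rate": 3}
  weights["approval_time"] = 4  # new aspect; TP calculations now take it into account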

Introducing modules of the system

Below, we introduce the three main steps to customize and refine a personal Boomerang.

Feature 1: Customization - what are your main concerns?

Problem/Limitations

The core idea of Daemo's Boomerang reputation system is to use ratings to directly influence the work quality you get (if you are a requester) and the ease of finding quality work (if you are a worker), so that requesters and workers provide more genuine reviews, which helps reduce reputation inflation on online crowdsourcing platforms. However, the task feed in Daemo has some drawbacks: 1) the only sorting criterion is workers' ratings of the requester, so the task feed cannot satisfy workers who also care about criteria such as task type, payment or rejection rate; 2) it is hard for new workers to get a satisfying task feed; 3) more and more requesters will end up with the same cumulative rating score, so workers will find their task feed becoming chaotic, with many requesters sitting at the same level regardless of the workers' past ratings of them. They will gradually lose trust in the platform, because the differentiation Boomerang provides is gone and workers can no longer get a clearly differentiated task feed.

How to implement it?

In the first step, we attempt to acquire workers' preferences. First, we have workers choose and rank the aspects that matter to them from the aspect pool:

Since, for workers, ranking boils down to two things - finding good requesters (based on payment, rejection rate, etc.) and finding good tasks (interesting, matching their abilities, teaching them something, etc.) - we provide two main categories of aspects: Requester, which contains the subjective evaluation of a requester, payment, rejection rate, accept time, quality of communication, etc., and Task, which contains task type, estimated working time, estimated hourly wage, etc. Workers can choose and rank subcategories of these two main categories.

There is no limit on the number of aspects chosen: workers can choose only one aspect or as many as they want. We then use the workers' ranked aspects to determine the value of w (weight).
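
One possible representation of the aspect pool and a worker's ranked choice is sketched below; the subcategory names are taken from the examples above and are not exhaustive.

  # A sketch of the two main aspect categories and their subcategories (not exhaustive).
  ASPECT_POOL = {
      "Requester": ["subjective_evaluation", "payment", "rejection_rate",
                    "accept_time", "quality_of_communication"],
      "Task": ["task_type", "estimated_working_time", "estimated_hourly_wage"],
  }

  # A worker's customization is an ordered list of chosen subcategories,
  # from most important to least important.
  ranked_choice = ["estimated_hourly_wage", "subjective_evaluation", "rejection_rate"]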

Here we give an example of how to determine the value of w according to the ranking of aspects.

A worker, Lily, chooses three aspects that matter to her and ranks them: estimated hourly wage, subjective evaluation of the requester, and rejection rate. We allocate different weights to these three aspects according to the importance Lily assigns to each: w(wage) = 8, w(evaluation) = 5, w(rejection rate) = 3.

  • How exactly to determine the weights is still open to discussion.
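
Since the weighting scheme is still open, the sketch below only records the weights from the example and shows one hypothetical automatic scheme (linearly decreasing weights); that scheme is purely illustrative and does not reproduce the 8/5/3 values above.

  # Lily's ranked aspects, from most to least important (from the example above).
  lily_ranking = ["estimated_hourly_wage", "subjective_evaluation", "rejection_rate"]

  # Weights as assigned in the example; deriving them automatically from a ranking is still open.
  lily_weights = {"estimated_hourly_wage": 8, "subjective_evaluation": 5, "rejection_rate": 3}

  # One hypothetical scheme: linearly decreasing weights (illustrative only).
  def linear_weights(ranking, top=8, step=2):
      return {aspect: top - step * i for i, aspect in enumerate(ranking)}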

(Optional) According to their choices, we have workers set like/dislike lists for the different aspects. This helps new workers, since they have no past ratings from which the system could infer their preferences. We use A to denote these preferences. Workers set different lists for the aspects they have chosen. Here, we give a simple table of example aspects and their preference lists.

Aspect                  List           Example values
Hourly Wage             Like List      > $6/h
                        Dislike List   < $2/h
Subjective Evaluation   White List     Adam, Bob, ...
                        Black List     James, Kevin, ...
Rejection Rate          Like List      < 3%
                        Dislike List   > 10%

For example, regarding hourly wage, Lily sets an expected estimated hourly wage, say higher than $6/h. If the payment of a certain requester's task does not meet this requirement, we set A(wage) = -0.2; otherwise, A(wage) = 0.2. Regarding rejection rate, Lily sets "less than 3%", but this requester has a rejection rate of 8%, which falls between her like and dislike thresholds, so A(rejection rate) = 0. Similarly, regarding subjective evaluation, this requester, call him Adam, is in Lily's white list (either because Lily put him on the like list herself or because she previously gave him a good rating), so A(evaluation) = 0.2.

  • There can be more levels of A; how many levels A should be divided into is still open to discussion.
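
A minimal sketch of how A could be derived from a worker's like/dislike lists is given below, using the three levels from the example (+0.2, 0, -0.2); the number of levels and the exact values are still open, as noted above, and the function names are assumptions.

  # Three-level preference signal matching the worked example: +0.2 (liked), -0.2 (disliked), 0 (neutral).
  LIKE, NEUTRAL, DISLIKE = 0.2, 0.0, -0.2

  def preference(value, like_rule, dislike_rule):
      """Return A for one aspect, given the like/dislike rules a worker configured.
      Values matching neither rule are neutral."""
      if like_rule(value):
          return LIKE
      if dislike_rule(value):
          return DISLIKE
      return NEUTRAL

  # Lily's rules from the example: wage has no neutral zone, rejection rate does.
  a_wage = preference(4.0, like_rule=lambda w: w > 6, dislike_rule=lambda w: w <= 6)             # -0.2
  a_rejection = preference(0.08, like_rule=lambda r: r < 0.03, dislike_rule=lambda r: r > 0.10)  # 0.0
  a_evaluation = preference("Adam", like_rule=lambda r: r in {"Adam", "Bob"},
                            dislike_rule=lambda r: r in {"James", "Kevin"})                      # 0.2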

Finally, we compute the Total Points of this task: TP = (1-0.2)*8 + (1+0.2)*5 + (1+0)*3 = 15.4. We then rank the tasks fed to Lily by their TP, from high to low.
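
A quick check of the arithmetic in the example, using the same weights and A values:

  # Reproducing Lily's Total Points from the example above.
  weights = {"wage": 8, "evaluation": 5, "rejection_rate": 3}
  A = {"wage": -0.2, "evaluation": 0.2, "rejection_rate": 0.0}
  tp = sum(w * (1 + A[aspect]) for aspect, w in weights.items())
  print(round(tp, 1))  # 15.4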

  • If you still have trouble understanding how this works, feel free to ask any questions.

Module 2: Photo Ranking

Problem/Limitations

Beyond harnessing local observations via Census, we wanted to demonstrate that twitch crowdsourcing could support traditional crowdsourcing tasks such as image ranking (e.g., Matchin [17]). Needfinding interviews and prototyping sessions with ten product design students at Stanford University indicated that product designers not only need photographs for their design mockups, but they also enjoy looking at the photographs. Twitch harnesses this interest to help rank photos and encourage contribution of new photos.

Module details

Photo Ranking crowdsources a ranking of stock photos for themes from a Creative Commons-licensed image library. The Twitch task displays two images related to a theme (e.g., Nature Panorama) per unlock and asks the user to slide to select the one they prefer (Figure 1). Pairwise ranking is considered faster and more accurate than rating [17]. The application regularly updates with new photos. Users can optionally contribute new photos to the database by taking a photo instead of rating one. Contributed photos must be relevant to the day’s photo theme, such as Nature Panorama, Soccer, or Beautiful Trash. Contributing a photo takes longer than the average Twitch task, but provides an opportunity for motivated individuals to enter the competition and get their photos rated. As with Census, users receive instant feedback through a popup message showing how many other users agreed with their selection. We envision a web interface where all uploaded images can be browsed, downloaded and ranked. This data can also connect to computer vision research by providing high-quality images of object categories and scenes to create better classifiers.
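
The module description above does not commit to a particular way of aggregating the pairwise votes; the sketch below shows one minimal possibility, assumed for illustration: record votes per photo pair, rank photos by the fraction of comparisons they win, and compute the "how many users agreed with you" feedback from the same tallies.

  from collections import defaultdict

  # Votes per unordered photo pair: pair -> {photo: number of times it was preferred}.
  pair_votes = defaultdict(lambda: defaultdict(int))

  def record_vote(photo_a, photo_b, chosen):
      """Record one unlock: the user saw photo_a and photo_b and picked `chosen`."""
      pair_votes[frozenset((photo_a, photo_b))][chosen] += 1

  def agreement(photo_a, photo_b, chosen):
      """Instant-feedback statistic: fraction of votes on this pair that picked the same photo."""
      votes = pair_votes[frozenset((photo_a, photo_b))]
      total = sum(votes.values())
      return votes[chosen] / total if total else 0.0

  def win_rate(photo):
      """A simple global score: the fraction of this photo's pairwise votes that it won."""
      won = sum(v[photo] for pair, v in pair_votes.items() if photo in v)
      seen = sum(sum(v.values()) for pair, v in pair_votes.items() if photo in pair)
      return won / seen if seen else 0.0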

Module 3: Structuring the Web

Problem/Limitations

Search engines no longer only return documents — they now aim to return direct answers [6,9]. However, despite massive undertakings such as the Google Knowledge Graph [36], Bing Satori [37] and Freebase [7], much of the knowledge on the web remains unstructured and unavailable for interactive applications. For example, searching for ‘Weird Al Yankovic born’ in a search engine such as Google returns a direct result ‘1959’ drawn from the knowledge base; however, searching for the equally relevant ‘Weird Al Yankovic first song’, ‘Weird Al Yankovic band members’, or ‘Weird Al Yankovic bestselling album’ returns a long string of documents but no direct answer, even though the answers are readily available on the performer’s Wikipedia page.

Module preview

To enable direct answers, we need structured data that is computer-readable. While crowdsourced undertakings such as Freebase and dbPedia have captured much structured data, they tend to only acquire high-level information and do not have enough contributors to achieve significant depth on any single entity. Likewise, while information extraction systems such as ReVerb [14] automatically draw such information from the text of the Wikipedia page, their error rates are currently too high to trust. Crowdsourcing can help such systems identify errors to improve future accuracy [18]. Therefore, we apply twitch crowdsourcing to produce both structured data for interactive applications and training data for information extraction systems.

Module details

Contributors to online efforts are drawn to goals that allow them to exhibit their unique expertise [2]. Thus, we allow users to help create structured data for topics of interest. The user can specify any topic on Wikipedia that they are interested in or want to learn about, for example HCI, the Godfather films, or their local city. To do so within a one-to-two second time limit, we draw on mixed-initiative information extraction systems (e.g., [18]) and ask users to help vet automatic extractions. When a user unlocks his or her phone, Structuring the Web displays a high-confidence extraction generated using ReVerb, and its source statement from the selected Wikipedia page (Figure 1). The user indicates with one swipe whether the extraction is correct with respect to the statement. ReVerb produces an extraction in Subject-Relationship-Object format: for example, if the source statement is “Stanford University was founded in 1885 by Leland Stanford as a memorial to their son”, ReVerb returns {Stanford University}, {was founded in}, {1885}, and Twitch displays this structure. To minimize cognitive load and time requirements, the application filters to include only short source sentences and uses color coding to match extractions with the source text. In Structuring the Web, the instant feedback upon accepting an extraction shows the user their progress growing a knowledge tree of verified facts (Figure 5). Rejecting an extraction instead scrolls the user down the article as far as their most recent extraction source, demonstrating the user’s progress in processing the article. In the future, we envision that search engines can utilize this data to answer a wider range of factual queries.
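
A rough sketch of how the verified extractions and swipe verdicts could be stored is given below, using the Subject-Relationship-Object form that ReVerb produces; the data structures and the verification rule are assumptions made for illustration, not part of the module description.

  from dataclasses import dataclass

  @dataclass
  class Extraction:
      """One ReVerb-style extraction shown during an unlock, in Subject-Relationship-Object form."""
      subject: str
      relationship: str
      obj: str
      source: str          # the sentence the extraction was drawn from
      accepts: int = 0
      rejects: int = 0

      def record_swipe(self, accepted: bool) -> None:
          """Record one user's swipe verdict on this extraction."""
          if accepted:
              self.accepts += 1
          else:
              self.rejects += 1

      def verified(self, min_votes: int = 3) -> bool:
          """Illustrative rule (an assumption): count the fact as verified once at least
          `min_votes` swipes have been cast and a majority of them accept it."""
          total = self.accepts + self.rejects
          return total >= min_votes and self.accepts > self.rejects

  # The example from the text above.
  fact = Extraction("Stanford University", "was founded in", "1885",
                    source="Stanford University was founded in 1885 by Leland Stanford as a memorial to their son")
  fact.record_swipe(True)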