Customized Boomerang Team biubiubiu

From crowdresearch
Revision as of 19:39, 14 February 2016 by Xichen (Talk | contribs) (Module details)


A brief introduction to the system

Customized Boomerang is a reputation system that extends the original Boomerang into a broader, personalized version. Its core idea is to rank tasks according to the Total Points (TP) of each task:

TP = Σᵢ wᵢ × (1 + Aᵢ), summed over i = 1 … n, where

w (weight) reflects how important a worker considers an aspect of the task feed, A shows how much a worker likes or dislikes the task's actual property in that aspect, and n is the number of aspects the worker evaluates.

A is determined by two factors:

  • The explicit signals in the like/dislike lists given by a worker.
  • The implicit signals in the ratings of tasks given by a worker.

w is

  • determined by how a worker values this aspect/area. (most important, 2nd important, etc.)
  • (slightly) adjusted by the system after analyzing related user data.
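As a minimal sketch (not the platform's implementation), the TP formula can be written in a few lines of Python; the weights and A-values below are illustrative placeholders:

```python
def total_points(aspects):
    """aspects: iterable of (weight, affinity) pairs, one per chosen aspect."""
    return sum(w * (1 + a) for w, a in aspects)

# Illustrative numbers: three aspects with weights 8, 5, 3 and
# affinities -0.2, +0.2 and 0 respectively.
print(round(total_points([(8, -0.2), (5, 0.2), (3, 0.0)]), 1))  # 15.4
```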

How the system solves critical problems

  • Current mainstream crowdsourcing platforms provide different task feed systems for workers. These systems satisfy workers to some extent, but none of them accounts for the varied incentives of workers, so workers still have trouble finding the tasks they want. We propose Customized Boomerang to meet the diverse requirements of workers. In this system, workers can customize their own Boomerang to find wanted tasks easily. A worker can keep several different customized Boomerangs in his account but can only have one active Boomerang at a time. To get over the common "cold start", a new worker can preset his/her like/dislike lists of properties in the aspects that matter to him/her; this way, even a new worker can have a satisfying task feed after spending some time customizing an individual Boomerang. Since a worker may not be able to decide clearly what he/she likes or dislikes, or how much, every time workers rate a task they can choose whether or not to allow the system to improve their like/dislike lists using data extracted from those ratings; this depends entirely on the worker's willingness.
  • We expect the system to support multiple modes for workers, e.g. a worker can choose to focus on learning or on making money with Customized Boomerang, depending on his/her needs. Additionally, the recommender systems used on mainstream crowdsourcing platforms can feed workers tons of similar tasks, which quickly bores them, and nothing can be done to change that. Customized Boomerang solves this problem by providing multiple modes, giving workers the ability to change the recommended task types.
  • Current task feed systems are not flexible enough to be improved, since their recommendation mechanisms are fixed from the beginning. Customized Boomerang is always open to new data that can improve a worker's task feed. This is reflected in the unlimited aspects offered to workers for evaluation, which gives the system unlimited potential to get better. Once there is a new kind of data on the platform that some workers find usable for refining their task feed, they can simply add it to the algorithm by deciding how important that data is to them, so that the system takes the new data into account when calculating a task's TP.

Introducing modules of the system

Below, we introduce the three main modules used to customize and refine a personal Boomerang.

Feature 1: Customization - what are your main concerns?


The core idea of Daemo's Boomerang reputation system is to use ratings to directly influence the work quality you get (if you are a requester) and the ease of finding quality work (if you are a worker), so that requesters and workers provide more genuine reviews, which helps reduce reputation inflation on online crowdsourcing platforms. However, the task feed in Daemo has some drawbacks: 1) the only sorting criterion is workers' ratings of the requester, so the task feed cannot satisfy workers who care about other criteria such as task type, payment or rejection rate; 2) it is hard for new workers to get a satisfying task feed; 3) more and more requesters will end up with the same cumulative rating score. Workers will then find their task feed chaotic, with many requesters at the same level regardless of the workers' past ratings of them. Hence, workers will gradually lose trust in the platform, because the one distinguishing feature Daemo has (or had) is gone and workers no longer get a clearly differentiated task feed.

How to customize your personal boomerang?

As a first step, we attempt to acquire workers' preferences: we have workers choose and rank the aspects that matter to them from the aspect pool.

Since ranking boils down (for workers) to two things, finding good requesters (on the basis of payment, rejection rate, etc.) and finding good tasks (interesting, matching one's abilities, teaching something new, etc.), we provide two main categories of aspects: Requester, which contains the subjective evaluation of a requester (exactly what the original Boomerang includes), payment, rejection rate, accept time, quality of communication, etc., and Task, which contains task type, estimated working time, estimated hourly wage, etc. Workers can choose and rank subcategories of these two main categories.

There is no limit on the number of aspects chosen: workers can choose only one aspect or as many as they want. We then use the workers' ranked aspects to determine the value of w (weight).

Here we give an example of how to determine the value of w according to the aspects' ranking.

A worker, Lily, chooses three aspects she thinks matter to her and ranks them: estimated hourly wage, subjective evaluation of a requester, and rejection rate. We allocate different weights to these three aspects according to the importance Lily assigns to each: w(wage)=8, w(evaluation)=5, w(rejection rate)=3.

  • How exactly to determine the weights is still open to discussion.
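Since the weighting scheme is explicitly open to discussion, here is one hypothetical rank-to-weight mapping, purely for illustration; the function name and the fixed weight table are assumptions, not a decided design:

```python
# Hypothetical fixed weight table: weight for the 1st, 2nd, 3rd, ... ranked aspect.
RANK_WEIGHTS = [8, 5, 3, 2, 1]

def weights_for(ranked_aspects):
    """Map a worker's ranked aspect names to weights; ranks past the table get 1."""
    return {aspect: RANK_WEIGHTS[i] if i < len(RANK_WEIGHTS) else 1
            for i, aspect in enumerate(ranked_aspects)}

print(weights_for(["hourly wage", "evaluation", "rejection rate"]))
# {'hourly wage': 8, 'evaluation': 5, 'rejection rate': 3}
```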

(Optional) According to the workers' choices, we have workers set like/dislike lists for the different aspects, which especially helps new workers, since they have no past ratings from which the system could guess their preferences. We use A to denote these preferences. Workers set different lists for the aspects of their own choice. Here is a simple table explaining different aspects and their preferences.

Aspect                  List           Contents
Hourly wage             Like list      > $6/h
                        Dislike list   < $2/h
Subjective evaluation   White list     Adam, Bob, ...
                        Black list     James, Kevin, ...
Rejection rate          Like list      < 3%
                        Dislike list   > 10%

For example, regarding payment, Lily sets an expected estimated hourly wage, e.g. higher than $6/h. If the payment of a certain requester's task does not meet this requirement, we set A(wage) = -0.2; otherwise, A(wage) = 0.2. Regarding rejection rate, Lily sets less than 3%; if this requester has a rejection rate of 8%, which falls between her like and dislike thresholds, then A(rejection rate) = 0. Similarly, regarding subjective evaluation, this requester, call him Adam, is in Lily's white list (either because Lily put him in the like list or because Lily gave him a good rating), so A(evaluation) = 0.2.

  • A could have more levels; how many levels A should be divided into is still open to discussion.
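The A-values in the example above can be sketched as follows. This is a hedged illustration: the function names are made up, the thresholds and ±0.2 levels are taken from the example, and the real number of levels remains open to discussion:

```python
def a_wage(wage, liked_above=6.0):
    # Per the example: meeting the like-list threshold gives +0.2, else -0.2.
    return 0.2 if wage >= liked_above else -0.2

def a_rejection(rate, like_below=0.03, dislike_above=0.10):
    # Three levels: liked (+0.2), neutral (0), disliked (-0.2).
    if rate < like_below:
        return 0.2
    if rate > dislike_above:
        return -0.2
    return 0.0

def a_evaluation(requester, whitelist, blacklist):
    if requester in whitelist:
        return 0.2
    if requester in blacklist:
        return -0.2
    return 0.0

# Adam's task from the example: $5/h, rejection rate 8%, Adam in the white list.
print(a_wage(5.0),
      a_rejection(0.08),
      a_evaluation("Adam", {"Adam", "Bob"}, {"James", "Kevin"}))  # -0.2 0.0 0.2
```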

Finally, we compute the Total Points of this task: TP = (1 - 0.2) × 8 + (1 + 0.2) × 5 + (1 + 0) × 3 = 15.4. We then rank the tasks fed to Lily by their TP, from high to low.

  • If you still have trouble understanding how this works, feel free to ask any kind of questions.

Feature 2: Refine your personal boomerang through ratings


It has been shown that reputation inflation exists on traditional crowdsourcing platforms, and the original Boomerang on Daemo provides a solution: let users know that their past ratings will determine their future task feed. That is both an incentive and a warning for users to rate accurately and honestly. We believe this is a really effective idea, although ratings in the original Boomerang are limited to requesters/workers. Customized Boomerang, the broader version, provides great freedom in ratings while still carrying forward the idea behind the original Boomerang.

Module details

As mentioned, A is determined by two factors: 1) the explicit signals in the like/dislike lists given by a worker (explained in Feature 1); 2) the implicit signals in the worker's ratings of tasks. Here is a rough view of the different kinds of ratings in Customized Boomerang.

Rating type       Options
Overall rating    Use it to refine (*1)
                  Ignore this rating (*2)
Detailed rating   Use it to refine (*3)
                  Ignore this rating (*4)

Again, we use the worker Lily as an example in some of the cases. Here is what happens in the four cases: 1. An overall rating that helps refine the personal Boomerang: when a worker finishes a task and feels comfortable with the properties in every aspect he/she cares about, he/she can give an overall rating and allow the system to use it to refine his/her Boomerang. The properties of the task in the aspects the worker chose and ranked are then extracted for use. If an extracted property already exists in the worker's like/dislike list, the system upgrades/degrades that property; otherwise, the property is added to the like/dislike list according to the rating.

  • Example: Lily, whose Boomerang is shown in the table in Feature 1, gives an overall rating of 4/5 to the task she has done, which is a $5/h, 5-min task posted by James with a rejection rate of 5%, and wants to use this rating to refine her task feed. The system extracts the needed properties: $5/h, James, rejection rate of 5%. Since Lily's original like list for hourly wage is >$6/h and she gave a 4/5 to a $5/h task, the system adds >$5/h to her like list, but with A(wage > $5/h) < A(wage > $6/h). Regarding subjective evaluation, since James already exists in her black list, the positive rating upgrades him, so A(posted by James) > A(posted by Kevin). As for rejection rate, the system adds <5% to her like list, but still A(rate < 5%) < A(rate < 3%).
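Case 1 can be sketched roughly as follows. This is a hypothetical update rule: the dictionary representation, step size and rating threshold are assumptions, not the documented algorithm:

```python
def refine_like_list(like_list, extracted_property, rating, threshold=3):
    """like_list: dict mapping a property (e.g. '>$5/h') to its affinity A.
    A good rating (>= threshold out of 5) adds or upgrades the property;
    a bad one adds or degrades it. The 0.05 step is an arbitrary placeholder."""
    step = 0.05 if rating >= threshold else -0.05
    like_list[extracted_property] = like_list.get(extracted_property, 0.0) + step
    return like_list

lily = {">$6/h": 0.2}                 # Lily's existing like-list entry
refine_like_list(lily, ">$5/h", rating=4)
print(lily[">$5/h"] < lily[">$6/h"])  # True: the weaker threshold gets a lower A
```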

2. An overall rating that the worker tells the system to ignore: when a worker finishes a task and does not want the rating to affect his/her future task feed, e.g. the worker had an excellent experience with a task of a type he/she isn't interested in at all and just wants to express his/her feelings without being fed more such tasks, the worker can give an overall rating that has no effect on his/her future task feed.

  • Example: after finishing a task posted by Adam, Lily feels the task was terribly authored. But she knows Adam is a good requester, assumes something just happened to him this time, and wants to work for him again in the future. In the original Boomerang, if she gave a bad rating, tasks posted by Adam would no longer be fed to her first. Instead, she can give an overall rating and tell the system to ignore it, since she just wants to express her feelings about this single task. In this case, the system will not use the rating for refinement.

3. Detailed ratings that help refine the personal Boomerang: when a worker is satisfied with some properties of a task, indifferent to some, and annoyed by others, he/she can give detailed ratings to express such mixed feelings.

  • Example: a worker, Chan (male), sets his Boomerang to care about hourly wage, working time, requester's payment and the subjective evaluation of a requester. He completes a $20/h, 40-min task posted by a requester, Zhou (female). Chan feels it's good to earn about $13 in 40 min (the hourly wage is good), but a 40-min task is too tiring for him (the working time is not fine). What's more, the requester's payment is generous, but the task she posts is a bit unclear and has poor instructions. With such mixed feelings, Chan can give detailed ratings and use them to improve his future task feed. He may rate only the satisfying properties, only the annoying ones, or all properties in every aspect he cares about; it's entirely up to him.

4. Detailed ratings that the worker tells the system to ignore: such ratings happen when workers want to express their feelings about some specific properties but don't want them to affect their future task feed.

  • Example: the same worker Chan completes a $50/h, 5-min task posted by some requester. He feels he has met an angel, and has a strong desire to express his feelings about the requester's payment and the hourly wage. However, Chan knows that most tasks with such high payment are boring and require special skills he doesn't have; if his liking for such a high hourly wage were used in his future task feed, he would be fed a bunch of high-wage tasks he can't do. This is the case for giving detailed ratings without using them for the future task feed. Chan can give 10/5 (kidding) to the requester's payment and the hourly wage to express his thanks, and then go back to his normal tasks with the wish to meet such a task again. Additionally, he can add that angel requester to his white list to "subscribe".

Module 3: Structuring the Web


Search engines no longer only return documents — they now aim to return direct answers [6,9]. However, despite massive undertakings such as the Google Knowledge Graph [36], Bing Satori [37] and Freebase [7], much of the knowledge on the web remains unstructured and unavailable for interactive applications. For example, searching for ‘Weird Al Yankovic born’ in a search engine such as Google returns a direct result ‘1959’ drawn from the knowledge base; however, searching for the equally relevant ‘Weird Al Yankovic first song’, ‘Weird Al Yankovic band members’, or ‘Weird Al Yankovic bestselling album’ returns a long string of documents but no direct answer, even though the answers are readily available on the performer’s Wikipedia page.

Module preview

To enable direct answers, we need structured data that is computer-readable. While crowdsourced undertakings such as Freebase and dbPedia have captured much structured data, they tend to only acquire high-level information and do not have enough contributors to achieve significant depth on any single entity. Likewise, while information extraction systems such as ReVerb [14] automatically draw such information from the text of the Wikipedia page, their error rates are currently too high to trust. Crowdsourcing can help such systems identify errors to improve future accuracy [18]. Therefore, we apply twitch crowdsourcing to produce both structured data for interactive applications and training data for information extraction systems.

Module details

Contributors to online efforts are drawn to goals that allow them to exhibit their unique expertise [2]. Thus, we allow users to help create structured data for topics of interest. The user can specify any topic on Wikipedia that they are interested in or want to learn about, for example HCI, the Godfather films, or their local city. To do so within a one-to-two second time limit, we draw on mixed-initiative information extraction systems (e.g., [18]) and ask users to help vet automatic extractions. When a user unlocks his or her phone, Structuring the Web displays a high-confidence extraction generated using ReVerb, and its source statement from the selected Wikipedia page (Figure 1). The user indicates with one swipe whether the extraction is correct with respect to the statement. ReVerb produces an extraction in Subject-Relationship-Object format: for example, if the source statement is “Stanford University was founded in 1885 by Leland Stanford as a memorial to their son”, ReVerb returns {Stanford University}, {was founded in}, {1885} and Twitch displays this structure. To minimize cognitive load and time requirements, the application filters to include only short source sentences and uses color coding to match extractions with the source text. In Structuring the Web, the instant feedback upon accepting an extraction shows the user their progress growing a knowledge tree of verified facts (Figure 5). Rejecting an extraction instead scrolls the user down the article as far as their most recent extraction source, demonstrating the user’s progress in processing the article. In the future, we envision that search engines can utilize this data to answer a wider range of factual queries.
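The Subject-Relationship-Object record that a user vets with one swipe might be represented like this. This is an illustrative sketch only; the class and field names are assumptions, not ReVerb's or Twitch's actual API:

```python
from dataclasses import dataclass

@dataclass
class Extraction:
    subject: str
    relationship: str
    obj: str
    source: str          # sentence the triple was extracted from
    verified: bool = False

ex = Extraction("Stanford University", "was founded in", "1885",
                "Stanford University was founded in 1885 by Leland Stanford "
                "as a memorial to their son")

def swipe(extraction, correct):
    # One swipe marks the extraction as verified (correct) or rejected.
    extraction.verified = correct
    return extraction

print(swipe(ex, True).verified)  # True
```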