Winter Milestone 5 Team Enigma
==System==
 
=== Brief introduction of the system ===

There is a massive amount of information necessary for a healthy crowdsourcing marketplace — for example accurate reputation ratings, skill tags on tasks, and hourly wage estimates for tasks — that is privately held by individuals but rarely shared. We introduce Boomerang, an interactive task feed for a crowdsourcing marketplace that incentivizes accurate sharing of this information by making the information directly impact the sharer's future tasks or workers. Requesters' ratings of workers, and their skill classifications of tasks, are used to give early access to workers who that requester rates highly and who are experts in that skill, so giving a high rating to a mediocre worker dooms the requester to more mediocre work from that worker. Workers' ratings of requesters are used to rank their high-rated requesters at the top of the task feed, and their estimates of active work time are used to estimate their hourly wage on other tasks on the platform.

=== How is the system solving critical problems ===  
 
Finding relevant tasks that pay well and on time is a fundamental need for workers. Meanwhile, finding skilled workers who deliver good-quality results is the requesters' aim. Boomerang aims to solve this problem by creating a feedback system that directly affects the users' future activity on the system. While requester reputation is the major criterion for workers trying to find tasks, there are other preferences and signals that workers look at while trying to find relevant tasks. One such indicator is the hourly wage equivalent that the task would pay. We extend the boomerang idea to include the hourly rate as one of the signals for ranking tasks on the task feed. The system works by taking as input from workers the time it takes them to complete a task. In addition to the workers' and requesters' ratings, we use the hourly wages as a way to bump up tasks on the task feed. Another signal for bumping up a task on the feed is the worker's categorical rating as perceived by the requester. The bumping up of tasks is valid only for requesters that are in the same ratings zone, e.g. all requesters in the same ratings quartile. This system increases interactions between workers and requesters who not only have good reputations but also keep posting tasks that are more aligned with workers' skills and interests and pay fair wages.
  
The hourly wage model for the tasks on the feed will be built over the task completion times reported by workers for submitted tasks. The incentive for workers to report accurate times is the availability of hourly wage estimates for other tasks, and effective ranking of tasks on their feed.
  
 
=== Introducing modules of the system ===  
 
Below, we introduce the various modules of the suggested extension to Boomerang. The first module estimates the amount of time that a task would require. The second module introduces the concept of categorical ratings. Finally, we use these concepts to rank the tasks on the task feed.
  
=== Module 1: Estimated Duration of Tasks (Hourly Wage) ===  
  
 
==== Problem/Limitations ====
 
The Boomerang model ensures that the problem of reputation inflation is kept in check by incentivizing users to give accurate ratings that directly affect their future activity on the crowdsourcing platform. While the overall experience of workers with requesters might have been positive, workers would still benefit from more granular preferences for the tasks showing up on their task feed. Workers find it helpful to know how much time a particular task would take before attempting it. Moreover, workers are known to track their hourly wage rates through third-party extensions or scripts. The question remains how reliable the durations reported by other workers on third-party forums and platforms are. There should be an incentive for workers to accurately report the time taken to complete a task.
 
==== Module preview ====
 
Currently, workers depend on third-party sites to find out the estimated time it would take them to finish a task. These reported durations are not always accurate, as there is practically no personal incentive for workers to accurately measure and report them. We let workers report a completion time for each task they complete. Based on these reported times, we build a model to predict the hourly wage of the individual worker for the tasks that occur on their task feed. This model takes into account the time it took other workers to complete the same task and the time it took the same worker to complete other tasks.
 
==== System details ====  
 
*Every worker on the system will be asked to provide an input for the amount of time it took them to complete a task when submitting it. The worker will be provided a stopwatch timer on the task screen to track the time. The timer starts once the task is accepted, and the worker can pause and resume the timer. On clicking submit, a confirmation with the recorded time is displayed to the worker. At this point the worker can either submit the recorded value or modify the time as they see fit. This also allows workers to enter times for tasks of longer duration where they might not be on the system the entire time, or took breaks in between.
*The system calculates a mean completion time X for every active task on the platform. This is the mean time taken by all the workers who have attempted this task. For new tasks, the mean time is as reported during the prototype task phase. If the requester did not opt for a prototype task, we would simply have to wait for the first worker to attempt the task.
  
*The system also calculates, for every task submitted by the worker, a signed deviation from the mean time X. All such deviations in the same task category are then averaged to get a single signed mean deviation per category. Let's denote this by D(i) for category i.
*For every task on the worker's feed belonging to a given category i, we use D(i) to predict the estimated time for the task. We use the simplest model: adding the mean deviation D(i) to the mean completion time X for a task t. Thus, the Estimated Time for task t is given by ET(t) = X + D(i).
*Hourly wage can then be calculated simply by dividing the reward value by the Estimated Time, as in the sketch below.
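
As a concrete illustration, here is a minimal sketch of the estimation above in Python. The function and variable names, and the use of minutes and dollars as units, are our own assumptions; the write-up does not prescribe an implementation.

<pre>
from collections import defaultdict

def mean(values):
    return sum(values) / len(values)

def category_deviations(worker, completed, worker_times, task_times, task_category):
    """Signed mean deviation D(i) of this worker from the task means X, per category i."""
    devs = defaultdict(list)
    for task in completed:
        x = mean(task_times[task])                       # mean completion time X (minutes)
        devs[task_category[task]].append(worker_times[(worker, task)] - x)
    return {cat: mean(ds) for cat, ds in devs.items()}

def estimated_time(task, worker_devs, task_times, task_category):
    """ET(t) = X + D(i): the task mean plus the worker's mean deviation in its category."""
    x = mean(task_times[task])
    return x + worker_devs.get(task_category[task], 0.0)  # no history in category: fall back to X

def hourly_wage(reward, est_minutes):
    """Hourly wage = reward divided by the Estimated Time, converted to hours."""
    return reward / (est_minutes / 60.0)
</pre>

For example, a task with a $1.50 reward and an Estimated Time of 10 minutes works out to an hourly wage of $9.00.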
=== Module 2: Task Categories and Ratings ===
  
 
==== Problem/Limitations ====
 
While the boomerang rating captures the overall reputation of workers and requesters on the platform, certain workers might be stronger in one category of task than another. The requester benefits by making their task visible to such workers, leading to faster completion of the project. The workers benefit by having work aligned to their interests show up in their task feeds. Having category-specific ratings is beneficial for these decisions.
  
 
==== Module details ====
 
The system to maintain category ratings is simple and straightforward. Each task created by the requester is assigned an appropriate category. This is done by the requester while authoring the task; the incentive for the requester is better visibility of their task to workers. The boomerang model asks the worker to rate the requester on each task. This same rating is also added to the corresponding category rating for the requester. Categorization of tasks is also helpful for workers to analyze and improve their skills. This categorical rating is further used to rank the tasks better in the task feed. Categories are also displayed as tags next to the tasks, and clicking a tag filters and displays all tasks belonging to that category.
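
A minimal sketch of how such a per-category rating could be maintained alongside the overall boomerang rating, kept here as running means (the class and field names are hypothetical, not part of the write-up):

<pre>
from collections import defaultdict

class RequesterReputation:
    """A requester's overall rating plus per-category ratings, both simple means."""
    def __init__(self):
        self.overall = []                     # every rating the requester has received
        self.by_category = defaultdict(list)  # category -> ratings received on its tasks

    def add_rating(self, rating, category):
        # Each worker rating counts toward both the overall and the category score.
        self.overall.append(rating)
        self.by_category[category].append(rating)

    def overall_rating(self):
        return sum(self.overall) / len(self.overall)

    def category_rating(self, category):
        scores = self.by_category[category]
        return sum(scores) / len(scores) if scores else None
</pre>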
=== Module 3: Ranking the tasks ===
  
 
==== Problem/Limitations ====  
 
The boomerang rating provides a holistic way of ranking tasks, where all aspects of a given task are assumed to be implied in the ratings provided by the users. This solves the problem of reputation inflation. Extending the same concept, the proposed system aims to improve the ranking by using the two modules described above.
  
 
==== Module preview ====  
 
The system improves the ranking of tasks on the worker's feed by taking into account the hourly wages as well as the category-specific ratings described in the previous two modules. Along with past ratings, this brings in facts (the reward) and predictions (the Estimated Time) from the current task to rank the tasks. The improvement over boomerang's ranking system is to bump up tasks whose overall ratings fall roughly within the same locality.
 
==== Module details ====
 
The primary key for ranking tasks on the feed is the boomerang rating. The proposed system further ranks these tasks within local groups. The local groups of tasks on the feed are created based on boomerang ratings: the system breaks the ratings down into quartiles or semi-quartiles, and bumping up is allowed only between tasks belonging to the same rating quartile. Once the first level of ranking is done based on boomerang ratings, the system bumps up the tasks that belong to a category in which the worker has received higher ratings. This leads to a recalculation of the ranks; for the bumped-up tasks, the category ratings are considered, as they are more relevant for the given task. This final rating-based ranking is the default order of tasks displayed to workers, as sketched below.
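
A minimal sketch of this two-level default ranking, assuming each task carries its requester's overall boomerang rating, its category, and the requester's rating in that category; the names, rating scale, and bump threshold are all hypothetical:

<pre>
HIGH_RATING = 2  # assumed threshold for "worker is highly rated in this category"

def rating_quartile(rating, max_rating=3.0):
    """Bucket an overall boomerang rating into one of four quartiles (3 = best)."""
    return min(3, int(rating / max_rating * 4))

def default_rank(tasks, worker_category_ratings):
    """Level 1: order by overall-rating quartile. Level 2: within a quartile, bump
    tasks from categories the worker is highly rated in, sorted by category rating."""
    def sort_key(task):
        quartile = rating_quartile(task["requester_rating"])
        bumped = worker_category_ratings.get(task["category"], 0) >= HIGH_RATING
        # Bumped tasks are scored by the more relevant category rating.
        score = task["category_rating"] if bumped else task["requester_rating"]
        return (-quartile, -int(bumped), -score)
    return sorted(tasks, key=sort_key)
</pre>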
In the diagrams below, the rating in black is the overall rating, while the rating in red is the category rating.
<center>[[File:Ranking_step1.jpg| 500px]]    [[File:Ranking_step2.png | 500px]]</center>
The per-hour wage equivalent for each task is displayed next to the reward offered by the task. The task feed will have a toggle to enable wage-based ranking. On toggling this on, tasks in the same rating quartile as derived above will be ranked based on per-hour wage. We chose a quartile instead of a global wage ranking because workers often prefer a reputable and reliable requester they have worked with before over the possibility of a slightly higher wage from an unreliable requester. Moreover, this method of a secondary local ranking maintains the importance of the Boomerang mechanism, which is central to our system.
The tie breaker for all purposes will be the more recent task, as in the sketch below.
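
A sketch of the wage-based toggle, reusing rating_quartile from the previous sketch; the posted_at timestamp and other field names are again assumptions:

<pre>
def wage_rank(tasks):
    """With the wage toggle on, tasks stay grouped by rating quartile but are
    ordered inside each quartile by hourly wage; the newer task wins ties."""
    def sort_key(task):
        quartile = rating_quartile(task["requester_rating"])
        wage = task["reward"] / (task["estimated_time_minutes"] / 60.0)
        return (-quartile, -wage, -task["posted_at"])  # most recent task breaks ties
    return sorted(tasks, key=sort_key)
</pre>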
<blockquote>'''Your dent in the universe, dents you.'''</blockquote>
==Milestone Contributors==
Based on ideas discussed with everyone in the taskfeed brainstorming hangouts.
@dineshd
@dhankie