Reputation and ranking

This page describes a design and implementation strategy for Daemo's reputation system based on our brainstorms this week and over the past few months. Written up collaboratively by @msb, @aginzberg, @neilthemathguy, @dmorina, with major support from the repsys team.

Basic Concept

Design goals:

  • As few moving parts as possible; keep as much as possible transparent and understandable. Keep the design simple and add bells and whistles later.
  • Workers' view of their task feed should reflect their needs.
  • Requesters want to ensure they get the best workers possible.

The core idea is that ratings directly influence the quality of work you get (if you are a requester) and the ease of finding quality work (if you are a worker). Workers' ratings of requesters are used to rank your (workers') task feeds, and requesters' ratings of workers are used to give preferred workers earlier access to your (requesters') tasks.

This is currently revision two of the proposal, based on all your feedback. It does away with the multiplicative ranking in favor of a simpler ranking-and-releasing model.

Basic rating mechanism

Both workers and requesters have the option to rate each other after each job. After discussion with workers and requesters, I suggest we adopt a 3-way scale used commonly in feedback and education:

  • Check-minus (√-): below expectations but still approvable
  • Check (√): meeting expectations
  • Check-plus (√+): exceeding expectations

(A "reject" or "block" mechanism would still be available for egregious cases.)

Workers rate the requester holistically on this scale based on task clarity, payment, communication, and anything else relevant. Requesters rate the worker holistically on this scale based on work quality, communication, and anything else relevant.

Workers: Ratings determine ranking

Workers' incentive to rate requesters is that the ratings are used to rank the worker's own task feed. If I'm a worker, any requester whom I give a √+ to gets popped to the top of my task feed whenever they have a task available. Tasks from requesters whom I give a √- to go to the bottom of my feed, and tasks from requesters whom I give a √ to stay in the default position. So, by rating requesters, I'm sorting my feed and making it easier for myself to find good work in the future. This is clearly a need: see, for example, TurkAlert. The vast majority of requesters, whom I've never worked with, are ranked by their average rating from other workers. This approach means that workers' task feeds always mirror the best tasks available to them, no longer muddied by requesters they would rather avoid.
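
A minimal sketch of that feed ranking, assuming a numeric scale where √+ > √ > √- and hypothetical my_ratings / avg_ratings lookups (not the actual implementation):

def rank_task_feed(tasks, my_ratings, avg_ratings):
    # tasks: list of dicts with a 'requester' key
    # my_ratings: {requester_id: my numeric rating of that requester}
    # avg_ratings: {requester_id: average rating from all workers}
    def sort_key(task):
        requester = task['requester']
        # Use my own rating of the requester if I have one; otherwise fall
        # back to the crowd's average rating.
        return -my_ratings.get(requester, avg_ratings.get(requester, 0))
    return sorted(tasks, key=sort_key)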

Requesters: Ratings determine future workers

Requesters' incentive to rate workers is that the ratings determine which workers will do the requester's future tasks. If I'm a requester, workers I give a √+ to get earliest access to my tasks, followed by √s, and then √-s. This approach works by releasing tasks first to the top workers, then to slightly less top workers, and so on down the chain, until everyone can see the task. Workers whom I haven't rated get access in order of their overall ratings from all requesters. This is likewise clearly a need: many requesters build up just such a pool of "private workers" on the platform whom they know, trust, and have trained. This approach rewards top workers for being good, gets high-quality work done when you're a good-enough requester or are paying enough to deserve it, and encourages ratings from requesters.
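
As a rough sketch of that release ordering (the 3/2/1 encoding of √+/√/√- and the lookup names are assumptions; when each wave actually opens is handled by the cascading implementation below):

from itertools import groupby

def release_waves(workers, my_ratings, overall_ratings):
    # my_ratings: {worker_id: my rating of that worker}, encoded 3/2/1 for √+/√/√-
    # overall_ratings: {worker_id: average rating from all requesters}
    def score(worker):
        # My own rating of a worker takes priority; workers I haven't rated
        # fall back to their overall rating.
        return my_ratings.get(worker, overall_ratings.get(worker, 0))
    ranked = sorted(workers, key=score, reverse=True)
    # Each wave of equally rated workers gets access before the next wave.
    return [list(group) for _, group in groupby(ranked, key=score)]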

Scenarios

It's important that we walk through scenarios here to make sure that everything is what you would expect.

Jsilver's suggestion on task release order (first to last): √+, √, new, √- workers

Workers' view (taskfeed): tasks from √+ requesters on top, followed by tasks from √, new, and finally √- requesters (see simple illustration below):

[Image: Daemo taskfeed orderflow.png]

  • Worker got √+, Requester got √+ : The next time this requester has a task, it releases to that √+ worker first. The task appears at the top of that worker's taskfeed. Both sides are happy.
  • Worker got √+, Requester got √ : The requester releases tasks to great workers (including this √+ worker), but it shows up at the middle of the √+ worker's taskfeed (since this worker rated the requester a √).
  • Worker got √+, Requester got √- : The requester releases tasks to great workers (including this √+ worker), but it shows up at the bottom of the √+ worker's feed since the √+ worker gave the requester a √-, so the √+ worker is unlikely to take it. Seems right.


  • Worker got √, Requester got √+ : Other workers (√+ workers) get first dibs on this requester's task. √ workers get to see it next.
  • Worker got √, Requester got √ : Other workers (√+ workers) get first dibs on this requester's task, and they'll take it if it's better than other things they have available (but it will be ranked in the middle). Once the task is released to this √ worker, it will likewise be ranked in the middle. A pretty middle-of-the-road scenario; both sides have freedom and choice.
  • Worker got √, Requester got √- : The √- requester releases the task to √+ workers and then √ workers, but the task appears at the bottom of the taskfeed (of both √+ and √ workers) and this √ worker is unlikely to take it.


  • Worker got √-, Requester got √+ : The requester releases tasks first to √+ workers, then to √ workers, then finally to this √- worker only if nobody else takes it. If it makes it that far down the cascade, the task shows up at the top of the √- worker's taskfeed, since other workers love that requester.
  • Worker got √-, Requester got √ : The requester releases tasks first to √+ workers, then to √ workers, then finally to this √- worker only if nobody else takes it. The task shows up at the middle of the √- worker's taskfeed.
  • Worker got √-, Requester got √- : The requester releases tasks to √+ and √ workers first, but nobody takes them because the requester is rated poorly and ranked low. Eventually the task releases to the √- worker, and it appears near the bottom of that worker's taskfeed too, since they don't like the requester. Seems reasonable.


Unless we give new workers and requesters a default √ rating, here are 6 more scenarios:

  • New worker, Requester got √+ : The requester releases tasks first to √+ workers by default, then to √ workers, new workers, and √- workers. The task shows up at the top of the new worker's taskfeed, since other workers love that requester.
  • New worker, Requester got √ : The requester releases tasks first to √+ workers, then to √ workers and new workers. The task shows up at the middle of the new worker's taskfeed.
  • New worker, Requester got √- : The requester releases tasks first to √+ workers, then to √ workers and new workers. The task shows up at the bottom of the new worker's taskfeed.


  • Worker got √+, New requester : The requester releases tasks first to √+ workers, then to √ workers, new workers, and √- workers. However, since the requester is new, the task shows up at the middle of the √+ worker's taskfeed (right after tasks from √ requesters).
  • Worker got √, New requester : same as above.
  • Worker got √-, New requester : same as above.


We should add other scenarios and make sure this works...


Details

Encouraging ratings: how many do we need?

Not every worker or requester will rate each partner. With microtasks in particular, if I have 100 different workers contributing, it might take more time to rate them than they took to do the task! However, getting no information would doom the system, especially for new workers. Proposal: when a requester comes to collect their task results, we show a subset of (3? 5? 10?) workers' submissions and ask the requester to rate that subset, so that they can curate their worker pool and give appropriate feedback. If we assume that, on average, each requester ends up rating 5% of their workers, and each worker works for 10 requesters a day on average, then in expectation a worker gets a new rating every two days. That seems enough to sustain the platform.
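
Checking that arithmetic (the 5% and 10-requesters-per-day figures are the assumptions stated above):

p_rated_per_requester = 0.05   # 5% of a requester's workers get rated
requesters_per_day = 10        # requesters a worker works for per day
ratings_per_day = p_rated_per_requester * requesters_per_day   # 0.5
days_per_new_rating = 1 / ratings_per_day
print(days_per_new_rating)     # -> 2.0, i.e. a new rating roughly every two days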

Since there will be many workers per single requester, getting ratings on requesters may not need that kind of extra nudge.

Jsilver's suggestion: For tasks involving lots of workers, I propose that requesters be instructed to rate a minimum of 10% of the workers involved, or a maximum of 20 workers, whichever is lower. So, if 100 workers are involved, only 10 need to be rated (10% of 100 workers); if 300 or 1000 workers are involved, the requester only needs to rate 20. This guarantees some reputation coverage. As proof of task completion, the platform can automatically give the remaining workers (those the requester didn't rate) a default rating (a smaller, distinct √ rating; we could think of something unique). Or, if we want 100% reputation coverage on all tasks, we can resort to peer review for rating the remaining workers. For example, if a task involved 100 workers and the requester was only mandated to rate 10 of them, that leaves 90 workers with no rating. Those 90 would be divided into teams of 2 who would check each other's work and rate each other. In cases where the number of unrated workers is odd (e.g. with 91 total workers), the platform would randomly assign one worker to rate 2 workers instead of just one. Peer-review ratings would be ranked lower than ratings given by requesters, but I believe a peer-review rating is better than no rating. Peer rating reference: http://crowdresearch.stanford.edu/w/index.php?title=Jsilver_Reputation_ideas#Peer_Feedback.2FRating
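
A small sketch of Jsilver's quota and peer-review pairing (function names are illustrative, not part of the platform):

import math

def requester_rating_quota(total_workers):
    # Rate 10% of the workers involved, capped at 20, whichever is lower.
    return min(math.ceil(0.10 * total_workers), 20)

def peer_review_pairs(unrated_workers):
    # Split the workers the requester didn't rate into teams of 2 that rate
    # each other's work.
    pairs = [unrated_workers[i:i + 2] for i in range(0, len(unrated_workers), 2)]
    if pairs and len(pairs[-1]) == 1:
        # Odd count: one worker ends up rating two peers instead of one.
        pairs[-1].append(pairs[0][0])
    return pairs

print(requester_rating_quota(100))   # -> 10
print(requester_rating_quota(1000))  # -> 20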

Onboarding

New workers or requesters? We would give them a default rating, for example √. In order to prevent single interactions with requesters/workers from nuking their reputation, we would pretend that 3 (or some other number of) "ghost" requesters/workers had all given them √s. The effect of that anchoring at √ becomes weaker and weaker as they do more work and as we gain more confidence about their "true" reputation. This is roughly equivalent to putting a Bayesian prior on their skill.
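
A minimal sketch of that anchoring, assuming √ is encoded as 2 on a 1-3 scale and 3 ghost raters (both numbers are placeholders):

GHOST_RATERS = 3       # phantom √ ratings given to every new account
DEFAULT_RATING = 2.0   # numeric stand-in for √

def smoothed_rating(real_ratings):
    # Average of the real ratings anchored toward √; the anchor's pull fades
    # as more real ratings arrive (a simple Bayesian-prior-style estimate).
    total = GHOST_RATERS * DEFAULT_RATING + sum(real_ratings)
    return total / (GHOST_RATERS + len(real_ratings))

print(smoothed_rating([]))            # -> 2.0, a new account sits at √
print(smoothed_rating([3, 3, 3, 3]))  # -> ~2.57, pulled toward the real evidence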

Cascading Implementation

The following describes the implementation of cascading task release and will be updated as it evolves. We determine whether a given project is available to a given worker by asking: is the worker's rating (from the requester who created the project) greater than the minimum rating required by the project at this moment in time? If so, the worker will be able to see the project in their task feed.

Worker Rating

  • If the project's requester has rated the worker, then the worker's rating is that rating plus 0.1 times the average rating given to that worker by other requesters.
  • If the project's requester has not rated the worker, but other requesters have, then the worker's rating is the average rating given to that worker by those requesters.
  • If no requester has rated the worker, then the worker's rating is just below √.
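
A minimal sketch of those three rules, assuming √ is encoded as 2 and "just below √" as 2 minus a small epsilon (both encodings are assumptions):

CHECK = 2.0
EPSILON = 0.01

def worker_rating(requester_rating, other_ratings):
    # requester_rating: this project's requester's rating of the worker, or None
    # other_ratings: ratings of the worker from all other requesters
    others_avg = sum(other_ratings) / len(other_ratings) if other_ratings else None
    if requester_rating is not None:
        # The project requester's own rating dominates, with a small
        # contribution from everyone else (0 if nobody else has rated them).
        return requester_rating + 0.1 * (others_avg or 0)
    if others_avg is not None:
        return others_avg
    return CHECK - EPSILON   # completely unrated workers sit just below √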

Minimum Rating

We compute the minimum rating using a simple temporal model. Define the following terms:

  • Elapsed time = length of time since the posting of the task
  • Project length = estimated length of time to complete the project (right now we statically estimate 1 day but this will change to incorporate project specific information, e.g. the number of tasks, the amount of time per task, the workers on the platform, etc)
  • Previous minimum rating = minimum rating calculated according to this model the last time a task feed was loaded

The minimum rating is computed using 3 cases:

  • If the elapsed time is greater than the project length, then the minimum rating is 0 (i.e. all workers have access to the task because we missed the deadline).
  • If elapsed time / project length is less than tasks submitted / total tasks, then the minimum rating remains at the previous minimum rating (we are ahead of schedule, so there is no need to give access to more workers).
  • If elapsed time / project length is greater than tasks submitted / total tasks, then we are behind schedule and need to lower the minimum rating: minimum rating = previous minimum rating * (1 - (elapsed time / project length - tasks submitted / total tasks)).

The main idea behind this cascading task release implementation is lag. If the project is ahead of schedule, there is no need to open it to more workers. If the project is behind schedule, reduce the worker-quality restriction. If the project has missed its deadline, then anyone can finish it.
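
A minimal sketch of that check, following the three cases above (field names are illustrative; the real logic lives in the Daemo backend):

def minimum_rating(elapsed, project_length, tasks_submitted, total_tasks, previous_min):
    if elapsed > project_length:
        return 0.0                   # deadline missed: open the task to everyone
    progress = tasks_submitted / total_tasks
    schedule = elapsed / project_length
    if schedule < progress:
        return previous_min          # ahead of schedule: keep the current bar
    # Behind schedule: lower the bar in proportion to how far behind we are.
    return previous_min * (1 - (schedule - progress))

def project_visible(worker_rating, min_rating):
    # A worker sees the project once their rating clears the current bar.
    return worker_rating > min_rating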


Future Work

Future work: advancement

Down the road, we've talked about tiers ("tranches") of quality. In other words, we may eventually want to promote the best workers into Tier 2, where they can make more money. When we promote, we reset their scores, but now they're in a more elite class. Having tiers would allow requesters to more explicitly articulate their tradeoff between price and quality.

Future work: peer review

I'm really interested in seeing peer review used as a criterion for advancement between levels. Your work gets reviewed by people at a higher skill level to determine whether your abilities are sufficient to move up.

Future work: multiple skill areas

Once we branch out to multiple skills, we'll need to discuss what happens to your reputation when you shift from one skill area to another.

Jsilver's idea: Please see PEAK, Z index, and Itemized Rating [1]


API

GET /api/worker-requester-rating/

[{"id":1,"origin":{"user":1,"user_username":"adam.g","gender":"","birthday":null,"verified":false,"address":null,"nationality":[],"picture":null,"friends":[],"roles":[],"created_timestamp":"2015-08-25T07:31:13.895000Z","languages":[],"id":1},"target":2,"module":1,"weight":4.0,"type":"requester","created_timestamp":"2015-08-25T05:40:26.255000Z","last_updated":"2015-08-25T05:40:26.255000Z"},{"id":2,"origin":{"user":1,"user_username":"adam.g","gender":"","birthday":null,"verified":false,"address":null,"nationality":[],"picture":null,"friends":[],"roles":[],"created_timestamp":"2015-08-25T07:31:13.895000Z","languages":[],"id":1},"target":3,"module":1,"weight":4.0,"type":"requester","created_timestamp":"2015-08-25T05:40:26.255000Z","last_updated":"2015-08-25T05:40:26.255000Z"}]


POST /api/worker-requester-rating/

{ "origin": 3, "target": 2, "module": 1, "weight": 1, "type": "requester" }


PUT /api/worker-requester-rating/

{ "id": 10, "weight": 2 }
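
A quick sketch of exercising the create and update endpoints above with Python's requests library (the base URL, auth header, and the assumption that the POST echoes back the created rating are placeholders, not real values):

import requests

BASE = "https://daemo.example.org"        # hypothetical host
HEADERS = {"Authorization": "Token <token>"}   # placeholder auth

# Create a worker-requester rating; weight encodes the √-/√/√+ scale.
resp = requests.post(
    BASE + "/api/worker-requester-rating/",
    json={"origin": 3, "target": 2, "module": 1, "weight": 1, "type": "requester"},
    headers=HEADERS,
)
rating_id = resp.json()["id"]             # assumes the created object is returned

# Update the weight of an existing rating.
requests.put(
    BASE + "/api/worker-requester-rating/",
    json={"id": rating_id, "weight": 2},
    headers=HEADERS,
)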


GET /api/rating/requesters_reviews/

[{"id":33,"task":60,"worker":2,"task_status":2,"created_timestamp":"2015-08-26T23:18:50.569829Z","last_updated":"2015-08-26T23:24:03.721993Z","task_worker_results":[{"id":51,"template_item":2,"result":"23","status":1,"created_timestamp":"2015-08-26T23:23:46.603590Z","last_updated":"2015-08-26T23:23:46.603612Z","template_item_id":2},{"id":52,"template_item":3,"result":"12","status":1,"created_timestamp":"2015-08-26T23:23:46.616205Z","last_updated":"2015-08-26T23:23:46.616226Z","template_item_id":3}],"worker_alias":"rohit.n","task_worker_results_monitoring":[{"result":"23","template_item_id":2},{"result":"12","template_item_id":3}],"updated_delta":"2 day(s) ago","requester_alias":"adam.g","module":1,"project_name":"Microsoft Task","task_template":{"id":1,"name":"template_epNibyPQ","price":0.0,"share_with_others":false,"template_items":[{"id":1,"id_string":"id1","name":"labelid1","role":"display","icon":null,"data_source":"InputText","layout":"column","sub_type":"h4","type":"label","values":"The court upheld the sentence of another death row inmate, Robert O. Marshall, convicted of arranging the murder of his wife at a picnic area on the Garden State Parkway.","position":1},{"id":2,"id_string":"id2","name":"labeled_inputid2","role":"input","icon":null,"data_source":"ShortenedText1","layout":"column","sub_type":"h4","type":"labeled_input","values":"The court upheld the sentence of death row inmate, Robert O. Marshall, convicted of arranging the murder of his wife.","position":2,"answer":"23"},{"id":3,"id_string":"id3","name":"labeled_inputid3","role":"input","icon":null,"data_source":"ShortenedText2","layout":"column","sub_type":"h4","type":"labeled_input","values":"The court upheld the death sentence of Robert O. Marshal, convicted of arranging the murder of his wife.","position":3,"answer":"12"}]},"is_paid":false,"module_name":"Prototype Task","target":2,"current_rating":4.0,"current_rating_id":1},{"id":4,"task":3,"worker":3,"task_status":2,"created_timestamp":"2015-08-25T07:35:27.627000Z","last_updated":"2015-08-25T07:35:34.251000Z","task_worker_results":[{"id":7,"template_item":2,"result":"4","status":1,"created_timestamp":"2015-08-25T07:35:34.265000Z","last_updated":"2015-08-25T07:35:34.265000Z","template_item_id":2},{"id":8,"template_item":3,"result":"5","status":1,"created_timestamp":"2015-08-25T07:35:34.271000Z","last_updated":"2015-08-25T07:35:34.271000Z","template_item_id":3}],"worker_alias":"durim.m","task_worker_results_monitoring":[{"result":"4","template_item_id":2},{"result":"5","template_item_id":3}],"updated_delta":"4 day(s) ago","requester_alias":"adam.g","module":1,"project_name":"Microsoft Task","task_template":{"id":1,"name":"template_epNibyPQ","price":0.0,"share_with_others":false,"template_items":[{"id":1,"id_string":"id1","name":"labelid1","role":"display","icon":null,"data_source":"InputText","layout":"column","sub_type":"h4","type":"label","values":"Much of the 800 service will \" migrate to 900, \" predicts Jack Lawless, general manager of US Sprint's 900 product.","position":1},{"id":2,"id_string":"id2","name":"labeled_inputid2","role":"input","icon":null,"data_source":"ShortenedText1","layout":"column","sub_type":"h4","type":"labeled_input","values":"Manager Jack Lawless predicts much of Sprint's 800 service will \"migrate to 900.\"","position":2,"answer":"4"},{"id":3,"id_string":"id3","name":"labeled_inputid3","role":"input","icon":null,"data_source":"ShortenedText2","layout":"column","sub_type":"h4","type":"labeled_input","values":"Much of the 800 service will \" 
migrate to 900, \" predicts Jack Lawless.","position":3,"answer":"5"}]},"is_paid":false,"module_name":"Prototype Task","target":3,"current_rating":2.0,"current_rating_id":2}]