WinterMilestone 4 biubiubiu
The topic we chose is open governance. We address it with a reputation system that pairs text-based ratings with an appeal mechanism.
The problem we are solving
Making ratings convincing and fair is a major concern for every crowdsourcing platform. Platforms that suffer from reputation inflation or distrust will fail, because workers and requesters stop believing each other. Unfortunately, popular crowdsourcing platforms such as Amazon Mechanical Turk (AMT) and Upwork have serious trust issues. Requesters on AMT will discard all results from a HIT once they find a single unusable result; workers hesitate to work for new requesters because they cannot tell from only a few ratings whether a requester is trustworthy, and vice versa. Such distrust will persist unless the available rating information supports convincing predictions about a worker or requester. AMT has tried to build mutual trust by setting qualifications for certain HITs, which to some extent assures requesters of the quality of the results they receive. However, nothing has been done to help workers trust requesters, and the existing qualification system makes workers’ reputation even more important.
To solve these trust issues, this paper proposes a reputation system in which users can learn something specific from the ratings of an unknown user. Traditional methods often discard review text, which makes users’ ratings difficult to interpret, since they ignore the very text that justifies a user’s rating (File:Reviews-recsys13.pdf). Inspired by the reputation systems of Alibaba’s online shop and TurkOption, we therefore set up a reputation system with text reviews.
The system we are developing has two parts: a rating system and an appeal mechanism.
The rating system
• Encourage personalised text reviews
Why text review?
Daemo should encourage users to write text reviews that provide more specific information. For example, suppose a requester leaves a bad rating that says “Sorry to reject your work and rate you like this, but the work you submitted for my task is unusable due to a misunderstanding of my instructions.” The worker can respond with something like “Sincerely sorry, sir, my fault. Could you please trust me and give me a link to make up for my fault?” Someone unfamiliar with this worker may then conclude: “This is an honest worker who just lacks some experience. He can be trusted after some clarification of the task.”
Specific text reviews help avoid misunderstandings between workers and requesters, improve the quality of work, and give other workers and requesters detailed information for making their own judgments.
How to encourage?
1. Incentive words.
According to an experiment that Michael described in a meeting, sincere words bring the best results. So the incentive for users to give text reviews can be a prompt on the rating interface such as “Please give specific judgements to encourage/warn other users in the community. Thank you for your valuable help.”
2. Experience system.
It can also be a reward: users earn “EXP.” to level up. For example, a user receives +1 point for a good review, 0 points for a mediocre review, and -1 point for a bad review. A user levels up at certain thresholds; for example, 0 to 10 is level 1, 10 to 50 is level 2, and so on. A user can therefore reach a high level only by continuing to do good work or provide good tasks. The experience level also makes it easy to see whether a user has been working for a while or is new to the platform.
3. Inner incentives.
We believe that both workers and requesters are willing to express their feelings, whether they have met a good person or have been treated unfairly. Our platform can also offer some rewards on the operations side to foster a good atmosphere at the beginning.
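The experience system described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the text gives just the thresholds 10 and 50, and the sketch assumes that a level begins at its lower bound (so exactly 10 EXP is level 2) and that a negative total still counts as level 1.

```python
from bisect import bisect_right

# Points per overall review, as given in the text.
REVIEW_POINTS = {"good": 1, "mediocre": 0, "bad": -1}

# EXP at which levels 2 and 3 begin. Only 10 and 50 come from the text;
# the boundary handling (a level starts at its lower bound) is an assumption.
LEVEL_THRESHOLDS = [10, 50]


def total_exp(reviews):
    """Sum the EXP earned from a list of overall reviews."""
    return sum(REVIEW_POINTS[review] for review in reviews)


def level_for(exp):
    """Map an EXP total to a level; totals below 10 (including negatives) are level 1."""
    return bisect_right(LEVEL_THRESHOLDS, exp) + 1


history = ["good"] * 12 + ["mediocre"] * 3 + ["bad"]
exp = total_exp(history)  # 12 - 1 = 11
print(exp, level_for(exp))
```

Further thresholds for higher levels would simply be appended to the list once the team decides on them.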
• Scoring system
1. Specified indicators in each rating.
During the rating process, an overall rating is required (good, mediocre, or bad). Besides that, raters can score standardized indicators for each task, for example 5-star ratings for categories such as skills, attitude, completion time, and accuracy rate.
2. Classified ratings in different types of work.
Another important issue is that ratings should be attributed to different types of work. For instance, a worker can be a good coder but not a good designer. If we score him only on his overall ability (suppose he has worked as both a coder and a designer before), we may lose a good coder and be left with a mediocre designer. Scores for each type of work therefore evaluate a worker’s ability more precisely.
Additionally, we can introduce a system that lets both workers and requesters select tasks or preferred workers based on a user’s average ratings, level of good experience, and number of bad reviews received. For example, workers can filter tasks by the requester’s level of good experience, so that they see only tasks from requesters with a good history of working with workers. Requesters can require that whoever does their tasks has a minimum level of good experience, a certain average rating, or at most a given number of bad reviews. This helps workers find the requesters they want to work for and requesters find the workers they want to work with.
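The coder/designer example can be made concrete with a small Python sketch. The data layout (a list of category and star pairs) and the function name are assumptions made for illustration, not part of the proposed system.

```python
from collections import defaultdict
from statistics import mean


def ratings_by_category(ratings):
    """Average the star ratings separately for each type of work.

    `ratings` is an iterable of (category, stars) pairs, stars on a 1-5 scale.
    """
    grouped = defaultdict(list)
    for category, stars in ratings:
        grouped[category].append(stars)
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical history of one worker who has done both coding and design tasks.
history = [("coding", 5), ("coding", 4), ("design", 2), ("design", 3)]

per_category = ratings_by_category(history)
overall = mean(stars for _, stars in history)

print(per_category)  # coding averages 4.5, design averages 2.5
print(overall)       # the single overall score of 3.5 hides the difference
```

With only the overall score, this worker looks mediocre everywhere; the per-category view shows a strong coder and a weak designer.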
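A minimal sketch of such a filter, assuming each user has a profile with a level, an average rating, and a count of bad reviews. The `Profile` class, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    level: int             # level of good experience
    average_rating: float  # average star rating
    bad_reviews: int       # number of bad reviews received


def meets_requirements(profile, min_level=1, min_average=0.0, max_bad_reviews=None):
    """Return True if the profile satisfies every requirement that was set."""
    if profile.level < min_level:
        return False
    if profile.average_rating < min_average:
        return False
    if max_bad_reviews is not None and profile.bad_reviews > max_bad_reviews:
        return False
    return True


workers = [
    Profile("alice", level=3, average_rating=4.6, bad_reviews=1),
    Profile("bob", level=1, average_rating=4.9, bad_reviews=0),
    Profile("carol", level=2, average_rating=3.2, bad_reviews=6),
]

# A requester who wants level 2 or above, at least 4 stars, and at most 2 bad reviews.
eligible = [w.name for w in workers
            if meets_requirements(w, min_level=2, min_average=4.0, max_bad_reviews=2)]
print(eligible)
```

The same function works in the other direction: a worker can apply it to requester profiles to hide tasks from requesters below a chosen level.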
The appeal mechanism
To avoid retaliation, we build an appeal mechanism into our reputation system. When a rejection or bad rating happens and the worker feels the submitted work does not deserve it, he can appeal. The appeal is then crowdsourced on the platform as a worker appealing task, or WAT. Only workers and requesters with a good reputation in the same category can pick up a given WAT. Those who pick up the WAT judge whether the rejection or bad rating is reasonable given the worker’s task results, and they should be paid for completing it. If the final result finds the rejection or bad rating unreasonable, the requester who rejected the work should pay for the appeal task, approve the task results, and revise his review so that the worker is treated fairly. If the worker’s appeal fails, the worker should pay for the appeal task.
Each WAT has a TTL (time to live). We set TTL = 3, which means a single appeal can be crowdsourced at most 3 times. We allocate 3 days to the first judgement round, 1 day to the second, and half a day to the final one. A worker or requester who disagrees with a judgement can appeal again, at most 2 times.
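The outcome rules above can be sketched as follows. The text does not say how the individual judgements are combined into a final result, so this sketch assumes a simple majority vote in which a tie leaves the original decision standing; the function name and the returned fields are hypothetical.

```python
def resolve_appeal(votes):
    """Decide one appeal round from the judges' votes.

    `votes` is a list of booleans, one per judge: True means the judge found
    the rejection or bad rating reasonable. Majority voting and the tie rule
    (the original decision stands) are assumptions, not from the proposal.
    """
    if not votes:
        raise ValueError("an appeal needs at least one judgement")
    reasonable = sum(votes)
    upheld = 2 * reasonable >= len(votes)
    if upheld:
        # The worker's appeal failed, so the worker pays for the appeal task.
        return {"rating_upheld": True, "payer": "worker",
                "requester_must_approve_and_revise": False}
    # The rating was unreasonable: the requester pays, approves the work,
    # and revises the review.
    return {"rating_upheld": False, "payer": "requester",
            "requester_must_approve_and_revise": True}


print(resolve_appeal([False, False, True]))
print(resolve_appeal([True, True, False]))
```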
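The round limits can be written down as a short Python sketch. The names are hypothetical, and the sketch assumes each round's clock starts when that round is opened.

```python
from datetime import datetime, timedelta

# Time allocated to the first, second, and final judgement rounds.
ROUND_DURATIONS = [timedelta(days=3), timedelta(days=1), timedelta(hours=12)]
TTL = len(ROUND_DURATIONS)  # 3 rounds of crowdsourcing per appeal


def round_deadline(round_number, started_at):
    """Return when the given round (1-based) must be judged by."""
    if not 1 <= round_number <= TTL:
        raise ValueError(f"round must be between 1 and {TTL}")
    return started_at + ROUND_DURATIONS[round_number - 1]


def can_appeal_again(rounds_completed):
    """A further appeal is allowed only while the TTL is not used up."""
    return rounds_completed < TTL


opened = datetime(2016, 2, 1, 9, 0)
print(round_deadline(1, opened))                 # three days after opening
print(can_appeal_again(1), can_appeal_again(3))  # True False
print(sum(ROUND_DURATIONS, timedelta()))         # judging time if all rounds are used
```

Under these numbers, an appeal that goes through all three rounds spends at most 4.5 days in judgement, not counting any gaps between rounds.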
Slack usernames of all who helped create this wiki page submission: @frostao, @xi.chen, @juechi, @qinwei