Reputation System Rcompton
I have looked outside the realm of crowdsourcing reputation systems to get an outside view and possibly some inspiration. Most of the systems I looked at won't be a perfect fit, but they could be adapted to our problem and work in conjunction with other proposed ideas. I'm not trying to deviate from previous posts on reputation; I just wanted to offer some different ideas instead of reiterating them.
- Adapt reputation systems from gaming to account for average skill. This allows for a relative comparison of skills rather than a preset determination of skill by us, the developers
- In addition to a feedback-type system, incorporate an implicit feedback system as described in the paper "WorkerRank: Using Employer Implicit Judgements To Infer Worker Reputation"
Adapting gaming reputation systems
The key concept behind most gaming reputation systems is that you can't get an explicit measure of someone's skill. As we have already struggled with, we try to define specific categories of skills that workers can "level up" in. However, how do we know which skills to begin with? And if we do create such a level system, do we then have to create a new level structure for every future skill and give an understandable interpretation of what each level means? Having the system do this for us seems like a decent solution, but how can it be automated?
Gaming systems solve this problem by using a measure relative to all players. Essentially, reputation is normalized across all players, so you don't know exactly how someone plays, but you do know how what they are doing compares to everyone else. This type of system is used by Microsoft (TrueSkill) and in chess (the Elo rating system). It seems fairly easy to adapt to the way Mechanical Turk hands out reputation: a "win" would simply be accepted work, and using the same system, workers could rate requesters by saying whether or not they would be willing to work with them again.
How much reputation changes depends heavily on the reputation of the reviewer. For example, the amount of reputation a worker gains when their work is accepted depends on the reputation of the requester, and vice versa. This gives participants with a good reputation a larger voice in the system and those with a bad reputation a weaker influence.
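To make the Elo-style adaptation concrete, here is a minimal sketch assuming each review is treated as a one-on-one "game": a worker "wins" when the requester accepts the work, and a requester "wins" when the worker says they would work with them again. The reviewer's rating plays the role of the opponent's rating. The function names, the starting rating of 1500, K = 32, and updating only the party being rated (standard Elo updates both players) are illustrative assumptions, not an existing Daemo design.

```python
# Elo-style reputation sketch (assumptions: ratings start at 1500, K = 32,
# and only the party being rated is updated; not existing Daemo code).


def expected_score(rated: float, rater: float) -> float:
    """Elo probability that `rated` "wins" a review given by `rater`."""
    return 1.0 / (1.0 + 10 ** ((rater - rated) / 400.0))


def update(rated: float, rater: float, won: bool, k: float = 32.0) -> float:
    """Return the rated party's new rating after one review.

    Worker being rated: `won` means the requester accepted the work.
    Requester being rated: `won` means the worker would work with them again.
    """
    actual = 1.0 if won else 0.0
    return rated + k * (actual - expected_score(rated, rater))


if __name__ == "__main__":
    worker = 1500.0
    # Acceptance by a higher-rated requester moves the worker further...
    print(round(update(worker, 1700.0, won=True) - worker, 1))  # 24.3
    # ...than acceptance by a lower-rated requester.
    print(round(update(worker, 1300.0, won=True) - worker, 1))  # 7.7
```

Note that plain Elo only partly matches the "larger voice" idea: an acceptance from a higher-rated requester is worth more, but a rejection from a higher-rated requester costs the worker less, because the model already expected that loss. If strong reviewers should carry more weight in both directions, one option (again an assumption) is to scale `k` by the reviewer's rating.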
These systems solve the new-member problem by simply placing newcomers at the average score. This makes sense, since there is no information about their skill, so there is no reason to punish or reward them. In a working environment, however, we want to avoid giving people credit when they have not demonstrated a skill. To overcome this, we could have workers also rate how much effort (or difficulty) it took them to accomplish the work, and weight that rating by their reputation and the reputation of the requester. We can use this information to interpret skill levels and to learn what skill level a requester actually needs to get work done. For example, a requester whose task only needs workers no lower than one standard deviation below the mean would know how much effort, on average, each skill level takes to accomplish the task, and would get a better idea of the cost (cost being weighted between the time to finish the task and the cost of that worker's time based on their skill level).
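The bookkeeping described above could look roughly like the sketch below: newcomers start at the current mean rating, self-reported effort is averaged with weights derived from both parties' reputations, and a requester can restrict a task to workers no lower than one standard deviation below the mean. The 1-5 effort scale, the weight `worker_rating + requester_rating`, and all function names are hypothetical choices for illustration; ratings are assumed to be positive Elo-style numbers.

```python
import statistics
from typing import List, Tuple

# Hypothetical sketch: ratings are positive Elo-style numbers and effort is
# self-reported on an assumed 1-5 scale.


def newcomer_rating(ratings: List[float]) -> float:
    """New accounts start at the current average rating."""
    return statistics.mean(ratings)


def weighted_effort(reports: List[Tuple[float, float, float]]) -> float:
    """Reputation-weighted average of self-reported effort.

    Each report is (effort, worker_rating, requester_rating). Using
    worker_rating + requester_rating as the weight is an assumption.
    """
    total_weight = sum(w + r for _, w, r in reports)
    return sum(e * (w + r) for e, w, r in reports) / total_weight


def meets_threshold(rating: float, ratings: List[float], min_sd: float = -1.0) -> bool:
    """True if `rating` is at least `min_sd` standard deviations from the mean."""
    mu = statistics.mean(ratings)
    sd = statistics.pstdev(ratings)
    return rating >= mu + min_sd * sd


if __name__ == "__main__":
    ratings = [1300.0, 1400.0, 1500.0, 1600.0, 1700.0]
    print(newcomer_rating(ratings))          # 1500.0
    print(meets_threshold(1400.0, ratings))  # True
    print(meets_threshold(1300.0, ratings))  # False
    print(round(weighted_effort([(4.0, 1600.0, 1500.0), (2.0, 1400.0, 1500.0)]), 2))  # 3.03
```

With these example ratings the mean is 1500 and the population standard deviation is about 141.4, so the cutoff sits near 1358.6: a 1400-rated worker qualifies and a 1300-rated worker does not.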
We are obviously dealing with a more complex problem than simple gaming skill, and such a system doesn't solve all the problems we face. However, it could work in conjunction with user feedback to provide more automatic rating, and rely less on us, the developers, having a good knowledge base of every skill Daemo will need to be ready for.
Concerns:
- Importance of accounts being singular
  - Users could reset their score simply by making a new account
  - As discussed in earlier meetings, we could solve this by making it very hard to create a new account (such as AMT asking for a Social Security number)
- Scores relative to the mean are hard to interpret for people unfamiliar with the bell curve
- Skill level doesn't directly relate to what someone can do (Python developers don't necessarily know how to do Django development)
  - We would have to rely on more specific skill categories rather than broad labels such as "programming"
- Older gaming reputation systems can actually discourage people from playing in order to protect their rating. However, this may not carry over to crowdsourcing, since money is (probably) a stronger motivator than reputation in this context, and taking a job doesn't necessarily mean that someone is going to lose
- If the average skill level is high, how would we know?
  - Having workers rate the effort and time it took to finish a job can help requesters understand what the average skill level means. In other words, an effort score could show that the average skill level is higher than what requesters are actually looking for
- Is an average-skilled worker a new worker or just a decently skilled one?
  - Modified gaming systems account for the number of games played, so we could likewise account for the amount of work completed and give more weight to those who have completed more work than to those who haven't worked at all (How I won the “Chess Ratings: Elo vs the rest of the world” Competition)
Benefits:
- Gives weight to who is doing the ratings
- Scores represent relative skill instead of predetermined levels
  - Do we really know the levels of skill for all skills?
- Automatic rating of skills
  - Not dependent on workers or requesters taking the time to fill out evaluations
  - Can still work in conjunction with an evaluation-based system
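Two of the points above, accounting for how much work someone has done and making relative scores easier to read, could be sketched as follows. The K-factor shrinks as a participant completes more tasks, similar in spirit to how chess rating systems let new players' ratings move faster than established players'; the specific schedule (40 falling toward 10, halfway at 20 tasks) and all names are hypothetical, and this is not the method from the competition write-up cited above. The `percentile` function reports "better than X% of workers" instead of a raw distance from the mean.

```python
from typing import List

# Hypothetical sketch: K shrinks from k_max toward k_min as tasks accumulate.


def k_factor(tasks_completed: int, k_max: float = 40.0, k_min: float = 10.0,
             scale: float = 20.0) -> float:
    """Large K for newcomers, approaching k_min for experienced participants.

    At tasks_completed == scale, K is halfway between k_max and k_min.
    """
    return k_min + (k_max - k_min) * scale / (scale + tasks_completed)


def expected_score(rated: float, rater: float) -> float:
    """Elo probability that `rated` wins against `rater`."""
    return 1.0 / (1.0 + 10 ** ((rater - rated) / 400.0))


def update(rated: float, rater: float, won: bool, tasks_completed: int) -> float:
    """Elo update whose step size depends on the rated party's experience."""
    actual = 1.0 if won else 0.0
    return rated + k_factor(tasks_completed) * (actual - expected_score(rated, rater))


def percentile(rating: float, ratings: List[float]) -> float:
    """Percentage of ratings strictly below `rating` (easier to read than z-scores)."""
    below = sum(1 for r in ratings if r < rating)
    return 100.0 * below / len(ratings)


if __name__ == "__main__":
    # Same accepted task, equal-rated requester: the newcomer moves further.
    print(update(1500.0, 1500.0, won=True, tasks_completed=0))    # 1520.0
    print(update(1500.0, 1500.0, won=True, tasks_completed=180))  # 1506.5
    print(percentile(1550.0, [1300.0, 1400.0, 1500.0, 1600.0, 1700.0]))  # 60.0
```

The decaying K also speaks to the "new worker or decent worker?" question: a rating backed by few tasks is still moving quickly, so a displayed task count alongside the score would tell the two cases apart.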
As outlined in "WorkerRank: Using Employer Implicit Judgements To Infer Worker Reputation", "reputation scores are usually skewed towards high ratings, because employers care about the impact of their feedbacks on the workers’ future opportunities for jobs in the marketplace. The skewed distribution of ratings makes them less helpful in identifying very competent workers."
The authors propose incorporating reputation judgements made at the application stage rather than relying on feedback after job completion. I think this is an interesting system worth exploring, and instead of reiterating the content of the paper, I want to point out that this kind of feedback may work well with our brainstorming of milestone 0 type tasks. Milestone 0 tasks may not be financially beneficial for workers, but they could be beneficial to their reputation. This could give an extra incentive to workers who believe that work restricted by milestone 0 tasks is not worth the effort. It may also encourage workers to try for larger tasks right away instead of looking for tiny tasks to boost their reputation.
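A rough illustration of turning application-stage actions into a reputation signal is sketched below. It is not the WorkerRank algorithm from the paper; it simply assumes each requester action on an application (ignored, shortlisted, hired) maps to a hypothetical signal strength, and averages those signals per worker with a Bayesian prior so that workers with only a few applications are pulled toward a neutral value. The action names, signal values, prior, and function name are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical signal strength for each requester action on an application.
SIGNAL = {"ignored": 0.0, "shortlisted": 0.5, "hired": 1.0}


def implicit_scores(events: Iterable[Tuple[str, str]], prior_mean: float = 0.3,
                    prior_weight: float = 5.0) -> Dict[str, float]:
    """Smoothed average implicit signal per worker.

    `events` holds (worker_id, action) pairs. The prior acts like
    `prior_weight` imaginary applications that each scored `prior_mean`,
    so workers with few applications are pulled toward the prior.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for worker, action in events:
        totals[worker] += SIGNAL[action]
        counts[worker] += 1
    return {
        worker: (totals[worker] + prior_mean * prior_weight) / (counts[worker] + prior_weight)
        for worker in totals
    }


if __name__ == "__main__":
    events = [("alice", "hired"), ("alice", "shortlisted"), ("alice", "hired"),
              ("bob", "hired")]
    scores = implicit_scores(events)
    print(round(scores["alice"], 3), round(scores["bob"], 3))  # 0.5 0.417
```

In the example, a worker hired once out of one application scores below a worker hired twice and shortlisted once out of three, because the prior discounts the single data point.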