WinterMilestone 4 Team Carpe Noctem - Improve Trust Building On Top Of Daemo

Revision as of 15:53, 8 February 2016

Trust is the central problem of the online labor market. Without it, neither workers nor requesters are willing to engage in the long run. As in any offline market, trust is one of the most important factors in building a successful, sustainable marketplace. Usually, trust between two people in a work setting is built on a history of working together and knowing each other well. Here, we propose approaches and experiments to examine a systematic design on top of the changes already proposed in Daemo [Daemo: A Crowdsourced Crowdsourcing Platform, Stanford Crowd Research Collective]. The goal is to ease the bootstrapping of trust and to maintain it through measures that incentivize good behavior and penalize bad behavior.


Introduction

From our previous interviews and research [Daemo, among others], it is clear that we can improve trust between workers and requesters by showing them highly rated users and by improving the feedback loop between them. In this proposal, we focus on optimizing the reputation system and the feedback loop to improve the overall user experience on the marketplace.

Use subcategory reputations in recommendation

Because the types of work vary widely, we suspect that a single reputation score may not accurately represent a worker’s work ethic, quality of work, areas of expertise, level of experience, etc. Instead, we break reputation down into subcategories. There are two main types of subcategory: general characteristics and areas of expertise. Each subcategory has its own metrics for computing its reputation score.

General characteristics are the universal qualities of a worker, e.g. whether the worker finishes tasks on time (timeliness), general requester satisfaction, work ethic, communication skill/style, etc. For a specific subcategory, e.g. timeliness, we look at all the tasks completed by this worker and the percentage of them completed within their deadlines. Reputation for areas of expertise will be computed from the number of tasks the worker has completed in that area, the difficulty level of the tasks, the size/scope of the tasks, the role the user takes, and so on. These metrics will result in a reputation score for that particular task type. For example, a worker who has consistently received good scores on web development would be rated highly in the area of web development.
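As a minimal sketch of the timeliness subcategory described above, the score can be read as the fraction of a worker's completed tasks that were finished on or before their deadline. The `CompletedTask` record and the `timeliness_score` function are hypothetical names introduced for illustration; they are not part of Daemo.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CompletedTask:
    """Hypothetical record of one finished task."""
    deadline: datetime
    completed_at: datetime


def timeliness_score(tasks: List[CompletedTask]) -> Optional[float]:
    """Return the fraction of completed tasks finished by their deadline.

    Returns None when the worker has no completed tasks, so that a new
    worker is not confused with a worker who always misses deadlines.
    """
    if not tasks:
        return None
    on_time = sum(1 for t in tasks if t.completed_at <= t.deadline)
    return on_time / len(tasks)
```

Returning `None` for an empty history is a design choice made here, not something the proposal specifies; how to score workers with no history is left open.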

Similar to worker reputation, requester reputation will be divided into subcategories such as task quality, response time, payment timeliness, an easy-to-work-with score, etc.

Our recommendation system will look at both the requirements of the requester and the reputation scores of workers. The system uses machine learning to categorize and rank the experience level of each worker/requester, then computes a match score based on subcategory matching.
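The proposal does not fix a formula for the match score, so the following is only one possible sketch of the final matching step. It assumes that subcategory scores are already available in the range 0 to 1 (the machine learning step is not modeled here) and that a requester expresses requirements as hypothetical per-subcategory weights. The names `match_score` and `recommend` are illustrative.

```python
from typing import Dict, List


def match_score(requirements: Dict[str, float],
                worker_scores: Dict[str, float]) -> float:
    """Weighted average of a worker's scores over the required subcategories.

    `requirements` maps subcategory name to an importance weight (assumed).
    A subcategory the worker has no score for counts as 0.
    """
    total_weight = sum(requirements.values())
    if total_weight == 0:
        return 0.0
    weighted = sum(weight * worker_scores.get(sub, 0.0)
                   for sub, weight in requirements.items())
    return weighted / total_weight


def recommend(requirements: Dict[str, float],
              workers: Dict[str, Dict[str, float]],
              top_k: int = 3) -> List[str]:
    """Return the names of the top_k workers ranked by match score."""
    ranked = sorted(workers.items(),
                    key=lambda item: match_score(requirements, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```

The same function could be applied in the other direction, matching a worker's preferences against requester subcategory scores, since the proposal recommends in both directions.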

Experiment 1: compare comprehensive rating and specific rating

To compare, we set up a control group with only a comprehensive reputation score, calculated as the mean of all subcategory reputation scores. The experimental group will use our recommendation algorithm, which uses the subcategory recommendation described above. We record the satisfaction ratings that workers and requesters give each other and evaluate the results by checking which group produces a higher overall satisfaction rate.
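The two quantities in this experiment can be sketched as follows. The comprehensive score is the mean of the subcategory scores, as stated above. The satisfaction rate is an assumption for illustration: it treats ratings as being on a 1 to 5 scale and counts a rating of 4 or more as satisfied. Neither the scale nor the threshold is specified in the proposal.

```python
from statistics import mean
from typing import Dict, List


def comprehensive_score(subcategory_scores: Dict[str, float]) -> float:
    """Control-group score: the mean of all subcategory reputation scores."""
    if not subcategory_scores:
        raise ValueError("at least one subcategory score is required")
    return mean(subcategory_scores.values())


def satisfaction_rate(ratings: List[int], threshold: int = 4) -> float:
    """Fraction of ratings at or above `threshold` (assumed 1-5 scale)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(1 for r in ratings if r >= threshold) / len(ratings)


def better_group(control_ratings: List[int],
                 experimental_ratings: List[int]) -> str:
    """Name the group with the higher satisfaction rate, or 'tie'."""
    control = satisfaction_rate(control_ratings)
    experimental = satisfaction_rate(experimental_ratings)
    if experimental > control:
        return "experimental"
    if control > experimental:
        return "control"
    return "tie"
```

A real evaluation would also need a significance test before declaring one group better; this sketch only compares the raw rates.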

Experiment 2: Feed Ranking optimization, weighted performance with focus on recent tasks

Compare weighted vs. unweighted performance scores.
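The proposal does not say how recent tasks should be weighted. One common option, shown here purely as an assumption, is exponential decay: the most recent task gets weight 1 and each earlier task is down-weighted by a constant factor. The function names and the `decay` parameter are hypothetical.

```python
from typing import List


def unweighted_score(scores: List[float]) -> float:
    """Plain average of task scores; every task counts equally."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(scores) / len(scores)


def recency_weighted_score(scores: List[float], decay: float = 0.5) -> float:
    """Average of task scores that favors recent tasks.

    `scores` must be ordered oldest to newest. The newest task has
    weight 1, the one before it `decay`, then `decay ** 2`, and so on.
    With decay = 1.0 this reduces to the unweighted average.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if not 0.0 < decay <= 1.0:
        raise ValueError("decay must be in (0, 1]")
    n = len(scores)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
```

For a worker whose scores improve over time, the weighted score is higher than the unweighted one, which is the difference the experiment would measure in feed ranking.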

public, private evaluation/rating incorporated into overall rating

Experiment 3: compare public and private rating

peer, client evaluation/rating incorporated into overall rating

According to studies [Shaw, Horton and Chen, CSCW ’11], social pressure from others can make workers more careful about, and more motivated to improve, the quality of their work. We therefore inform workers that they will be evaluated, during the work process, by their fellow collaborators and by the requesters on the same tasks. The feedback will be given to workers during the process (with an option to give it anonymously), so they can adjust as they go instead of being corrected only after mistakes have caused damage. Consistently bad feedback, i.e. no improvement after feedback is provided, will harm a worker's reputation score.

Experiment 4: compare public and private rating

Milestone Contributors

@lucasq