WinterMilestone 4 Team Carpe Noctem - Improve Trust Building On Top Of Daemo

Standing at the center of the online labor market is the problem of trust. Without it, neither workers nor requesters are willing to engage in the long run. As in any offline market, trust is one of, if not the, most important factors in building a successful, sustainable marketplace. Usually, trust between two people in a work setting is established through a history of working together and knowing each other well. Here, we propose approaches and experiments to examine a systematic design on top of the existing changes proposed in Daemo [Daemo: A Crowdsourced Crowdsourcing Platform, Stanford Crowd Research Collective]. The goal is to ease the bootstrapping of trust and to maintain it through measures that incentivize good behavior and punish the bad.


== Introduction ==

From our previous interviews and research [Daemo, among others], it is clear that we can improve the trust between workers and requesters by showing them highly-rated users and improving the feedback loop between them. In this proposal, we focus on optimizing the reputation system and feedback loop to improve the overall user experience on the marketplace.

== Use subcategory reputations in recommendation ==

Because of the wide range of work types, we suspect that a single reputation score may not accurately represent a worker's work ethic, quality of work, areas of expertise, level of experience, etc. Instead, we break the reputation down into subcategories. There are two main types of categories: general characteristics and areas of expertise. Each subcategory has its own metrics to compute the reputation score for that subcategory.
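
For illustration, one possible shape for the broken-down reputation; the category names and the 0-5 scale are assumptions of this sketch, not a finalized Daemo schema:

<syntaxhighlight lang="python">
from dataclasses import dataclass, field

# Hypothetical representation of the subcategory breakdown; category names
# and the 0-5 scale are illustrative assumptions, not Daemo's schema.
@dataclass
class WorkerReputation:
    # General characteristics: universal qualities of the worker.
    general: dict = field(default_factory=dict)    # e.g. {"timeliness": 4.6, "communication": 4.2}
    # Areas of expertise: one score per task type.
    expertise: dict = field(default_factory=dict)  # e.g. {"web_development": 4.8}
</syntaxhighlight>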

General characteristics consist of the universal qualities of a worker, e.g. whether the worker finishes tasks on time (timeliness), general requester satisfaction, communication skill/style, work ethic, etc. For a specific subcategory, e.g. timeliness, we look at all the tasks completed by the worker and the percentage of them finished within their deadlines. Reputation for an area of expertise will be computed based on the number of tasks the worker has completed in that area, the difficulty level of the tasks, the size/scope of the tasks, the role the worker took, and so on. These metrics result in a reputation score for that particular task type. For example, a worker who has consistently received good scores on web development tasks would be rated highly in the area of web development.
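
As a sketch of how two such metrics could be computed; the task field names (completed_at, deadline, difficulty, scope, rating) and scales are hypothetical:

<syntaxhighlight lang="python">
def timeliness_score(completed_tasks):
    """Fraction of a worker's completed tasks finished within their deadlines."""
    if not completed_tasks:
        return None  # no history yet; leave the subscore unset rather than zero
    on_time = sum(1 for t in completed_tasks if t["completed_at"] <= t["deadline"])
    return on_time / len(completed_tasks)

def expertise_score(tasks_in_area):
    """Average rating in one task type, weighting harder and larger tasks more."""
    if not tasks_in_area:
        return None
    weights = [t["difficulty"] * t["scope"] for t in tasks_in_area]
    rated = sum(w * t["rating"] for w, t in zip(weights, tasks_in_area))
    return rated / sum(weights)
</syntaxhighlight>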

Subcategory scores will be used by the recommendation algorithm to match workers with requesters and vice versa. As described above, our recommendation system will look at the requester's requirements and the workers' subcategory reputation scores to recommend the most suitable candidates to each other.
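
A minimal sketch of such a matcher, assuming a requester expresses requirements as weights over subcategories; the weighting scheme and field names are our assumptions:

<syntaxhighlight lang="python">
def match_score(subscores, requirements):
    """requirements maps a subcategory to its importance,
    e.g. {"web_development": 0.7, "timeliness": 0.3}."""
    return sum(weight * subscores.get(subcategory, 0.0)
               for subcategory, weight in requirements.items())

def recommend_workers(workers, requirements, k=5):
    """Return the k workers whose subcategory reputations best fit the task."""
    ranked = sorted(workers,
                    key=lambda w: match_score(w["subscores"], requirements),
                    reverse=True)
    return ranked[:k]
</syntaxhighlight>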

=== Experiment 1: compare comprehensive and specific ratings ===

To compare, we set up a control group that sees only a comprehensive reputation score calculated from general worker characteristics as well as areas of expertise. This score is used as an overall reputation for human readers, but when recommending workers to requesters, the algorithm will use the subcategory reputations. Similar to worker reputation, requester reputation will be divided into subcategories such as task quality, response time, payment timeliness, and an easy-to-work-with score. As with worker reputation, a comprehensive score is generated for human readers, while the actual recommendation is based on subcategory reputations.
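
One simple way the control group's comprehensive score could be derived from the same underlying subcategory data; the unweighted mean below is an assumption of the sketch:

<syntaxhighlight lang="python">
def comprehensive_score(subscores):
    """Collapse subcategory reputations into the single number shown to humans.

    The unweighted mean is an assumption of the sketch; the experiment only
    requires that the control group sees one aggregate score."""
    known = [s for s in subscores.values() if s is not None]
    return sum(known) / len(known) if known else None
</syntaxhighlight>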

== Experiment 2: feed ranking optimization, weighted performance with a focus on recent tasks ==

=== Compare weighted vs. unweighted ratings ===
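
A minimal sketch of the two arms of this experiment, assuming each task rating is stored with its age: the control arm averages all ratings equally, while the treatment arm applies an exponential decay so recent tasks dominate. The 90-day half-life is an illustrative assumption.

<syntaxhighlight lang="python">
import math

def unweighted_rating(ratings):
    """Control condition: plain average over (age_in_days, score) pairs.

    Assumes ratings is non-empty."""
    return sum(score for _, score in ratings) / len(ratings)

def recency_weighted_rating(ratings, half_life_days=90):
    """Treatment condition: exponentially down-weight older tasks so that
    recent performance dominates. half_life_days is itself a tuning
    parameter of the experiment."""
    decay = math.log(2) / half_life_days
    weights = [math.exp(-decay * age) for age, _ in ratings]
    weighted = sum(w * score for w, (_, score) in zip(weights, ratings))
    return weighted / sum(weights)
</syntaxhighlight>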

== Public, private evaluation/rating incorporated into overall rating ==

=== Experiment 3: compare public and private ratings ===

== Peer, client evaluation/rating incorporated into overall rating ==

According to studies [Shaw, Horton and Chen, CSCW '11], social pressure from others can make workers more cautious about and motivated to improve their work quality. We thus inform workers that they will be evaluated by their fellow collaborators and requesters on the same tasks during the work process. The feedback will be provided (with an anonymous option) to workers during the process, so they can adjust as they go instead of waiting to be corrected after mistakes have caused damage. Consistently bad feedback, i.e. no improvement after feedback is provided, will harm the worker's reputation score.
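
As a sketch of the "no improvement after feedback" rule, reputation could be penalized only when in-progress scores are both low and flat across successive feedback rounds; the window size, 0-5 scale, and threshold below are illustrative assumptions:

<syntaxhighlight lang="python">
def should_penalize(feedback_scores, window=3, bad_threshold=3.0):
    """Flag a worker for a reputation penalty only when in-progress feedback
    is consistently bad AND shows no improvement.

    feedback_scores: chronological scores from collaborators and requesters
    on the current task. window, scale, and threshold are assumptions."""
    if len(feedback_scores) < window:
        return False  # too little signal to judge fairly
    recent = feedback_scores[-window:]
    consistently_bad = all(score < bad_threshold for score in recent)
    improving = recent[-1] > recent[0]
    return consistently_bad and not improving
</syntaxhighlight>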

=== Experiment 4: compare peer and client ratings ===

== Milestone Contributors ==

@lucasq