Horton Reputation Paper Analysis - Team Pumas



- William Dai

- Nisha K. K.

William Dai: Sections 1-5

Nisha K. K.: Sections 6 and 7

Summary Overview


 - The average public feedback given to workers and requestors is highly inflated.
 - The main reasons are fear of retaliation, threats, and bribery.
 - Public and private feedback on the same contract differ, which shows that the feedback given publicly on the platform is less honest.
 - Experiments were also conducted to study the effect of the private feedback feature on being evaluated, interviewed, or hired for a task.


 - The implementation of the private feedback feature was helpful in analyzing the issue.
 - It made it possible to measure the gap between the feedback given publicly and the genuine, private feedback.
 - Workers with high private feedback scores were hired more frequently across a range of tasks.


 - There is no definitive way to prevent workers and requestors from discussing mutual plans to rate each other outside channels regulated by the platform.
 - One conclusion is that the more truthful ratings are publicly available, the higher the likelihood of lies to cover occasional drops in quality.
 - Since employers mainly consider workers based on the feedback given to them and their prior experience on the platform, workers new to the platform are disadvantaged.


Feedback given to crowdsourcing-platform workers shows an increasingly positive trend when public, while private evaluations are more representative of actual performance.

Introduction 1.0

Market reputations influence the actions of requestors in choosing workers, and workers who get jobs have opportunities to strengthen their reputations - leading to long term effects on the marketplace.

Changing definitions of reputations caused by rise of electronic commerce:

The definition of “reputation” evolved from “private beliefs market participants had about each other (Kreps and Wilson, 1982; Greif, 1993)” to “collecting and then showing the average feedback ratings made by prior trading partners (Dellarocas, 2003)” “ to create trust among strangers” over the course of a decade.

Reputation System Designs

Reputation systems are implemented to “reduce risk aversion.”

Problems with Existing Reputation Systems

Bilateral reputation systems respond to feedback with similar feedback, resulting in highly inflated ratings over time. There is no set benchmark for proper reputation ratings, but in a dynamic environment, if average ratings rise over a period of time without a corresponding change in the composition of workers and requestors in the marketplace, one can conclude that the reliability of the reputation system is falling. This has been observed on oDesk, Elance, and eBay.
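The inflation signal the paragraph describes can be sketched as a simple check; the monthly averages below are made-up numbers, and the function name is an illustration, not anything from the paper.

```python
# Illustrative check (with hypothetical numbers) of the signal described
# above: a steadily rising average rating, absent compositional change in
# the marketplace, suggests the reputation system's reliability is falling.
monthly_avg_rating = [3.9, 4.1, 4.3, 4.6, 4.8]  # hypothetical monthly averages

def is_strictly_rising(series):
    """True if every month's average exceeds the previous month's."""
    return all(later > earlier for earlier, later in zip(series, series[1:]))

print(is_strictly_rising(monthly_avg_rating))  # True
```

In practice one would also confirm that the worker/requestor pool is stable over the same window, since a genuinely improving pool produces the same rising trend.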

To counter these problems, oDesk introduced a private evaluation system in mid-2013, used in conjunction with the existing public rating system. This private system revealed that 20% of people who had a bad experience, and said so privately, still gave the highest accolades publicly.

Problems with Private Feedback

New, entry-level workers had no prior contracts and therefore no private ratings to show. Employers would select workers with favorable private ratings over those without, thus presenting a barrier to new workers finding tasks.

Empirical Content 2.0

Common Market Traits

Markets typically maintain job listings, host user profile pages, arbitrate disputes, certify worker skills and maintain reputation systems.

On oDesk, requestors “write job descriptions, self-categorize the nature of the work and required skills and then post the vacancies to the oDesk website”, and workers are notified via “electronic searches or email notifications” of any vacancies.

This section essentially repeats the Introduction in stating that workers with prior work on the platform are more likely to find new work.

Status Quo Reputation System on oDesk 3.0

Both workers and requestors evaluate each other, giving written feedback and a quantitative evaluation.

“The employer-on-freelancer quantitative feedback is given on several weighted dimensions— ‘Skills’ (20%), ‘Quality of Work’ (20%), ‘Availability’ (15%), ‘Adherence to Schedule’ (15%), ‘Communication’ (15%) and ‘Cooperation’ (15%).”
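The weighted composite implied by the quote above can be sketched as follows. The dimensions and weights come from the paper; the 1-to-5 rating scale and the function name are assumptions for illustration.

```python
# Sketch of the employer-on-freelancer weighted score described above.
# Dimension names and weights are from the paper; the 1-5 scale is assumed.
WEIGHTS = {
    "Skills": 0.20,
    "Quality of Work": 0.20,
    "Availability": 0.15,
    "Adherence to Schedule": 0.15,
    "Communication": 0.15,
    "Cooperation": 0.15,
}  # weights sum to 1.00

def composite_score(ratings):
    """Weighted average of the per-dimension ratings."""
    return sum(WEIGHTS[dim] * ratings[dim] for dim in WEIGHTS)

perfect = {dim: 5.0 for dim in WEIGHTS}
print(round(composite_score(perfect), 2))  # 5.0
```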

“oDesk uses the following ‘double-blind’ process. If both parties leave feedback during the feedback period, then oDesk reveals both sets of feedback simultaneously. If instead, only one party leaves feedback, then oDesk reveals it at the end of the feedback period. Thus, neither party learns its own rating before leaving a rating for the other party.”
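The double-blind reveal rule quoted above reduces to a small piece of logic; the data model here (ratings as plain values, `None` meaning not yet submitted) is an assumed simplification.

```python
# Minimal sketch of the double-blind reveal rule described above.
def reveal(worker_fb, employer_fb, period_ended):
    """Both ratings become visible once both are submitted, or once the
    feedback period ends; until then neither party sees anything."""
    if worker_fb is not None and employer_fb is not None:
        return worker_fb, employer_fb    # both submitted: reveal immediately
    if period_ended:
        return worker_fb, employer_fb    # period over: reveal whatever exists
    return None, None                    # still hidden from both parties

print(reveal(4, 5, period_ended=False))    # (4, 5)
print(reveal(4, None, period_ended=False)) # (None, None)
```

The point of the rule is that neither party can condition its rating on the other's, though as the next quote notes, nothing stops them from coordinating beforehand.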

“There is nothing to stop parties from engaging in ‘pre-play’ communication about their intentions.”

Dynamics and Current Distribution of Employer-on-worker Feedback Scores 3.1

“In the early years of the platform, the average fluctuated a great deal, as the total number of completed contracts was small. However, over time, the number of feedback scores per month increased and average feedback grew more stable month to month. There is a strong positive trend over time, with the greatest period of growth occurring in 2007. Elance is a similar online labor market, with a similar public reputation system, which merged with oDesk in 2014.”

Caveat: “An upward trend in feedback is only consistent with reputation inflation. It is possible that the pool of workers is getting better over time as poor-performing workers exit.”

Is the Positive Trend in Feedback Score Caused by Worker and Employer Compositional Changes? 3.2

… many equations from others’ research used to compute likelihood of bad workers and hard-to-please requestors leaving the marketplace and leading to accurate, positively increasing reputation scores

“First, a long-held concern with reputation systems is that once a reputation is established, the holder of that reputation has an incentive to ‘defect,’ doing less work or bearing fewer costs since the reputational penalty from doing so is not so great.” “A worker’s current reputation score seemingly affects their behavior. Sellers that know they are about to leave the marketplace seem to provide worse service. While this is slightly different than freeriding on a good reputation, it does suggest that effort is endogenous with respect to feedback score. Second, workers are not exogenously given jobs with set wages and so we might expect that as workers improve or gain experience, they might try more demanding jobs and/or higher-paying jobs in which pleasing the employer is more difficult.”

Current Distribution of Feedback Scores 3.3

This is beyond my understanding; please refer to someone with more knowledge of the subject. Perhaps Neil could explain the mathematics?

An Adverse Selection Model of Reputation 4.0

Through the usage of mathematical simulated equilibrium models, the authors of the paper conclude that “the more common truthful feedback is in the marketplace, the stronger pressure firms face to lie when receiving bad performance”.

Prior Work 5.0

Reputations have always been presented as “scores,” which may then influence the decisions of workers and requestors. Common sources of reputation bias include the tendency for some users not to rate at all, while others with “extreme views” rate more frequently. Fake reviews are common, for a variety of reasons not limited to threats, bribes, and economic conditions. Past work shows that reputation systems are important in influencing the decisions of others. Generally speaking, while a bilateral system has flaws, a unilateral system is only applicable when both sides (buyer and seller, worker and requestor) have the same concerns at stake. Finally, negative feedback is always costlier to give than positive feedback, and those who give it risk retaliation.

Collecting Private Feedback

oDesk introduced “private feedback” in addition to the public feedback given by employers on the same contract, making it possible to compare the two scores within an employer. Private feedback, rather than public feedback, is more predictive of subsequent private and public feedback. The textual public feedback also conveys the information carried by the private score. When an employer does not plan to continue using the marketplace, they leave negative feedback for a poorly performing worker; this is less costly since they will not use the platform again in the future. Otherwise, giving negative feedback is costly because of the risk of retaliation from the person who receives it. Experiments show that employers with greater expected future usage give more positive public feedback (avoiding the risk of retaliation) but more negative private feedback; as a result, their public feedback is less honest.

Experimental Revelation of Aggregated Private Feedback

Experiments were conducted on oDesk in which employers posting jobs were randomized into two groups:

 - Control: employers see the standard applicant characteristics, such as work experience and skills.
 - Treatment: in addition to the information shown in the Control experience, the worker’s aggregated private feedback is also shown to the employer.

Analysis of the resulting data showed that workers with high aggregate feedback scores are far more likely to be evaluated, interviewed, and hired, regardless of whether this information is shown to employers. Workers with visibly low private feedback scores are evaluated, interviewed, and hired less often, while the opposite holds for workers with high private feedback scores. This shows that employers treat the aggregated private feedback as informative.
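The two experimental arms described above can be sketched as a difference in what the employer's applicant view contains. The group names come from the paper; the field names, the sample worker, and the 50/50 split are assumptions for illustration.

```python
import random

# Sketch of the experiment's two arms: treatment employers see the
# worker's aggregated private feedback; control employers do not.
# Field names and the 50/50 randomization probability are assumed.
def applicant_view(worker, group):
    view = {"experience": worker["experience"], "skills": worker["skills"]}
    if group == "treatment":
        view["private_feedback"] = worker["private_feedback"]
    return view

worker = {"experience": "3 years", "skills": ["python"], "private_feedback": 4.2}
rng = random.Random()
group = "treatment" if rng.random() < 0.5 else "control"
print(group, applicant_view(worker, group))
```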

The above experiment, however, applied only to eligible workers: those who had completed at least 5 tasks on the platform worth more than $50.00, with private feedback left by three distinct employers. The majority of applicants for a given task were not eligible, mostly because they were new to the platform and had no prior experience or feedback to evaluate. This was a disadvantage for the non-eligible workers, but the effect was not due to the treatment: withholding their private feedback was not the reason they were evaluated and hired less often than eligible workers. The main reason was their overall thin profile.
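The eligibility rule above can be expressed as a small filter. Reading the $50 threshold as per-task, and the field names, are assumptions; the paper's exact criterion may differ.

```python
# Hedged sketch of the eligibility rule stated above: at least 5 completed
# tasks worth more than $50 (read here as per-task, an assumption), with
# private feedback from at least 3 distinct employers.
def is_eligible(completed_tasks):
    """completed_tasks: dicts with 'value' (USD) and 'employer' keys, one
    per completed task that received private feedback."""
    qualifying = [t for t in completed_tasks if t["value"] > 50.0]
    distinct_employers = {t["employer"] for t in qualifying}
    return len(qualifying) >= 5 and len(distinct_employers) >= 3

tasks = [{"value": 60, "employer": e} for e in ["a", "a", "b", "b", "c"]]
print(is_eligible(tasks))  # True
```

A brand-new worker has an empty task list and fails both conditions, which is exactly the cold-start disadvantage the paragraph describes.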