Summer Milestone 9 Reputation Systems research and exploration
- 1 Evaluated Papers
- 1.1 Reputation Inflation: Evidence from an Online Labor Market 
- 1.2 Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text 
- 1.3 Liquidity in Credit Networks: A Little Trust Goes a Long Way 
- 1.4 Inefficient Hiring in Entry-Level Labor Markets 
- 1.5 Opinion Mining Using Econometrics: A Case Study on Reputation Systems 
- 1.6 Rating Friends Without Making Enemies 
- 1.7 I rate you. You rate me. Should we do so publicly? 
- 1.8 Reputation Transferability in Online Labor Markets 
- 1.9 A System for Scalable and Reliable Technical-Skill Testing in Online Labor Markets 
- 1.10 The Utility of Skills in Online Labor Markets 
- 1.11 On Assigning Implicit Reputation Scores in an Online Labor Marketplace 
- 1.12 Strategic Formation of Credit Networks 
- 1.13 Teapot - Trust Network (Stanford U. research project) 
- 1.14 Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System 
- 1.15 Have you Done Anything Like That? Predicting Performance Using Inter-category Reputation 
- 1.16 Content-Driven Reputation for Collaborative Systems 
- 2 Evaluated Platforms
Evaluated Papers
Papers related to reputation systems, each with a summary and the pros and cons of the approach it describes.
Note: Professor Michael Bernstein gives the following advice:
"Reading papers efficiently is an important skill. First, figure out what the main research question is that’s being asked. That’s usually in the Abstract. Then read the intro to make sure you understand the high level approach they’re taking. Skim the related work to make sure you understand the general space they’re working in, then pop over to the conclusion to make sure you understand what they think they did. Only then do you dig into the meat of the paper. On your first read, don’t try to understand every detail. Aim for a general understanding. We can always come back and pick apart the details later."
Reputation Inflation: Evidence from an Online Labor Market 
William Dai and Nisha K.K. (Team Pumas)
- The average public feedback given to workers and requesters is highly inflated.
- The main reasons are fear of retaliation, threats, or bribery.
- Public and private feedback differ, which shows that the public feedback given on the platform is less honest.
- Experiments were also conducted to study the effect of the private feedback feature on being evaluated, interviewed, or hired for a task.
- The implementation of the private feedback feature was helpful in analysing the issue.
- The authors were able to show the difference between the feedback given publicly and the genuine feedback given privately.
- Workers with a high private feedback score were hired more frequently for a number of tasks.
- There is no definitive way to prevent workers and requesters from arranging to rate each other through channels outside those regulated by the platform.
- One conclusion has been that the more truthful ratings are publicly available, the higher the likelihood of lies to cover occasional drops in quality.
- Since employers mainly judge a worker by the feedback given to them and their prior experience on the platform, workers new to the platform are disadvantaged.
Hidden Factors and Hidden Topics: Understanding Rating Dimensions with Review Text 
Juechi and @alfonsoxw
Liquidity in Credit Networks: A Little Trust Goes a Long Way 
Inefficient Hiring in Entry-Level Labor Markets 
Opinion Mining Using Econometrics: A Case Study on Reputation Systems 
Rating Friends Without Making Enemies 
I rate you. You rate me. Should we do so publicly? 
Reputation Transferability in Online Labor Markets 
Claudia Flores Saviaga (@claudiasaviaga)
Summary: In online marketplaces such as oDesk, Amazon Mechanical Turk and TaskRabbit, employers post tasks on which contractors work and deliver the product of their work online. In these places reputation systems play an important role in instilling trust and are often used by employers to predict the future performance of a worker. However, the tasks available in such places span a variety of different categories, which leaves the employer guessing how a reputation earned in other task categories maps to the category at hand. This paper analyzes how past, task-specific reputation can be used to predict future performance on different types of task.
The paper explores the following questions:
1) Are reputations transferable across categories and predictive of future performance?
2) How can we estimate task affinity and use past information to best estimate expectations of future performance?
For answering the questions the authors use a set of real transactional oDesk data consisting of over a million real transactions across six different categories from the oDesk marketplace (Software Development, Web Development, Design and Multimedia, Writing, Administration, and Sales and Marketing). The data was collected between September 1 and September 21 of 2012.
It is important to remember that on the oDesk platform, after a user completes a task, the employer supplies feedback scores (integers between 0 and 5) in the following six fields:
The average of these scores divided by five represents the final rating, which therefore lies between 0 and 1.
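As a small worked example of the rating rule above, the following sketch averages the per-field scores and divides by five. The function name `final_rating` and the six example scores are hypothetical and only serve the illustration; the actual field names are not reproduced here.

```python
from statistics import mean


def final_rating(field_scores):
    """Average the per-field scores (each between 0 and 5) and rescale to 0..1."""
    if not field_scores:
        raise ValueError("need at least one field score")
    if any(not 0 <= s <= 5 for s in field_scores):
        raise ValueError("each field score must be between 0 and 5")
    return mean(field_scores) / 5


# Six hypothetical field scores for one completed task.
example_scores = [5, 4, 5, 3, 4, 5]
rating = final_rating(example_scores)
print(round(rating, 3))
```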
The authors first analyze the information using a binomial model and a multinomial model. Both assume that the latent qualities of the workers are static, in the sense that they assign equal weights to all past ratings. The authors then use a linear dynamical system (LDS) to take into account that, as users complete more and more tasks, their more recent tasks are more predictive than their initial and older completed tasks. As expected due to its simplicity, the binomial model performs worse than the multinomial model, which in turn performs worse than the LDS. The paper concludes that reputation systems can be improved if the feedback scores of the participating users are adjusted to take into account the type of task that a worker is expected to complete (or has completed), as well as the user's past category-specific performance history.
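To make the difference between the static and the dynamic view concrete, here is a minimal sketch. It is not the authors' model: it contrasts an equal-weight average with a one-dimensional Kalman filter, which is the simplest linear dynamical system. The parameter names and values (`process_var`, `obs_var`, `prior_mean`, `prior_var`) and the rating history are assumptions chosen for illustration.

```python
def static_estimate(ratings):
    """Equal weight for every past rating (the static-quality assumption)."""
    return sum(ratings) / len(ratings)


def lds_estimate(ratings, process_var=0.01, obs_var=0.04,
                 prior_mean=0.5, prior_var=1.0):
    """Scalar Kalman filter: latent quality may drift between tasks,
    so recent ratings end up with more influence than old ones."""
    mean, var = prior_mean, prior_var
    for r in ratings:
        var += process_var              # predict: quality may have drifted
        gain = var / (var + obs_var)    # how much to trust the new rating
        mean += gain * (r - mean)       # update towards the observation
        var *= 1 - gain                 # uncertainty shrinks after observing
    return mean


# Hypothetical worker whose ratings (on a 0..1 scale) improve over time.
history = [0.4, 0.5, 0.6, 0.9, 0.9]
print(static_estimate(history), lds_estimate(history))
```

With an improving history like this one, the filtered estimate sits above the plain average because the latest ratings dominate. Setting `process_var` to 0 with a very diffuse prior makes the two estimates nearly coincide, which mirrors the static assumption.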
Pros: The paper shows a clear approach for analyzing the correlations between different task categories and, as a result, provides a more accurate estimate of a worker's performance in a new category. This can allow employers to make safer and better-informed decisions about which workers to hire. The authors also suggest that their approach can be used to recommend that workers apply for tasks that are seemingly out of their scope but for which they are highly likely to provide successful outcomes.
Cons: It does not address the problem that new workers face when they first join these online markets and have no reputation at all.
A System for Scalable and Reliable Technical-Skill Testing in Online Labor Markets 
The Utility of Skills in Online Labor Markets 
On Assigning Implicit Reputation Scores in an Online Labor Marketplace 
Strategic Formation of Credit Networks 
Teapot - Trust Network (Stanford U. research project) 
Teapot: "transactions and exchanges along paths of trust". This platform was developed as a Stanford research project to enable trusted interactions on the web. Every aspect of a transaction or communication involves trust; it is an important factor when buying or selling a product, or when assigning a task and getting the right result. The question is how to analyze the trust that exists between people. Teapot is a trust network that decides whom to trust by analyzing the users' online interactions and social networks. It finds a "trust path" between two users and builds a trust network that estimates how much a user is likely to trust the user at the far end. It captures the idea of transitivity of trust: the level of connectivity between two users is determined on the basis of mutual friends, and this also determines the trust score. Having a "shared background" enables stronger trust relations, because one is more likely to trust someone with the same background. Any references that users provide to build stronger trust relations are also taken into account by the platform.
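The idea of a trust path can be sketched as a graph search. The following is an illustration only, not Teapot's actual algorithm, which is not described here. It assumes each direct relationship carries a trust weight in (0, 1], that trust along a path is the product of its edge weights, and that the best path is the one with the highest product. The graph, names and weights are hypothetical.

```python
import heapq


def best_trust_path(graph, source, target):
    """Return (trust score, path) for the most trusted path, or (0.0, []).

    Dijkstra-style search that maximises the product of edge weights.
    Valid because every weight is in (0, 1], so scores never grow along a path.
    """
    best = {source: 1.0}
    heap = [(-1.0, source, [source])]   # negate scores: heapq is a min-heap
    while heap:
        neg_score, node, path = heapq.heappop(heap)
        score = -neg_score
        if node == target:
            return score, path
        if score < best.get(node, 0.0):
            continue                    # stale heap entry
        for neighbor, trust in graph.get(node, {}).items():
            new_score = score * trust
            if new_score > best.get(neighbor, 0.0):
                best[neighbor] = new_score
                heapq.heappush(heap, (-new_score, neighbor, path + [neighbor]))
    return 0.0, []


# Hypothetical trust graph: graph[a][b] is how much a directly trusts b.
graph = {
    "alice": {"bob": 0.9, "carol": 0.5},
    "bob": {"dave": 0.8},
    "carol": {"dave": 0.9},
    "dave": {},
}
score, path = best_trust_path(graph, "alice", "dave")
print(score, path)
```

Multiplying weights means every extra hop can only lower the score, which matches the concern that long chains of weak intermediate relations may not be enough to establish trust.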
- Everyone likes to trust someone they know, so a trust network works like a powerful recommendation system. Teapot uses the connectivity and trust that exist in relationships to improve users' transactions and experience.
- In a trust network, a user is unlikely to engage in fraudulent activity, since it would damage all of his/her connections. This helps weed out unwanted users and online fraud.
- Teapot solves the cold-start problem by making trust portable across marketplaces. (paper ref.)
[Also, Teapot provides access to its reputation system through simple, easy-to-integrate web-based APIs.]
- Transactions based on such a trust network reduce anxiety and help grow a larger trust network.
- It is possible that only weak intermediate relations are found between two users, which are not enough to establish healthy trust relations.
- Social network: Not everyone is on a social networking site (like Facebook). The platform takes into consideration only the social circle that exists on Facebook, which excludes other existing connections.
Trust Among Strangers in Internet Transactions: Empirical Analysis of eBay’s Reputation System 
Aditi Nath + Ankita Shastry
Have you Done Anything Like That? Predicting Performance Using Inter-category Reputation 
- In the current system, the reputation for a worker when he is exploring a new category is determined by uniformly averaging all of the previous categories that the worker has been involved in.
- The researchers believe this method is inherently flawed because a worker may have completed tasks in categories that have no relevance to the new category.
- For example, the researchers believe it makes no sense that reputation based on a completed task in language translation and reputation based on a completed task in website design carry the same weight when a worker attempts to complete his first task in mobile application development.
- To solve this, the researchers created a new model that assigns weighting to previous tasks based on their relevance to the new category.
- There is evidence of an increase in accuracy from the baseline (status quo methods) using their model, both on a random scale and on a real-world application scale.
- The approach works with both sparse and dense data, so it should remain applicable regardless of how many data points are available.
- Filters out information that may be irrelevant to the task at hand.
- Can only be used to modify rating scores shown to employers. Essentially may be a limited approach for our diverse website.
- May filter out some important information as well.
- If a worker has never completed a task in a relevant category before, he/she will be disincentivized from attempting a new task, since there is a low chance that he/she will be hired. A decrease in worker freedom in this way could push people away from our interface.
Expansion on Con 2: Although a category like language translation may have no relevance to a programming task in terms of expertise, it still reflects other qualities that matter for any task, like dedication, effective communication, and being a trustworthy worker. In this setup, that information does not carry over, because it is part of a seemingly irrelevant category.
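The contrast between uniform averaging and relevance weighting summarized above can be sketched as follows. This is an illustration under assumptions, not the researchers' model: the relevance weights are hypothetical hand-picked numbers, whereas the paper estimates how categories relate from data.

```python
def uniform_reputation(history):
    """Status quo: average every past rating, whatever its category."""
    ratings = [r for category_ratings in history.values() for r in category_ratings]
    return sum(ratings) / len(ratings)


def weighted_reputation(history, relevance, target):
    """Weight each past rating by how relevant its category is to `target`.

    `relevance` maps (past category, target category) to a weight in [0, 1];
    missing pairs count as irrelevant (weight 0). Returns None when no past
    category has any relevance to the target.
    """
    weighted_sum = total_weight = 0.0
    for category, ratings in history.items():
        weight = relevance.get((category, target), 0.0)
        weighted_sum += weight * sum(ratings)
        total_weight += weight * len(ratings)
    return weighted_sum / total_weight if total_weight else None


# Hypothetical worker history (ratings on a 0..1 scale) and relevance weights.
history = {"translation": [1.0, 1.0], "web design": [0.6, 0.6]}
relevance = {
    ("translation", "mobile development"): 0.1,
    ("web design", "mobile development"): 0.8,
}
print(uniform_reputation(history))
print(weighted_reputation(history, relevance, "mobile development"))
```

Giving seemingly unrelated categories a small non-zero weight, as with translation here, is one way to let general qualities such as reliability still carry over, which is the concern raised in the expansion on Con 2.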
Content-Driven Reputation for Collaborative Systems 
- The main goal is to judge users by their actions, rather than by the word of other users. Users gain or lose reputation according to how their contributions fare; users whose work is undone lose reputation.
- In content driven reputation systems every user is turned into an active evaluator of other users’ work by the simple act of contributing to the system. Furthermore, a content driven reputation system is resistant to bad mouthing and attacks.
- The authors' main contribution is the idea and formulation of a pseudometric on which users' reputation can be judged automatically. They demonstrate a pseudometric on document versions, specifically versions of Wikipedia pages and files in Git repositories. The pseudometric must satisfy two requirements:
- Outcome preserving: If two versions look similar to users, the pseudometric should consider them close. It assigns a distance of 0 to versions that look identical.
- Effort preserving: If a user can transform one version of a document into another via a simple transformation, the pseudometric should consider the two versions close.
- Overcomes known issues with user generated ratings which are:
- User ratings can be quite sparse
- Gathering the feedback and the ratings require secondary systems outside of main system
- Their pseudometric was mathematically proven to be resistant to known types of attacks (Sock Puppet accounts)
- Reputation is preserved in accordance to a set timeline
- No field data on running this reputation system
- Two main requirements for such a system:
- Ability to embed document versions in a metric space, so that distance is both effort-preserving and outcome-preserving.
- Presence of patrolling mechanisms that ensure that the system does not have “dark corners”
- While the pseudometric is shown to work well for documents and for GitHub contributions, a general model is a hard problem, because what counts as a measurable contribution in more complex collaborative systems (they cite SolidWorks) would first need to be pinned down precisely.
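The notes above can be illustrated with a minimal sketch of how a distance between versions might turn into a judgment of an edit. Both pieces are assumptions for illustration rather than the authors' exact formulation: the distance is a word-level edit distance that ignores whitespace (so versions that look identical are at distance 0), and the quality score compares how much closer an edit brought the document to a later version, relative to the work done.

```python
def word_edit_distance(a, b):
    """Levenshtein distance over whitespace-separated words.

    Texts that differ only in spacing get distance 0, which is what makes
    this a pseudometric rather than a metric on raw strings.
    """
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, word_x in enumerate(x, 1):
        cur = [i]
        for j, word_y in enumerate(y, 1):
            cur.append(min(prev[j] + 1,                        # delete word_x
                           cur[j - 1] + 1,                     # insert word_y
                           prev[j - 1] + (word_x != word_y)))  # substitute
        prev = cur
    return prev[-1]


def edit_quality(before, after, later):
    """Score an edit (before -> after) against a later version.

    +1 means the later version kept the edit, -1 means it was undone.
    """
    work = word_edit_distance(before, after)
    if work == 0:
        return 0.0  # nothing visible changed, so nothing to judge
    progress = word_edit_distance(before, later) - word_edit_distance(after, later)
    return progress / work


before = "the cat sat"
after = "the cat sat on the mat"
print(edit_quality(before, after, later=after))   # edit preserved
print(edit_quality(before, after, later=before))  # edit reverted
```

A reputation update could then add some multiple of this score to the author's reputation, so that users whose work is undone lose reputation without anyone having to rate them explicitly.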
Evaluated Platforms
Identify what is working and what is failing on current crowdsourcing platforms. Find inequality between how requesters and workers are treated by the system. Please feel free to sign up to review as many of these platforms as you'd like.
Amazon Mechanical Turk