Winter Milestone 3 RATH: Evolve Reputation to a Truer Compatibility Metric

From crowdresearch

Intended Outcomes of Reputation Systems

  1. Limit risk for the requester by increasing the likelihood of quality work output
  2. Increase productivity and compensation for workers by helping them obtain higher-paying work

Failings of Current Systems

The challenge with a global reputation system is that requesters and workers are human, and therefore have multifaceted ideas about what makes a quality interaction. At the tails of the spectrum (very good or very poor) there may well be consensus, but does that actually help the majority of requesters and workers, who sit in the middle?

These reputation systems make an aggregate decision about who or what is "good". We live in a world surrounded by recommender systems such as Netflix and Spotify, where we know first-hand that there is a difference between "good" and "good for me".
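To make the gap between "good" and "good for me" concrete, here is a small Python sketch using made-up ratings: a single aggregate score can hide the fact that one worker fits one requester very well and another poorly. The names (`ratings_by_requester`, `global_reputation`, `reputation_for`) and the numbers are illustrative assumptions, not part of Daemo.

```python
# Hypothetical 1-5 ratings that two requesters gave the same worker.
ratings_by_requester = {
    "requester_a": [5, 5, 4],  # this requester's needs fit the worker well
    "requester_b": [1, 2, 1],  # this requester's needs do not
}


def global_reputation(ratings):
    """One aggregate score across all requesters, as a global system computes it."""
    all_scores = [r for scores in ratings.values() for r in scores]
    return sum(all_scores) / len(all_scores)


def reputation_for(ratings, requester):
    """A per-requester view: how this worker has fared with *this* requester."""
    scores = ratings[requester]
    return sum(scores) / len(scores)


print(f"{global_reputation(ratings_by_requester):.2f}")              # 3.00
print(f"{reputation_for(ratings_by_requester, 'requester_a'):.2f}")  # 4.67
print(f"{reputation_for(ratings_by_requester, 'requester_b'):.2f}")  # 1.33
```

Globally the worker looks average (3.00), yet they are a strong match for one requester and a poor match for the other.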


We need to evolve past reputation systems and embrace compatibility metrics to keep pace with evolving socio-technical paradigms.

But Daemo isn't Spotify: a picture is either labeled correctly or it isn't.

True, on the most micro of tasks there may be a black-and-white answer. But what percentage of Daemo tasks have one right answer? What happens as the size and variety of tasks begin to scale? The more variability there is in the worker population and the types of tasks, the more critical ease of access and risk management become to platform success. The larger the platform becomes, the more challenging this gets.

Ok fine, it doesn't scale, but we can worry about that later if we need to...can't we?

It is critical to the future iterations of the platform that our infrastructure holds up. What happens to trust in the platform if we are constantly changing our reputation system?

So by compatibility do you mean skills matching? That I have the verified experience needed to do task xxx?

In one sense yes, and in another, no. The idea is indeed to improve matching, but our vision is considerably more sophisticated. For example, we would include not only your basic profile and certifications but also the tasks you have done in the past: the type of each task, your work product where available, the feedback on that work, and so on. It's almost a vectorization of the worker's work as well as a vectorization of the task. Kind of a task2vec :)
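A minimal sketch of the matching idea, assuming hand-picked feature dimensions rather than a learned embedding (what a real task2vec-style model would look like is left to the Machine Learning team): a worker vector is built from past tasks weighted by the feedback they received, and open tasks are ranked by cosine similarity to it. `FEATURES`, `worker_vector`, `rank_tasks`, and all data below are hypothetical.

```python
import math

# Hypothetical feature space shared by workers and tasks. A real system would
# likely learn these dimensions; here they are hand-picked for illustration.
FEATURES = ["image_labeling", "transcription", "graphic_design", "copywriting"]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def worker_vector(history):
    """Sum of past task vectors, each weighted by the feedback (0-1) it received."""
    vec = [0.0] * len(FEATURES)
    for task_vec, feedback in history:
        for i, x in enumerate(task_vec):
            vec[i] += feedback * x
    return vec


def rank_tasks(worker_vec, tasks):
    """Task names ordered from most to least compatible with the worker."""
    return sorted(tasks, key=lambda name: cosine(worker_vec, tasks[name]), reverse=True)


history = [
    ([1, 0, 0, 0], 0.9),  # well-received image labeling work
    ([0, 0, 1, 0], 0.6),  # moderately well-received design work
]
tasks = {
    "label_street_photos": [1, 0, 0, 0],
    "design_logo": [0, 0, 1, 0],
    "write_product_copy": [0, 0, 0, 1],
}
w = worker_vector(history)
print(rank_tasks(w, tasks))  # ['label_street_photos', 'design_logo', 'write_product_copy']
```

Cosine similarity is used here only because it ignores how *much* history a worker has and compares the *shape* of their experience; other similarity measures would work equally well in a sketch like this.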

Vector... what?

There are many ways to build the underlying algorithm, and we have an entire Machine Learning team to research and test them. What we want to accomplish with this dark-horse idea is to reframe the conversation.

All right, all right. You want to use fancy machine learning and AI. Are we just showing off here? Is there a transactional impact?

There are many ways this serves our initial need for ease of access and risk management, but to answer your question, let us look at the most fundamental impact. On current platforms, requesters often have the same task done by multiple workers to mitigate the risk of low-quality output. What if the right task got to the right worker? What if the redundancy were no longer needed? What if the same price were paid for the task, but to only one person rather than 3... or 6? What if the worker's feed were full of work they could do well, sprinkled with work they WANTED to do more of or grow into? What if the worker could go straight down HIT after HIT, completing work and optimizing their time? How much would the effective hourly wage rise if no time were wasted finding work?
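The redundancy and wage questions above come down to simple arithmetic. The sketch below uses a hypothetical 90-cent task budget and hypothetical task and search times purely to show the shape of the calculation; these are not measured Daemo figures, and `pay_per_worker` and `effective_hourly_wage` are illustrative helpers.

```python
# All money is in cents to avoid floating-point rounding surprises.


def pay_per_worker(budget_cents, redundancy):
    """Split one task's budget evenly across `redundancy` workers."""
    return budget_cents / redundancy


def effective_hourly_wage(pay_cents, work_minutes, search_minutes):
    """Cents earned per hour when each task also costs unpaid search time."""
    return pay_cents * 60 / (work_minutes + search_minutes)


BUDGET_CENTS = 90  # hypothetical budget a requester sets aside for one task

for redundancy in (6, 3, 1):
    share = pay_per_worker(BUDGET_CENTS, redundancy)
    print(f"{redundancy} worker(s): ${share / 100:.2f} each")
# 6 worker(s): $0.15 each
# 3 worker(s): $0.30 each
# 1 worker(s): $0.90 each

# Hypothetical: a 3-minute task paying 30 cents, with and without 2 minutes of searching.
print(f"${effective_hourly_wage(30, 3, 2) / 100:.2f}/hour with search")     # $3.60/hour with search
print(f"${effective_hourly_wage(30, 3, 0) / 100:.2f}/hour without search")  # $6.00/hour without search
```

Under these assumed numbers, removing redundancy triples (or sextuples) what a single worker receives for the same requester spend, and eliminating search time alone raises the effective hourly wage by two thirds.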

But don't Boomerang and the prototype task fix this already?

Boomerang IS actually a compatibility system under the hood. We propose to take the concept further by embracing its essence and iterating toward a more accurate and productive version. Yes, the prototype task does improve the quality of the HIT, and this proposal does not compromise that in any way. Where Boomerang falls short is in the breadth of its impact. We propose treating Boomerang as one data point in a high-dimensional matching problem.
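One way to picture Boomerang as "one data point" is a weighted score in which a Boomerang-style rating is just one input among several. This is a sketch under stated assumptions: the signal names, their normalization to [0, 1], and the hand-set weights are all hypothetical, and a learned model could replace the linear combination entirely.

```python
from dataclasses import dataclass


@dataclass
class MatchSignals:
    """Hypothetical signals for one (worker, task) pair, each normalized to [0, 1]."""
    boomerang: float         # Boomerang-style rating signal (assumed normalized)
    skill_similarity: float  # e.g. similarity between worker and task vectors
    certification: float     # 1.0 if the worker holds a relevant certification
    interest: float          # how much the worker wants to grow into this task type


# Hypothetical weights; a real system would learn these rather than hand-set them.
WEIGHTS = {"boomerang": 0.3, "skill_similarity": 0.4, "certification": 0.1, "interest": 0.2}


def compatibility(s: MatchSignals) -> float:
    """Weighted sum of the signals; stays within [0, 1] because the weights sum to 1."""
    return sum(weight * getattr(s, name) for name, weight in WEIGHTS.items())


# Two workers with the same Boomerang signal but different fit for this task.
a = MatchSignals(boomerang=0.8, skill_similarity=0.9, certification=1.0, interest=0.2)
b = MatchSignals(boomerang=0.8, skill_similarity=0.3, certification=0.0, interest=0.9)
print(f"{compatibility(a):.2f}")  # 0.74
print(f"{compatibility(b):.2f}")  # 0.54
```

Both workers look identical to Boomerang alone; the additional dimensions are what separate them.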

Does this affect pricing at all?

It certainly could. One could pay more for a closer match (low risk, high reliability) or pay a little less for a broader match. For example, someone newer to design might be a more distant match but still capable of producing good work. The requester would take on a bit more risk in exchange for the opportunity to pay a bit less. This would give individuals the opportunity to grow into new areas and offer smaller businesses a more reasonable price point. This isn't necessarily how it would be used (more thought is required), but it is one approach we have considered so far.
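As one hypothetical pricing rule of the kind described above, a requester could set the price for a perfect match and accept a bounded, linear discount for broader matches. The function name `match_price`, the linear form, and the 20% cap are assumptions chosen for illustration; the actual policy remains open.

```python
def match_price(base_cents: int, compatibility: float, max_discount: float = 0.2) -> int:
    """Hypothetical rule: a perfect match (1.0) pays the base price, and weaker
    matches get up to `max_discount` off, scaling linearly. Rounded to whole cents."""
    if not 0.0 <= compatibility <= 1.0:
        raise ValueError("compatibility must be in [0, 1]")
    discount = max_discount * (1.0 - compatibility)
    return round(base_cents * (1.0 - discount))


print(match_price(1000, 1.0))  # 1000
print(match_price(1000, 0.5))  # 900
print(match_price(1000, 0.0))  # 800
```

Capping the discount is one way to keep a distant but plausible match from driving worker pay arbitrarily low; whether a cap like this is fair to workers is exactly the kind of question that would need further thought.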


Ultimately, we are proposing a conceptual shift away from reputation and toward compatibility: a framing that can stand up to, and meet, the reliability requirements of the future of work at scale.