Milestone Reputation Aura

From crowdresearch

Design notes for a pro-social crowdwork platform, including a game-like mechanism to generate limited feedback and a mechanism for moderation.

This design is informed by previous notes [1] and is intended to be scalable while promoting a strong sense of fairness and limiting barriers to productivity.

A pro-social motivation for "Sustainability"

"What does consistently work may be surprising: interventions based not on money, but on leveraging social concerns." - [2]

A simple and concrete effort: Design a system around efficiently "spreading" the opportunity to earn $1/day [3].

A highly visible cause can be adopted as a platform goal to "sustain" dedication and inspire pro-social behavior. A 'community' can be built around a platform with a specific goal, which also offers a marketable, "shareable" message that is synergistically co-branded with an existing community of large-scale efforts and awareness.

The platform will inspire trust by operating with a clear 'purpose' and with transparency that demonstrates its dedication to expanding and supporting the effort. It can show a growing, empowered workforce, progressive tools and design to support that workforce, fair systems to assure quality and settle disputes, and full reward for dedication.

I imagine the $1/day effort as follows (in cooperation with relevant organizations): vetted individuals are given a connection to our crowdwork platform, initially earning via micro-tasks and other work while potentially learning new skills. They may connect with other platform users on team-based macro-tasks, potentially learning how to harness teams of the (7 ± 2) type [4]. Once an individual has achieved some level of effort (time on platform, tasks, earnings, ratings, recommendations, etc.), they are invited to find fewer than three local individuals and are encouraged to work closely with them, potentially as a locally collaborative group in other 7 ± 2 efforts. "Business models" could be designed and shared among these workers, based on businesses that benefit their local population or that generate income specifically through access to a distributed network. "What could one person do with only access to a workforce of peers?"

In this way, users could 'connect' with an individual to support (see Kiva), be linked to them from the moment they join (or choose a 'partner'), and watch their 'sustaining effort' progress as their first supported worker brings other local workers into the system. Multiple options exist for sustaining the platform and community: for example, dedicating 100% of funds earned to 'sustaining' (which may be tax-deductible if the platform is a non-profit), automatically 'sustaining' with a portion of each HIT or of a weekly total, or donating "anything above the daily goal". There is potential for direct mentoring as well.
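
The sustaining options can be modelled as interchangeable policies that split a worker's earnings between the worker and 'sustaining'. The following is a minimal Python sketch; the function names, the 10% default portion and the $1.00 daily goal are illustrative assumptions rather than part of the design.

```python
from typing import Callable

# A hypothetical policy maps gross earnings (dollars) to the amount sent to "sustaining".
SustainPolicy = Callable[[float], float]

def sustain_everything() -> SustainPolicy:
    """Dedicate 100% of earnings to sustaining."""
    return lambda earnings: earnings

def sustain_portion(fraction: float = 0.10) -> SustainPolicy:
    """Automatically sustain a fixed portion of each HIT (or of a weekly total)."""
    return lambda earnings: round(earnings * fraction, 2)

def sustain_above_goal(daily_goal: float = 1.00) -> SustainPolicy:
    """Donate anything earned above the daily goal."""
    return lambda earnings: round(max(0.0, earnings - daily_goal), 2)

def split(earnings: float, policy: SustainPolicy) -> tuple[float, float]:
    """Return (kept by the worker, sent to sustaining)."""
    sustained = policy(earnings)
    return round(earnings - sustained, 2), sustained

print(split(3.50, sustain_above_goal()))  # (1.0, 2.5)
```

Keeping each option as a small function lets a user switch policies without touching how earnings are recorded.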

Game Overview

This design does not take for granted skill 'tiers' or the ability to 'reject' work. It is designed around promoting community, egalitarianism, a sense of fair compensation for effort, and largely machine-driven reputation mechanisms supported by limited subjective human input.

Non-competitive Feedback game (Keep, Give, Sustain)

This game is designed to elicit purely subjective feedback from both 'requesters' and 'workers'. It recognizes a unique value in human judgements, while avoiding 'fairness by appearance' and the steeper barriers to productivity that come with attempts to make this feedback more consistent and transparently objective (for example, task-specific evaluation rubrics or general rating guidelines, which are themselves understood and applied subjectively).

Variables built into the design allow participation, 'power', and possibly the reliability of ratings to be balanced flexibly.

An individual's decisions in the "Keep, Give, Sustain" game are not individually visible to the "other player" (that is, a worker's decision needn't be communicated to the requester, and vice-versa). However, a 'career history' may have a role, as explained below.

Building a pot

A 'pot' is generated for each participant in each unique transaction on the platform. That is, a requester submitting a batch of n HITs generates 2n individual pots, regardless of how many workers accept or submit those HITs (i.e., a batch of n HITs generates n 'requester pots' and n 'worker pots', which may be taken up by at most n workers).

The pot is a specified percentage of the HIT cost, and that percentage could be adjusted by HIT type, requester history, temporary incentives, etc. (The platform itself may also provide a 'coffer' to cover unexpected results.)

Each 'player' is provided a separate (not necessarily equal) pot per interaction, and the results of each individual's decision needn't be communicated to the other party.
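
The pot construction can be sketched in a few lines of Python, assuming a flat per-party rate. The 10% defaults and the `Pot` / `build_pots` names are hypothetical; the design only says the rate is a percentage of the HIT cost that may be tuned per HIT type, requester history, and so on.

```python
from dataclasses import dataclass

@dataclass
class Pot:
    owner: str      # "requester" or "worker"
    hit_index: int  # which HIT in the batch this pot belongs to
    amount: float   # dollars available for Keep / Give / Sustain

def build_pots(n_hits: int, hit_cost: float,
               requester_rate: float = 0.10, worker_rate: float = 0.10) -> list[Pot]:
    """Create one requester pot and one worker pot per HIT: 2n pots for a batch of n.

    The two rates are separate because the pots need not be equal.
    """
    pots = []
    for i in range(n_hits):
        pots.append(Pot("requester", i, round(hit_cost * requester_rate, 2)))
        pots.append(Pot("worker", i, round(hit_cost * worker_rate, 2)))
    return pots

pots = build_pots(n_hits=5, hit_cost=0.50, worker_rate=0.20)
print(len(pots))  # 10
```

Pots are created per HIT rather than per accepting worker, so the count stays at 2n even if fewer than n workers take up the batch.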

Sustain

This is the default option, which essentially "re-invests" in the platform. By providing a consistent and 'desirable' option that benefits neither the requester nor the worker, the hope is to make value judgements concrete on a per-user basis, allowing us to understand what each user's subjective decision ratio (keep:give:sustain) represents. (For example, does a user act randomly, or reserve rewards for interactions at the 90th percentile or better?)
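
A user's (keep:give:sustain) ratio is simply the share of each decision in their history. A minimal Python sketch, assuming decisions are stored as lowercase strings:

```python
from collections import Counter

def decision_ratio(decisions: list[str]) -> dict[str, float]:
    """Share of Keep, Give and Sustain decisions in a user's history."""
    counts = Counter(decisions)
    total = sum(counts.values()) or 1  # avoid division by zero for new users
    return {choice: counts[choice] / total for choice in ("keep", "give", "sustain")}

history = ["sustain"] * 7 + ["give"] * 2 + ["keep"]
print(decision_ratio(history))  # {'keep': 0.1, 'give': 0.2, 'sustain': 0.7}
```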

Keep (Worker)

Keep allows a worker or requester to withdraw the money visible to them in the pot, essentially at any time. A worker will primarily "keep" the pot when:

1) The worker will not complete the task, and "keeps" as compensation for effort already applied. I call this "cashing out". The amount available to 'cash out' may vary with visible effort, milestones, etc.

2) The worker has completed the task and "keeps" as an "on-demand reward", effectively giving them the freedom to bonus themselves for effort they feel was under-compensated (within the limit of the pot).

3) The worker repeatedly "keeps" in an effort to maximize earnings, i.e., scamming.
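
How much of the pot a worker may Keep could depend on visible progress. The sketch below assumes a hypothetical rule: an unfinished task unlocks the pot in proportion to milestones reached ("cashing out"), while a completed task unlocks the whole pot as an on-demand reward. Repeated keeping (case 3) is not handled here; it is addressed through the ratio-based adjustments under Ramifications.

```python
def keep_available(pot: float, milestones_done: int, milestones_total: int,
                   completed: bool) -> float:
    """Amount a worker may Keep from their pot under a hypothetical milestone rule."""
    if completed:
        return pot  # on-demand reward, capped at the pot itself
    if milestones_total <= 0:
        return 0.0
    fraction = min(milestones_done, milestones_total) / milestones_total
    return round(pot * fraction, 2)  # "cashing out" for partial effort

print(keep_available(0.40, milestones_done=1, milestones_total=4, completed=False))  # 0.1
```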

Keep (Requester)

Similarly, requesters are given a "keep" option, which is available when reviewing worker submitted tasks.

This essentially has one purpose: a "partial refund" mechanism, which may improve a sense of fairness.

It may also be exploited to "minimize cost".

Give

Give allows either party to "bonus" the other subjectively. Examples:

The entire workforce is bonused by a requester whose task is done ahead of schedule. A requester is bonused by a worker in recognition of "above and beyond" guidance, excellent HIT design, adhering to Dynamo guidelines, etc.
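
Taken together, the three options simply decide who receives a pot. A minimal Python sketch, assuming the whole pot goes to a single recipient (the design may also allow partial amounts); the `Choice` enum and `resolve` function are hypothetical names.

```python
from enum import Enum

class Choice(Enum):
    KEEP = "keep"
    GIVE = "give"
    SUSTAIN = "sustain"

def resolve(owner: str, counterparty: str, amount: float,
            choice: Choice = Choice.SUSTAIN) -> dict[str, float]:
    """Route a pot to its recipient.

    KEEP: back to the pot's owner (a worker's bonus or a requester's partial refund).
    GIVE: a bonus to the other party.
    SUSTAIN (the default): re-invested in the platform.
    """
    recipient = {Choice.KEEP: owner,
                 Choice.GIVE: counterparty,
                 Choice.SUSTAIN: "platform"}[choice]
    return {recipient: amount}

print(resolve("requester", "worker", 0.05, Choice.GIVE))  # {'worker': 0.05}
```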

Ramifications

The system could easily adjust the value of the pot available for each option. For example, a user who 'Keeps' in every case might see the 'Keep' option shrink progressively in future rounds, making 'Sustain' and 'Give' more appealing.
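
One way to shrink the Keep option is to scale it by how often the user has kept recently. The linear rule and the 0.25 floor below are hypothetical choices for illustration only.

```python
def keep_multiplier(recent_decisions: list[str], floor: float = 0.25) -> float:
    """Scale factor applied to the Keep option in future rounds.

    Falls linearly with the share of recent decisions that were "keep",
    but never below `floor`.
    """
    if not recent_decisions:
        return 1.0
    keep_share = recent_decisions.count("keep") / len(recent_decisions)
    return max(floor, 1.0 - keep_share)

print(keep_multiplier(["keep"] * 10))                           # 0.25
print(keep_multiplier(["keep", "sustain", "give", "sustain"]))  # 0.75
```

A floor keeps Keep available even for heavy keepers, so legitimate cash-outs remain possible while Sustain and Give become relatively more attractive.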

In this way, a user is free to reward themselves, to some extent, at a pace that fits them.

The (Keep:Give:Sustain) ratio could be a visible tool for workers to decide whom to work with. For example, some workers may be drawn to requesters with 100% 'Give' rate, while others may favor balance, or a requester who Sustains the system consistently.
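
A worker-side filter over public ratios might look like the sketch below. The requester names and thresholds are hypothetical, and ratios are assumed to be stored as shares that sum to 1.

```python
def matches_preference(ratio: dict[str, float], min_give: float = 0.0,
                       min_sustain: float = 0.0, max_keep: float = 1.0) -> bool:
    """True if a requester's Keep:Give:Sustain ratio fits a worker's preferences."""
    return (ratio.get("give", 0.0) >= min_give
            and ratio.get("sustain", 0.0) >= min_sustain
            and ratio.get("keep", 0.0) <= max_keep)

requesters = {
    "req_a": {"keep": 0.0, "give": 1.0, "sustain": 0.0},
    "req_b": {"keep": 0.2, "give": 0.3, "sustain": 0.5},
}
generous = [name for name, r in requesters.items() if matches_preference(r, min_give=0.9)]
print(generous)  # ['req_a']
```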

Moderation by peers

Moderation essentially follows one rule: "Would a reasonable worker who would accept and submit this HIT submit it at this quality?"

Tasks that are 'rejected', despite the opportunity to revise or recoup some expense via the Keep mechanism, are distributed back to the platform as moderation tasks. These tasks pay a flat rate, which may be increased to reflect the relative scarcity of 'peers' (fewer widget-masters may mean a higher incentive is needed to entice them on demand), and could vary with the rate of the original task or other factors.
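
A scarcity-adjusted moderation rate could be computed as below. The multiplier (open moderation tasks per available qualified peer, clamped between 1x and a hypothetical 3x cap) is an assumption; the design leaves the exact formula open.

```python
def moderation_pay(flat_rate: float, open_tasks: int, available_peers: int,
                   cap: float = 3.0) -> float:
    """Flat moderation rate, raised when qualified peers are scarce."""
    scarcity = open_tasks / max(available_peers, 1)  # tasks waiting per available peer
    multiplier = min(cap, max(1.0, scarcity))        # never below 1x, never above cap
    return round(flat_rate * multiplier, 2)

print(moderation_pay(0.10, open_tasks=20, available_peers=10))  # 0.2
print(moderation_pay(0.10, open_tasks=5, available_peers=10))   # 0.1
```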

Moderation is triggered when a requester would otherwise refuse to pay. The HIT is simply flagged, and the requester supplies a quantifiable assessment of the quality they perceive the flagged work to have (1-5 stars, 0-100, a-b, etc.).
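
Because requesters may rate on different scales, their assessments need a common range before they can be compared with peer estimates. The sketch below assumes a simple linear mapping onto 0.0-1.0; letter scales such as a-b would first have to be mapped to numbers.

```python
def normalize(score: float, low: float, high: float) -> float:
    """Map a rating on a numeric scale (1-5 stars, 0-100, ...) onto 0.0-1.0."""
    if high <= low:
        raise ValueError("scale must have high > low")
    clamped = min(max(score, low), high)  # tolerate out-of-range input
    return (clamped - low) / (high - low)

print(normalize(4, 1, 5))     # 0.75
print(normalize(60, 0, 100))  # 0.6
```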

Moderation tasks display the HIT design and instructions, as well as the disputed result.

Workers are asked to subjectively determine whether the disputed result appears to be Better, Similar, or Worse than the result they anticipate providing given a similar HIT.

In the case of batch HITs, a 'reputation matrix' is built from the reputation matrices of the users who submitted the disputed task, representing an average submitting user. The composition, and especially the 'specialities' (peak values), of this average submitting user are used to suggest an appropriate 'peer' base to audit the HIT. In other words, reputation should tell us more than a specific 'skill tier': it recognizes the special skills the task involved while eliminating the perceived status divisions that tiers create. This also means workers of both higher and lower skill take part in moderation, and the pool of peers available to moderate any task can be 'fuzzier' and more agile.

[In the case of one-off HITs, more reliance is placed on analytical assessment to determine the qualities of the task and the peer base that would attempt it.] [Edit: relevancy judgements that determine the necessary qualities of peer-auditors might benefit from micro-input from workers, as another set of tasks [5].]
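
For batch HITs, peer selection can be sketched by averaging the submitters' reputation profiles and choosing candidates whose skill mix is similar. This simplifies each 'reputation matrix' to a per-skill vector, and the skill names, worker IDs and 0.8 threshold are hypothetical. Cosine similarity compares the shape of a profile (its peaks) rather than its overall level, so both stronger and weaker workers with the right specialities qualify.

```python
import math

Profile = dict[str, float]  # skill name -> reputation score in [0, 1]

def average_profile(profiles: list[Profile]) -> Profile:
    """Element-wise mean of the submitters' reputation profiles."""
    skills = {s for p in profiles for s in p}
    return {s: sum(p.get(s, 0.0) for p in profiles) / len(profiles) for s in skills}

def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse skill vectors (0.0 if either is empty)."""
    skills = set(a) | set(b)
    dot = sum(a.get(s, 0.0) * b.get(s, 0.0) for s in skills)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def peer_pool(submitters: list[Profile], candidates: dict[str, Profile],
              min_similarity: float = 0.8) -> list[str]:
    """Candidates whose skill mix resembles the average submitter, at any overall level."""
    target = average_profile(submitters)
    return [name for name, p in candidates.items() if cosine(target, p) >= min_similarity]

submitters = [{"transcription": 0.9, "translation": 0.2},
              {"transcription": 0.7, "translation": 0.4}]
candidates = {"w1": {"transcription": 0.4, "translation": 0.15},  # weaker, same shape
              "w2": {"transcription": 0.1, "translation": 0.9}}   # different speciality
print(peer_pool(submitters, candidates))  # ['w1']
```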

Multiple peer-auditors may take the auditing task until sufficient confidence in the result is reached. As each auditor makes their <, ~, > (Worse, Similar, Better) judgement, their skill level at the disputed task, as estimated by the reputation system, turns that judgement into a fuzzy data point suggesting a range of peer-assessed quality. These points accumulate until the system reaches confidence in the quality of the submission as estimated by peers.
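
The accumulation step can be sketched as follows, under several assumptions: quality lives on a 0-1 scale, each auditor's reputation gives the quality they would themselves produce, a Worse/Similar/Better judgement shifts that anchor by a hypothetical fixed step of 0.15, and confidence is declared once at least three auditors agree within a rough standard error of 0.05. Auditor trust, if known, weights each point.

```python
import math

OFFSET = {"<": -0.15, "~": 0.0, ">": 0.15}  # hypothetical size of a Worse/Similar/Better step

class PeerEstimate:
    """Accumulates auditors' <, ~, > judgements into a quality estimate on a 0-1 scale."""

    def __init__(self, min_auditors: int = 3, tolerance: float = 0.05):
        self.points: list[tuple[float, float]] = []  # (quality point, weight)
        self.min_auditors = min_auditors
        self.tolerance = tolerance

    def add(self, judgement: str, auditor_skill: float, trust: float = 1.0) -> None:
        """Anchor the judgement at the auditor's own expected quality, then shift it."""
        point = min(1.0, max(0.0, auditor_skill + OFFSET[judgement]))
        self.points.append((point, trust))

    def mean(self) -> float:
        """Trust-weighted mean of the points (call only after at least one add)."""
        total_w = sum(w for _, w in self.points)
        return sum(p * w for p, w in self.points) / total_w

    def std_error(self) -> float:
        """Rough standard error: weighted variance divided by the number of auditors."""
        n = len(self.points)
        if n < 2:
            return math.inf
        m = self.mean()
        total_w = sum(w for _, w in self.points)
        var = sum(w * (p - m) ** 2 for p, w in self.points) / total_w
        return math.sqrt(var / n)

    def confident(self) -> bool:
        return len(self.points) >= self.min_auditors and self.std_error() <= self.tolerance

est = PeerEstimate()
est.add("~", auditor_skill=0.70)  # a peer at 0.70 calls it similar
est.add(">", auditor_skill=0.55)  # a weaker peer calls it better
est.add("~", auditor_skill=0.72)
print(round(est.mean(), 2), est.confident())  # 0.71 True
```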

The peer-estimate of the moderated HIT is compared against the average user submission, and a "pay/don't pay" decision is made from this over-under comparison. The uncertainty in the calculation acts as a "benefit of the doubt" range, a mechanism to approximate fair leniency by paying both parties if needed. The requester's evaluation could have a role in determining the size of the 'benefit of the doubt' range.
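
The over-under comparison with a benefit-of-the-doubt band can be sketched as below. The band width (z times the peer estimate's standard error, z = 1.96 by default) and the `extra_margin` hook, through which the requester's evaluation might widen or narrow the band, are assumptions. Treating the in-band outcome as "pay the worker and cover the requester from the platform coffer" is one possible reading of "paying both parties".

```python
def pay_decision(peer_estimate: float, std_error: float, average_submission: float,
                 z: float = 1.96, extra_margin: float = 0.0) -> str:
    """Over-under comparison of the peer estimate against the average submission."""
    margin = max(0.0, z * std_error + extra_margin)  # width of the benefit-of-the-doubt band
    if peer_estimate > average_submission + margin:
        return "pay"
    if peer_estimate < average_submission - margin:
        return "don't pay"
    # Too close to call: e.g. pay the worker and cover the requester from the coffer.
    return "benefit of the doubt"

print(pay_decision(0.78, std_error=0.02, average_submission=0.80))  # benefit of the doubt
```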

Over multiple moderations, auditors may be compared repeatedly against the evaluations of the auditing group to determine a "trust" or "auditing skill" factor which would inform the weight given to each auditor's assessment in the group; this factor could also adjust the pay rate offered to potential auditors. Inaccurate raters lose monetary incentive.
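
An auditor's trust factor could be maintained as an exponential moving average of how closely they agree with the group's final estimate, and then used both as their weight and to scale their pay. The learning rate, the 0.3 tolerance and the pay formula are hypothetical.

```python
def update_trust(trust: float, auditor_estimate: float, group_estimate: float,
                 learning_rate: float = 0.2, scale: float = 0.3) -> float:
    """Move an auditor's trust toward their agreement with the group.

    Agreement is 1.0 for an exact match and falls linearly to 0.0 at a gap of `scale`.
    """
    agreement = max(0.0, 1.0 - abs(auditor_estimate - group_estimate) / scale)
    return (1 - learning_rate) * trust + learning_rate * agreement

def auditor_pay(base_rate: float, trust: float) -> float:
    """Pay scales with trust (50% of base at trust 0), so inaccurate raters earn less."""
    return round(base_rate * (0.5 + 0.5 * trust), 2)

t = update_trust(0.8, auditor_estimate=0.70, group_estimate=0.70)  # rises toward 1.0
print(round(t, 2), auditor_pay(0.10, t))
```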

In this way, many workers are able to audit, the process can feel egalitarian, and the system gains a mechanism that can tell a requester "Your expectations were unreasonable! Your design was poor!" or a worker "You didn't understand what's expected at this level!", etc.