WinterMilestone 11 Algorithmic HummingBirds
SUMMARY OF WINTER MEETING 11
The goals and discussion this week in all three domains focus on refining last week's work as we push toward our fast-approaching deadlines. Let us now track the progress in each of the three domains. People helping out in other aspects of the project are encouraged to jump in and work on tasks that would be of immediate help (i.e., for the quickly approaching UIST deadlines).
1. TASK AUTHORSHIP
The goal this week was to identify tasks that we can use in our study and turn them into a post for popular crowdsourcing forums. We want worker feedback on the post so that we can iterate on the design, tweak it to better suit our platform, and ensure global worker and requester outreach.
We had to come up with a hard task that would give us the variance we need. We could use an open-ended task and base our assumptions on the previous research literature. We thought that the Minnesota and Macalester paper on Arafat Gold might be the right one.
To summarize the paper:
Turkers, Scholars, “Arafat” and “Peace”: Cultural Communities and Algorithmic Gold Standards
In a nutshell, popular crowdsourcing markets have become the dominant mechanism for building “gold standard” datasets, on the assumption that crowd workers can accurately replicate the judgments of the general population for knowledge-oriented tasks. The paper examines this assumption and finds that it does not always hold: the judgments of crowd workers can differ from those of people in other communities.
2. TASK RANKING
The task feed implementation is progressing quickly as the deadlines close in. A study is going to be piloted by the folks here this week. We were also working on the timer aspect, where we measure how long a particular user takes to complete a given task and use that to optimize his/her future tasks in order to maximize pay. However, he/she is given an edit option. We need to clearly communicate that tweaking the time to be higher or lower than the actual value won't benefit workers in the long run and that this is not how the platform is meant to be used. There may be a need to incentivize them to be as honest as possible.
Let us consider the best, or simplest, case: the worker has one tab open and is working in it. Let us assume that he/she takes no breaks and finishes the assigned task in one stretch. Then our job is relatively simple: run the timer straight from start (when the task is accepted) to finish (when the submit button is pressed). The elapsed time at the end is the time we are looking for.
Now consider the most complicated, or worst, case, which is a more accurate representation of the real world: the worker has multiple tabs open and is simultaneously working on multiple tasks belonging to different guild structures, with breaks in between. Tracking the time for each task would be tedious.
What we could do is take the earlier model (of the best case) and adapt it to suit the worst case. We could explore versions such as auto-pause, where the task is put on hold automatically if we see no client-side activity for a particular duration. We could also stop the timer automatically whenever the window or tab of the task goes out of focus.
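To make the auto-pause idea concrete, here is a minimal sketch in Python. It is not the Daemo implementation: the class name, the event names (accept, activity, blur, focus, submit) and the idle limit are all assumptions made for illustration, and timestamps are passed in explicitly as seconds.

```python
class TaskTimer:
    """Accumulates active working time for one task (hypothetical sketch)."""

    def __init__(self, idle_limit=120.0):
        self.idle_limit = idle_limit  # assumed: seconds of inactivity before auto-pause
        self.elapsed = 0.0            # active seconds counted so far
        self._running_since = None    # when counting last (re)started, or None if paused
        self._last_activity = None    # time of the last client-side event
        self._focused = False         # whether the task tab currently has focus

    def _stop(self, at):
        # Add the running stretch that ends at `at`, then pause.
        if self._running_since is not None:
            self.elapsed += at - self._running_since
            self._running_since = None

    def _settle_idle(self, now):
        # If the worker has been idle too long, count only up to the idle limit.
        if (self._running_since is not None
                and now - self._last_activity > self.idle_limit):
            self._stop(self._last_activity + self.idle_limit)

    def accept(self, now):
        # Task accepted: start counting.
        self._focused = True
        self._running_since = now
        self._last_activity = now

    def activity(self, now):
        # Any client-side activity (key press, click, scroll) in the task tab.
        self._settle_idle(now)
        if self._focused and self._running_since is None:
            self._running_since = now  # resume after an auto-pause
        self._last_activity = now

    def blur(self, now):
        # Tab or window lost focus: stop the timer.
        self._settle_idle(now)
        self._stop(now)
        self._focused = False

    def focus(self, now):
        # Tab regained focus: resume counting.
        self._focused = True
        if self._running_since is None:
            self._running_since = now
        self._last_activity = now

    def submit(self, now):
        # Submit pressed: stop and report the active time.
        self._settle_idle(now)
        self._stop(now)
        return self.elapsed


timer = TaskTimer(idle_limit=60)
timer.accept(0)
timer.activity(30)
timer.blur(50)       # 50 s of work, then the worker switches tabs
timer.focus(100)
timer.activity(110)
timer.activity(300)  # long gap: only 60 s after the last activity is counted
total = timer.submit(320)
```

With one timer object per task, the best case (no blur, no idle gap) reduces to submit time minus accept time, and the multi-tab case is handled by each tab reporting its own events.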
When a worker joins the platform, we could let him/her work only on gold standard and prototype tasks, within the guild of his/her interest or outside it (this option is still open). We can then track the speed at which he/she is working, say x tasks/hour (this can be extrapolated to other types of tasks as well). Now we have the details of the worker: the average speed at which he/she works (per guild or per domain), the number of tasks attempted so far, how long he/she has been on the platform, which domains he/she has been interested in, which guilds he/she is a part of, the success ratio in each of those guilds, and so on. If we could base the pay on these factors, maybe it doesn't matter how long workers take per task.
Maybe instead of having the time edit, we could just average over the multiple tasks that workers do. In that case we would have to be accurate, because a difference of a second or so could end up making a major difference in the hourly wage.
It's my personal opinion that it is unfair for a difference of a second to make a major difference in pay, because it doesn't reflect a significant difference in ability. I think we need fixed ranges, where people whose speeds fall in more or less the same range have pretty much the same base pay.
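The fixed-range idea can be sketched as follows. The band boundaries and pay values are made-up numbers for illustration only, not figures agreed on by the team; the point is that two workers whose average task times differ by a second land in the same band.

```python
from bisect import bisect_right

# Hypothetical band boundaries in tasks/hour and the base pay for each band.
BAND_EDGES = [10, 20, 40]
BAND_PAY = [6.0, 8.0, 10.0, 12.0]  # one more entry than BAND_EDGES


def tasks_per_hour(durations_seconds):
    # Average over all completed tasks instead of trusting a single timing.
    mean_seconds = sum(durations_seconds) / len(durations_seconds)
    return 3600.0 / mean_seconds


def base_pay(durations_seconds):
    # Map the averaged speed onto a fixed range so small timing
    # differences do not change the base pay.
    speed = tasks_per_hour(durations_seconds)
    return BAND_PAY[bisect_right(BAND_EDGES, speed)]


pay_a = base_pay([120, 120, 120])  # 30 tasks/hour
pay_b = base_pay([121, 121, 121])  # about 29.75 tasks/hour, same band as pay_a
```

Workers near a band boundary would still see a jump, so the boundaries themselves would need to be chosen with care.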
3. OPEN GOVERNANCE
We were trying to share with the workers an audit involving two steps: a. generating some scenarios that we thought were pertinent to the way workers might work in the guild, and b. generating an introduction to our platform to hopefully entice workers on some forums. One difficulty is that there are a lot of edge cases that we don't want to ignore.
Another important question that we need to ask is: is there a need to pay for peer review? I think payment may give workers some external motivation to take an interest in someone else's work. But the payment need not be monetary. We could “pay” them in terms of their reputation rankings and so on.
The current generation of crowdsourcing platforms has surprising flaws, which are often overlooked or sidelined.
So, the idea is to channel efforts in the direction of guilds and experiment to see to what extent they help in minimizing some of these issues, by building a next-generation crowdsourcing platform, Daemo, integrated with a reputation system, Boomerang. The improved platform is expected to yield better results within limited time bounds (as compared to existing platforms) and to be a more efficient and representative crowd platform.
Crowd-source; Daemo; Boomerang; Guilds; reputation; timer
The worker is expected to fulfill criteria set up by the guild network, and his/her work is reviewed by senior members. We look at prolonged or continuous leveling with exams and critical tasks, which would be reviewed directly with no involvement of any third-party agencies.
Instead of examination-based evaluation for leveling, we could look at workers' overall record, their reputation in the recent past, their quality rating, their position within the guild, and so on, which in my opinion might lead to more accurate results.
When a worker joins the guild and starts working, over time he/she builds up a set of successfully completed tasks to his/her credit. Workers then pay a small percentage of what they have earned, and this is routed to guild funds. The funds are used to pay for the peer review evaluations that follow. A review is an evaluation “task” in itself (it probably happens on a random subset of completed tasks, not on every task, since each task would be reviewed by the task requester anyway). All the tasks which a worker has reviewed will enter the worker review record, and this directly affects the reputation of the worker within the guild, i.e., the worker may either jump a few rankings and earn reputation or lose out a little by falling in the reputation ranking system.
Here, we might take into account the average of peer, senior and requester ratings, and these together would affect the worker's standing. These ratings are not final and binding: the worker holds the right to challenge such evaluations, in order to prevent unfair manipulation of reputation rankings.
However, when we do consider a random subset, we need to be careful not to pick gold standard or prototype tasks for evaluation. Those are based on ground truth, so it would make little sense for the worker to pay for review of such a task, or for peers to evaluate tasks for which the answers are fairly obvious.
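The review flow described above could be sketched roughly as below. The 5% contribution rate, the "is_gold" field, and the function names are hypothetical; the sketch only shows the three pieces mentioned in the text: the contribution to guild funds, the random subset that skips gold standard and prototype tasks, and the averaged rating.

```python
import random
from statistics import mean


def guild_contribution(earnings, rate=0.05):
    # Small percentage of earnings routed to guild funds (rate is an assumption).
    return round(earnings * rate, 2)


def pick_for_review(tasks, k, rng=None):
    # Random subset of completed tasks, skipping gold standard and prototype
    # tasks because those are already checked against ground truth.
    rng = rng or random.Random()
    eligible = [t for t in tasks if not t["is_gold"]]
    return rng.sample(eligible, min(k, len(eligible)))


def combined_rating(peer, senior, requester):
    # Plain average of the three ratings; a weighted average is equally possible.
    return mean([peer, senior, requester])


completed = [
    {"id": 1, "is_gold": True},
    {"id": 2, "is_gold": False},
    {"id": 3, "is_gold": False},
    {"id": 4, "is_gold": False},
]
to_review = pick_for_review(completed, k=2, rng=random.Random(0))
rating = combined_rating(peer=4, senior=5, requester=3)
```

A challenge by the worker could then be modelled as replacing or re-collecting one of the three ratings before the average is recomputed.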
There could be automatic threshold barricades that a worker needs to cross in order to get promoted.
We could also have tasks reviewed and peer reviewed, with promotion based on the ranking, points and badges received. Promotion here would be more like a social decision (the reviewers would be paid for it).
Or we could have automatic ranking systems, where promotions are also largely automatic (for higher-ranked workers) and would manifest as reputation within the system.
Or we could have third-party decisions, where a person who performs well consistently would be considered for promotion.
Any of these could be considered, but we might need to make a collective decision on which to choose, ideally supported by substantial experimentation and data.
Price points will be set manually by the guild initially, but will later be adjusted based on the actual mean hourly wages for each guild level. This means that pricing is dynamic and based on real market behavior, as opposed to a computationally simplified model.
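One possible reading of this adjustment rule is sketched below. The switch-over condition (a minimum number of wage observations per level) and all names are assumptions; the text only says that manual prices come first and observed mean hourly wages take over later.

```python
from statistics import mean


def price_points(manual, observed_wages, min_samples=3):
    # manual: {guild_level: price set by the guild}
    # observed_wages: {guild_level: [actual hourly wages seen so far]}
    prices = {}
    for level, manual_price in manual.items():
        wages = observed_wages.get(level, [])
        # Keep the manual price until enough real data exists (assumed rule),
        # then switch to the observed mean hourly wage for that level.
        prices[level] = mean(wages) if len(wages) >= min_samples else manual_price
    return prices


manual_prices = {1: 5.0, 2: 8.0}
observed = {1: [4.0, 6.0, 8.0]}  # level 2 has no observations yet
current = price_points(manual_prices, observed)
```

A smoother variant could blend the manual price and the observed mean instead of switching outright.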
We could have a forum or discussion board with public and private areas, where tasks and ideas would be discussed, like a peer doubt-clarification session within a guild. However, we could have requesters or requester assistants monitoring the session to make sure that answers themselves, or crucial clues to the tasks, are not shared. Other peers could flag a comment or a reply if they feel it is abusive or gives away answers, and if that proves to be so, it could directly affect the poster's reputation ranking. We could “clone” a GitHub kind of model, where a worker or requester can open an “issue” tagged with each task's hashtag. It is open for members of that particular guild to join. The visibility of the task and the discussion is limited to the guild. It also has options for private chat between peers and requesters, where visibility is limited.
We could have a Q&A community with voting (the Stack Overflow model, for instance), which would prevent everyone from spamming the requester's inbox with similar questions. A worker could post a question (regarding the task, the evaluation policy, the pricing, etc.) and anyone would be able to see and answer it. The requester could verify and correct answers later. This could also be extended so that workers receive important updates about the task, changed deadlines if any, and so on.
We could have a Knowledge Base. Assume that a task is based on multiple domains or requires deep experiential knowledge to solve, or say a requester wants the workers to refer to a paper or website before attempting it. It would of course make sense to put the material up in the discussion forum, but with a lot of activity there it would become too cumbersome to search for a piece of information. Workers would end up missing a crucial part of the information, and attempting the task without the requisite knowledge would end in rejection or give rise to unforeseen variance.
Naturally, such a task would be rewarding, not only in terms of knowledge but also in monetary terms. In such a case, a worker would either not have the confidence to rise to the challenge, or would have the confidence but be unsure of the direction in which to proceed. If this really is the case, the task would be open only to a very small group of highly skilled workers (probably at the top) within the guild, and the requester would have very little data on which to base his/her results. So, we could have a separate platform where the requester could choose to guide (if he/she has the time) a few interested workers on how to proceed, what to read, and so on.
WHAT DOES THE GUILD NEED?
We could set up a Discourse-style system with flair levels and a Slack-like organization with management options and so on.
(The UI and designs will be updated shortly)