Milestone 4 Singularity
Here we mostly elaborate on our ideas from the previous week, which can be found here - []
- All the ideas presented in the examples are similar at their core: they all try to solve the dispute-resolution problem with some kind of moderating system.
- All the ideas start from the same model of a moderator (a person or a learning system) who resolves the matter, with further refinements on top. For example, one of the ideas proposes that the moderator reviews rejected HITs submitted by workers and checks whether each rejection was valid or not.
- Every idea presented has disadvantages that are not mentioned. For example, two of the ideas rely on a person acting as moderator. The problem with this is workload: the moderator may receive both valid and invalid review requests, and if the system has a huge number of users and many review requests keep coming in, handling them manually becomes really difficult. For this problem we could use a learning model that handles review requests based on predefined criteria set by the requester when adding the task. Another idea proposes blocking the requester's account until they settle the issue with the worker. The main problem here is: what if the worker never agrees? Since that system has no human moderator, the requester's account might stay blocked for a long time, which would discourage requesters from using the platform.
- Though all the ideas are similar at the core, their implementations and methods of resolution differ.
- One of the ideas suggests a priority system in which resolution cases involving a requester/worker with a bad reputation are pushed to low priority, while another suggests a human moderator who reviews the resolution cases.
- Another idea presents a nice solution: the worker can request a direct call with the requester, and the requester's account is blocked until the matter is resolved.
The axes we can think of varying designs across are:
1) Mode of selection of moderators: crowd-controlled or Amazon (centrally) controlled? Moderators could be elected by workers and requesters, or appointed by Amazon. The latter model is not scalable, as it would require a lot of work from Amazon staff. But there are multiple hybrids possible in between: MTurk staff could remove mods after enough complaints, could function as a secondary layer ratifying moderator decisions, and could decide how the moderators get paid.
2) Bias. The outcome of every individual dispute is zero-sum: a worker's loss is a requester's gain, so in each case the worker's and requester's goals conflict. Although as groups, altruistic workers and requesters both want fair judgements, they may disagree on the level of leniency shown to workers, etc.
So worker and requester interests are in conflict when moderators are being elected. Who should be allowed to vote, and how much weight should each vote carry? How strict should the eligibility criteria for suffrage, for both groups, be? Selection of the electorate ultimately influences selection of the elected.
3) Value of a vote in a moderator election system. One extreme is that every vote has equal value. But does everyone have an equal stake in the system? Ideally, one's decision-making power should be proportional to their stake in and contribution to the system.
Should worker and requester votes carry equal weight? The average requester handles much more money than the average worker. Should the weight of a vote be proportional to the amount of money earned/paid by the voter? This metric is a strong indicator of how much they have contributed to the system and how much they have used it, and a somewhat weaker indicator of their future stake in it. A further refinement is to only consider money earned/paid in the last month/year, or to discount older payouts by age.
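The time-discounted weighting described here could be sketched as follows; the function name, the geometric decay factor, and the per-month aggregation are our own illustrative assumptions, not part of any existing platform:

```python
def vote_weight(monthly_amounts, decay=0.9):
    """Weight of a voter's ballot, proportional to money earned (worker)
    or paid (requester), with older months discounted geometrically.

    monthly_amounts: dollars moved through the system per month,
    most recent month first.
    """
    return sum(amount * decay**age for age, amount in enumerate(monthly_amounts))

# A requester who paid $500, $300, $200 over the last three months
# outweighs a worker who earned $50 in each of those months.
requester_w = vote_weight([500, 300, 200])  # 500 + 270 + 162 = 932.0
worker_w = vote_weight([50, 50, 50])        # 50 + 45 + 40.5 = 135.5
```

With `decay=1.0` this reduces to lifetime totals; a smaller decay emphasises recent activity, i.e. current stake in the system.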
4) Penalties for faulty rejection/frivolous dispute
Should the penalty for a frivolous dispute be zero, every rejected worker would raise one on the off chance that they get a favourable ruling. This would overload our moderators.
Taking an idea from , a worker can pay a moderator a fee for reviewing a rejected task. The worker stands to lose this fee if the moderator rules their submission unsatisfactory, and the requester can be made to pay it instead if a favourable ruling is obtained.
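The fee mechanism above could be sketched as a simple settlement rule; the function name and the sign convention (balance deltas relative to before the dispute, with the moderator keeping the fee either way) are our own assumptions:

```python
def settle_dispute(fee, ruling_favours_worker):
    """Settle a review fee escrowed by the worker when raising a dispute.

    Returns (worker_delta, requester_delta) in dollars.
    The moderator keeps the fee in both cases, as payment for the review:
    if the ruling favours the worker, the worker's fee is refunded and
    the requester is charged it instead; otherwise the worker forfeits it.
    """
    if ruling_favours_worker:
        return (0, -fee)   # worker made whole, requester pays for the review
    return (-fee, 0)       # frivolous dispute: worker loses the fee
```

A non-zero fee discourages frivolous disputes, while shifting the cost onto the requester after a favourable ruling keeps legitimate disputes free for workers.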
Task clarity: How might workers and requesters work together to produce higher-quality task descriptions?
1. New tasks go to workers first, who improve the task before it goes live
2. Voting on HIT design
3. New tasks go into a holding pattern where they need to be "voted in” by workers
4. Task templates (can we do better than AMT’s templates?)
5. Stronger categorization of work
6. Artificial turker that has to understand tasks before they go live
0. The biggest similarity among most of the ideas pertaining to task clarity was "improvement of the task design". This could be accomplished in several ways, and many crowd researchers came up with similar ideas of how to go about it. Almost everyone unanimously agreed that task design is vital for a task to be clear to the worker. A user-friendly design ensures that the worker doesn't have to spend too much time slogging through a poorly-designed task trying to figure out what it wants him/her to do.
1. One major similarity that seemed quite evident in all these ideas was the inclusion of the worker as an entity that could participate in improving the design of the task. This could be done by: i. letting the task go to a small set of workers who improve it before it is established, during its quarantine period; ii. letting the workers assigned to a particular task rate various aspects of it (either after having completed the task or while the task is in some sort of quarantine period); iii. letting the workers rate various aspects of a task `before` it goes live instead of after workers complete it; iv. letting an artificial turker (worker or requester) give "indirect feedback", obtained by observing how easily the artificial turker understood the task.
2. Another major similarity, or recurring theme, among the listed ideas pertaining to task clarity was "standardization". Our team genuinely believes that standardization could go a long way in improving task clarity and making it convenient for the worker to get hold of the task as quickly as possible. This was observed in the listed ideas in the following ways: i. Some of the teams called for stronger categorization of work. This is a standardization measure because categories exist as a standard that a task can fit into or conform to. A task generated by a requester has to conform to these categories, which the workers can then use to search for tasks and find them conveniently. ii. Another recurring theme in some of the teams' ideas was "task templates". This was also recommended in the paper "The Need for Standardization in Crowdsourcing" by Panagiotis G. Ipeirotis and John J. Horton, who suggested that standards such as templates would improve productivity, as workers would not have to spend time going through a new design every time. A worker who is familiar with a template would know exactly where to look for each piece of information, which would cut down on the time spent grasping the task. An easy-to-use, user-friendly UI is vital for an efficient user-interactive platform.
1. One of the primary differences was that most of the ideas could be grouped into two categories: a large subset focused on improving task design through community interventions, while the remaining ideas primarily called for some sort of standardization. Combined, these two approaches could supplement each other and go a long way in improving task clarity.
2. Among the ideas that endorsed improving task design, the primary difference was in `HOW` the task design is improved. Some ideas suggested voting, some suggested letting workers rate various aspects of the task, and others suggested putting a task into quarantine, where a set of crowd workers improve it before it is put out on the crowdsourcing platform. The method of improving task design varied from idea to idea.
3. Another major difference, among the ideas that endorsed standardization, was the standard each idea set out to implement. Some suggested that categorization could be used for better standardization, and others suggested that task templates could be crafted and designed. One positive observation made by our team was that these differences could easily be incorporated WITH each other: one could implement a platform that provides BOTH categorization of tasks AND improved task-design templates.
1. One axis we could derive is scale: the number of people involved in improving the task. Some ideas vaguely suggested that a small set of people would be involved, but we could scale this up or down. i. Last week our team came up with the idea of micro-moderators, or low-level moderators, where only one person is in charge of task clarity. We suggest that for each task a requester puts up, he chooses one particular worker (he can do so by looking at workers' reputations, giving this responsibility to workers of high reputation) who is responsible for improving task clarity in return for a higher payment. ii. This micro-moderator can be assigned a number of duties, such as helping workers who ask for help in understanding the task, improving the task design, suggesting a template to the requester, improving the language of the task description, and overseeing other workers by making him/herself available to them as a source of help. iii. The micro-moderator could chat with workers who request help and ensure that they're not doing the task wrong. iv. The micro-moderator's primary responsibility could be to improve the task design, ensuring it falls into an appropriate category and is assigned to the set of people whose interests and skills align with this type of task.
2. Another axis we could derive from the ideas is the scale of standardization. One of the most common tasks in crowdsourcing is filling out surveys or answering multiple-choice questions that pertain to a common theme. One could easily standardize such tasks through strong categorization into niche subcategories.
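The micro-moderator selection described above (one high-reputation worker chosen per task) could be sketched as a simple arg-max over reputation scores; the function name, the score scale, and the dict representation are our own illustrative assumptions:

```python
def pick_micro_moderator(candidates):
    """Choose a micro-moderator for a new task: the candidate worker
    with the highest reputation score.

    candidates: dict mapping worker id -> reputation score.
    Returns None if no candidate is available.
    """
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# e.g. among {"w1": 4.2, "w2": 4.8, "w3": 3.9}, worker "w2" is chosen
chosen = pick_micro_moderator({"w1": 4.2, "w2": 4.8, "w3": 3.9})
```

In practice the requester might also filter candidates by skills or past work on similar task categories before applying such a rule.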
- Most of the ideas presented in the given examples seem to draw inspiration from the fact that human beings react favourably to positive reinforcement. All the ideas try to infuse the concepts of reputation and reward in order to design a system that rewards the user for good behaviour.
- A lot of the ideas presented are based on a video game model. One of the ideas even mentions one of their influences to be the popular racing game Need For Speed : Hot Pursuit. The idea of workers levelling up and gaining reputation points, earning achievements for good work and forming cliques and groups seems to be based on more of an MMO structure in order to incorporate a sense of reward in the worker for doing something good, which in turn is what contributes to making the system enjoyable and addictive.
- One disadvantage of most of the models presented is that a worker with a higher level has power over the novices, which adds a layer of challenge for newcomers to the system.
- Most of the ideas presume that a high-level worker will be turking as a full-time job and will thus be ready to put in more effort and produce higher-quality work. It would then make sense to give them more powers and better work.
Broadly, most of the ideas have a common base and build upon it in different directions. At their very core, most of the ideas advocate the same thing, but the differences start showing in the nitty-gritty and the implementation details. These can be broadly summed up in the following points:
- The methodology differs between systems. While one system allows workers to rate themselves on a scale of 1 to 10, another rates them on their experience, while yet another rates them based on their skill set.
- Only one system goes into the specifics of requester ratings.
- One system talks about elevating workers to moderators who help in vetting the work of other workers. It also talks about adding a social element in which requesters add workers whose work they are satisfied with to their circle.