Winter Milestone 5 westcoastsfcr

From crowdresearch

Please use the following template to write up your introduction section this week.

System (for task feed and open gov write up)

Brief introduction of the system

Turkxam is a method to enable crowdworkers to prove their skills using placement exams.

How is the system solving critical problems

In recent years, crowdsourcing has become a common means for businesses to accomplish mundane, simple tasks that computers cannot do. These tasks are known as “microtasks” and typically require inherent knowledge that is hard to teach computers. An example would be editing a paper to make sure there are no spelling or grammatical errors, as well as shortening wordy sentences while still conveying the originally intended meaning. Crowdsourcing has also made it possible to collect large amounts of data in a short amount of time. A researcher can upload a survey to a crowdsourcing site such as Mechanical Turk, pay each person who takes the survey a not so handsome fee, and call it a day. These not so handsome fees can be as low as a penny. It would make sense for such low payments to be specific to certain tasks on these crowdsourcing sites, but they aren't. Nearly all tasks pay a low fee, and there is little variation in payment across different tasks. Because these tasks are mundane and simple, it is understandable that requesters think such a small fee is sufficient, but these tasks take time. The only way to eventually get paid more is to do an excellent job on a vast number of low-paying tasks in order to gain access to higher-paying ones. Under the assumption that these workers have no skills whatsoever, it makes logical sense that workers must prove themselves before receiving higher pay. After all, this is how most entry-level jobs function: you get hired with minimal requirements, if any; you gain experience doing the same job over and over; and eventually you get a promotion or raise. But to assume that all workers possess so few skills and such low intelligence that they need to perfect their ability to perform HITs before getting paid more is absurd.


Many workers have a variety of skills, but how exactly can trust be instilled in the requester that those working on their task are capable of producing acceptable work? As mentioned above, there is currently no way for workers to move up besides completing a vast number of tasks at high quality.

Module preview

Existing crowdsourcing techniques have no method to assess a worker's skill accurately. Workers also do not have sufficient time to prove their skills to requesters. Hence we propose the use of placement exams to help workers prove their skills, find the tasks that are relevant to them, and maximize their pay in a minimum amount of time.

System details

The placement exams would work as follows. When someone signs up as a worker on Daemo, they select from a list of categories the kinds of tasks they are interested in doing. They are then presented with one placement exam for each category they marked. Their performance on each exam determines which tasks in that category they have access to. For each category, a worker is assigned a level from 1 to 10 that determines which tasks they can do. For example, a worker who marked Python programming as one of their interests and then scored 100% on the exam would be labeled a level 10 Python programmer, while someone who scored 30% on the placement exam would be labeled a level 3 Python programmer. Workers could later take more specific exams if they wished to level up, say from a level 3 Python programmer to a level 4. Precautions would need to be taken to make sure that no cheating takes place. The main open issue is figuring out which anti-cheating method would be most effective.
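The score-to-level mapping described above can be sketched in a few lines. This is only an illustration: the function name `score_to_level`, the rounding down of in-between scores, and the minimum level of 1 for very low scores are assumptions, since the write-up only gives the 100% and 30% examples.

```python
def score_to_level(score_percent: float) -> int:
    """Map a placement exam score (0-100) to a level from 1 to 10.

    Assumption: every full 10 percentage points earns one level,
    and scores below 10% still receive level 1.
    """
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be between 0 and 100")
    return max(1, int(score_percent // 10))


# The two examples from the text: 100% and 30%.
print(score_to_level(100))
print(score_to_level(30))
```

Under this sketch a worker who scores 35% would also be placed at level 3; a real deployment would have to decide how in-between scores are treated.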

System Two (for task feed and open gov write up)

Brief introduction of the system

Boomerank is a system that aims to maximize workers' income by giving requesters trust and confidence that they will receive high quality work. This bond is created by giving workers early access to tasks that best utilize their skills and fit their performance.

How is the system solving critical problems

Within Daemo, Boomerang is the main system that has been orchestrated and deemed most valuable. Boomerang makes worker and requester ratings directly affect the people who give them, so that ratings aren't just a mere act of politeness within the crowdsourcing world. If a worker rates a requester highly, they will see tasks from that requester in the near future, as long as the requester also rated them highly and wants them to work on their tasks again. While this has solved many issues within crowdsourcing, such as the Yelp issue, it doesn't maximize workers' pay. What it does do is give workers and requesters the incentive to give good reviews only to people they wish to interact with again (if only the real world worked like this). Now what if we had a Boomerang that utilized a worker's skill in a way that maximized their pay and improved the quality of work returned to requesters? Let me introduce a system known as Boomerank. Boomerank provides workers with priority access to HITs based on the categories they have been most proficient in completing in the past.

This would require the implementation of a placement exam, while still allowing users to take level up exams. When someone signs up for Daemo, they will receive an email informing them of when their placement exam will be issued as well as the topics it will cover. These topics will be made up of the categories of tasks that exist within Daemo at that time. This gives the worker an allotted amount of time to study, while being informed of what the subject matter will be. This in turn gives the worker a sense of control in determining their "career path" while working on Daemo. Newly registered workers will then take their exam on the assigned date; these exams will be given in clusters to help avoid leaked information. After the worker has taken the exam, they will be allowed to choose their "fields of interest". These fields of interest are the categories of tasks they believe they would find interesting. This information will be used to inform them of when upcoming "level up" exams are taking place.

A "level up exam" is an exam that allows someone to go from, say, a level 4 Python programmer to a level 5 Python programmer, based on how well they score on the exam. Levels can never be skipped, in order to ensure that the worker thoroughly understands their "field of interest" and isn't just mastering certain concepts. This is vital to how we allow workers to access work, since a level 5 Python programmer can access all tasks that require a level equal to or lower than theirs. So if arrays must be mastered for level 4, but a worker skips from level 3 to level 5 and then takes a level 4 HIT, it would cause an inconsistency in our experience model. Our model allows workers to take tasks based on prior experience without feeling the need to prove themselves by taking thousands of HITs first.
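The two rules in this paragraph, access to every task at or below the worker's level and promotion by exactly one level at a time, can be sketched as follows. The function names and the cap at level 10 are illustrative assumptions, not part of Daemo.

```python
MAX_LEVEL = 10  # assumption: levels run from 1 to 10, as in the Turkxam description


def can_access(worker_level: int, task_level: int) -> bool:
    """A worker may take any task whose required level is at or below their own."""
    return task_level <= worker_level


def level_up(current_level: int, passed_exam: bool) -> int:
    """Passing a level up exam moves the worker up exactly one level, never more."""
    if passed_exam and current_level < MAX_LEVEL:
        return current_level + 1
    return current_level


# A level 3 worker who passes one exam reaches level 4, not level 5.
new_level = level_up(3, passed_exam=True)
print(new_level, can_access(new_level, 4), can_access(new_level, 5))
```

Because `level_up` only ever adds one, a worker cannot hold level 5 without having passed the level 4 exam, which is the consistency property the paragraph asks for.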

Now that the worker has selected their fields of interest, they must wait for their exam to be processed. The placement exam helps determine which tasks will be best suited for the worker. For example, if they choose CS as their field of interest but perform really well in another field on the placement exam, HITs from that other field will pop up first. This allows workers to utilize their skill sets to maximize their pay. As they continue taking HITs, the tasks they are most proficient in will be the ones they gain early access to. This addresses the issue of maximizing workers' pay, as well as maximizing the quality of work requesters receive. Since workers will first be issued the tasks they perform best at, they will have an incentive to take these HITs instead of waiting around for their "field of interest" to show up. This will give requesters a sense of trust and confidence in workers that they have not experienced before. Because the majority of people doing a requester's HITs will have performed proficiently on similar tasks in the past, requesters will feel secure in maximizing a worker's pay. Higher quality work = higher pay.

Boomerank would rely on a metrics system that primarily uses 1) how fast a task was completed and 2) the quality of work the requester received.
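One possible way to combine these two signals is sketched below. Everything beyond "speed" and "quality" is an assumption made for illustration: the `expected_minutes` estimate, the 0 to 1 quality scale, the equal weighting, and all names are hypothetical and are not specified by Boomerank.

```python
from dataclasses import dataclass


@dataclass
class CompletedTask:
    category: str
    minutes_taken: float
    expected_minutes: float  # hypothetical: the requester's time estimate
    quality: float           # hypothetical: requester rating scaled to 0..1


def task_score(task: CompletedTask, speed_weight: float = 0.5) -> float:
    """Blend speed and quality into one score between 0 and 1."""
    if task.minutes_taken <= 0:
        raise ValueError("minutes_taken must be positive")
    # Speed is 1.0 when the task finishes within the estimate and shrinks as it overruns.
    speed = min(1.0, task.expected_minutes / task.minutes_taken)
    return speed_weight * speed + (1 - speed_weight) * task.quality


def proficiency_by_category(history: list[CompletedTask]) -> dict[str, float]:
    """Average the task scores within each category."""
    scores: dict[str, list[float]] = {}
    for task in history:
        scores.setdefault(task.category, []).append(task_score(task))
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}


history = [
    CompletedTask("design", minutes_taken=10, expected_minutes=10, quality=1.0),
    CompletedTask("coding", minutes_taken=20, expected_minutes=10, quality=0.5),
]
print(proficiency_by_category(history))
```

The weighting between speed and quality is a design choice the write-up leaves open; weighting quality more heavily would discourage rushing through tasks.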

While it may seem that this system traps workers into only working on tasks they are proficient at, that is not the case. Workers will still be given the opportunity to take "level up exams" for any field, but will most frequently be informed of upcoming "level up exams" that fall into their "field(s) of interest". After a worker takes a "level up exam", test work will be administered to them to see how proficient they are in their newly claimed skill set.

The whole point of Boomerank is to allow requesters to utilize workers to the best of their ability, while workers utilize their skill set. As has been shown, people tend to choose career paths that don't necessarily utilize their strengths, but instead maximize their pay. While crowdsourcing is not the same as a "real job", the same psychological tendencies should still apply.


Requesters may still refuse to maximize payment regardless of what quality of work they receive. A worker could accidentally perform well on a placement exam, thus at first getting HITs that really don't suit them.

Module preview

Current crowdsourcing platforms have yet to utilize workers based on their skill set. Providing workers with tasks on which they can do their best work, while maximizing their pay, is essential in a platform such as this. Workers need a way to maximize their pay, and requesters need higher quality work.

System details

This system will utilize not only placement exams, but also metrics of workers' performance along the way. While someone may only want to do tasks of a certain category, it is more to their advantage if they end up doing the tasks they are best suited for. This maximizes pay and minimizes time spent. In short, Boomerank aims to add to Boomerang an element that wasn't already there. Boomerang lets you see a requester's tasks again, based on how the requester and worker rated each other. But what if that requester is done posting tasks of the kind you are able to do? Say you completed a design HIT and did extremely well, but the requester has now reached the phase where everything needs to be hard coded for the website. Now that requester's coding HITs are popping up, but you want design. You did an excellent job on design, but because you are only getting priority access to this requester's HITs, you can no longer utilize this skill. Boomerank will use metrics to determine which tasks the worker is best suited for, so that workers are given priority access to the work they are actually proficient at, and not just to the people they enjoyed working with. If a worker takes a level up exam, they will be presented with a few test tasks to see how proficient they are and whether or not they should be given priority access to these tasks.
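The priority idea in this section, ordering a worker's feed by category proficiency rather than only by requester, could look roughly like the sketch below. The feed representation as `(task_id, category)` pairs, the proficiency dictionary, and the default of 0.0 for untried categories are all assumptions for illustration.

```python
def order_feed(tasks: list[tuple[str, str]],
               proficiency: dict[str, float]) -> list[tuple[str, str]]:
    """Sort (task_id, category) pairs so the worker's strongest categories come first.

    Categories the worker has no record in are treated as proficiency 0.0
    (an assumption), so they appear last but are never hidden.
    """
    return sorted(tasks, key=lambda task: proficiency.get(task[1], 0.0), reverse=True)


# The design/coding example from the text: the worker excelled at design.
feed = [("t1", "coding"), ("t2", "design"), ("t3", "writing")]
worker_proficiency = {"design": 0.9, "coding": 0.4}
print(order_feed(feed, worker_proficiency))
```

Here the design HIT surfaces first even if it comes from a requester the worker has never rated, which is the gap in Boomerang that the paragraph describes.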

Task Authorship Method Section

Study Introduction

We begin by comparing the variance of workers' responses between two conditions of task authorship in order to find out whether the quality of workers' responses depends on the quality of the requester's task authorship. The two conditions are that requesters are given templates with which to author tasks versus that they are given none. After describing the experimental design, which is intended to indicate the condition in which workers' responses are more homogeneous, we show how workers' responses vary across both conditions. Previous studies have used a similar method to study the laziness of workers; this study instead focuses on the quality of the tasks posted by requesters.

Study Method

The experiments reported in this paper were conducted using Mechanical Turk, an Internet crowdsourcing marketplace that outsources crowd work to workers. In this system, requesters post tasks they need completed, and workers choose from those task listings.

Experimental Design

This study presents workers with similar tasks under two different conditions imposed upon the requesters. Requesters will be given five different tasks to post. The tasks they are given will not include any instructions on how to format the task. One group of requesters will be given a set of templates to use, and the other group will not be given templates. The tasks given to both groups of requesters will be the same. There will also be two groups of workers. Each group will consist of workers with varying levels of experience with Mechanical Turk and similar sites, and each will contain enough workers for every task to be completed by multiple individuals. Each group of workers will complete tasks from only one of the requester groups: one group of workers will only complete tasks from the requesters without a template, and the other will only complete tasks from the requesters with a template. The workers' results will then be analyzed. Specifically, we will look at the consistency of the work produced from tasks posted with and without templates.


For this study we will compare the work produced for each task under both conditions: tasks posted with a template and tasks posted without a template. In each condition, each task will be completed multiple times by different workers with varying levels of familiarity with Mechanical Turk and similar sites. This way we can see how answers vary both between conditions and with workers' knowledge of and familiarity with Mechanical Turk. Since each task is completed by multiple workers in each condition, we can analyze the difference in responses under each condition. We would expect less variance in responses to tasks posted with a template than to those posted without one.


We want to analyze the difference in task results between the two conditions. Specifically, we want to compare workers' answers in each condition to see whether there is less variance in answers to tasks authored with templates than in answers to tasks authored without them. If one group shows a larger variance in answers, we can see whether the use of a template decreased the variance in worker responses. If there is a significant difference between the two groups' variances, it would provide evidence on whether the quality of a requester's task authorship influences worker quality.
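The comparison described above can be sketched with the Python standard library. This assumes each worker response has already been reduced to a numeric score, which the study design does not specify; the function name and the sample numbers are made up purely to show the computation and are not results.

```python
from statistics import variance


def variance_ratio(with_template: list[float], without_template: list[float]) -> float:
    """Ratio of sample variances: no-template condition over template condition.

    A ratio above 1 means responses were more spread out without a template.
    """
    var_with = variance(with_template)
    var_without = variance(without_template)
    if var_with == 0:
        raise ValueError("template condition has zero variance; ratio is undefined")
    return var_without / var_with


# Hypothetical scores for one task, four workers per condition.
template_scores = [4, 5, 4, 5]
no_template_scores = [1, 5, 2, 8]
print(variance_ratio(template_scores, no_template_scores))
```

The ratio on its own is only descriptive. Deciding whether the difference is significant would require a formal test of equal variances, such as Levene's test, applied across all tasks.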