Difference between revisions of "Milestone 2 ZSpace"
Latest revision as of 06:10, 12 March 2015
- 1 Attend a Panel to Hear from Workers and Requesters
- 2 Reading Others' Insights
- 2.1 Worker perspective: Being a Turker
- 2.2 Worker perspective: Turkopticon
- 2.3 Requester perspective: Crowdsourcing User Studies with Mechanical Turk
- 2.4 Requester perspective: The Need for Standardization in Crowdsourcing
- 2.5 Both perspectives: A Plea to Amazon: Fix Mechanical Turk
- 2.6 Turkit: Algorithmic human computation using imperative programming style
- 3 Browsing Blogs
- 4 Need synthesis
Attend a Panel to Hear from Workers and Requesters
The following realizations and revelations were brought to light in the Panel that happened in the morning:
- For a full-time Turker, finding the right HITs is itself a full-time job. Many tools and scripts have come to the workers' aid: forums such as MTurklist, crowd worker, and crowd alert, and tools such as hitscraper, which lists in real time all the HITs that meet certain criteria. Full-time Turkers go well beyond casual browsing for HITs, to the extent of using multiple screens to track all the HITs.
- For most Turkers the basic incentive to Turk is money. But the community lets Turkers stick around, help others, share HITs, and provide support. "You become a part of a community. The ability to have people who understand what you are going through allows you to stick on with Turking."
- The requester-worker relationship goes well beyond task completion. Turkers contact requesters by email, on subjects as varied as "Why was I rejected?", "Your request is unclear", and "You are paying too little". Turkers also suggest ways to frame a better request. Requesters are welcomed by the Turkers on the forums, where the two groups talk with each other. Turkers help requesters from start to finish, even with handling APIs, because Amazon doesn't offer that level of support.
- The requesters are well-connected. It is like a Facebook group, where they talk about almost everything under the sun: their highs and lows, how to be more efficient, drama, and issues. It is like any other forum.
- The steep learning curve of AMT is the biggest hurdle that keeps a new Turker from sticking with it. A newcomer simply can't imagine making money on the platform, and one needs to know how to work with scripts. The drop-off rate on MTurk is severe due to the lack of support; a new platform should provide a support service and documentation, the things that retain good workers. The interface is horrible, and MTurk doesn't seem to care about it.
- For people outside the USA, finding jobs is tougher, and they (Indians) have to settle for really underpaying jobs such as transcribing receipts. Ultimately they get frustrated and quit. Since 2012, an Indian can't get a new MTurk account.
From a Requester's point of view:
- As much as a Turker wants the requester to be active on the forums, a requester prefers to be contacted via email, being preoccupied with their own research or job.
- One restriction that bothers requesters is that they can't ask Turkers to download anything (software), a security measure against malware. The requesters feel it does not achieve much: it hurts the good actors, because the bad ones are going to do it anyway.
Reading Others' Insights
Worker perspective: Being a Turker
Crowd sourcing has aptly been defined as "the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call" by Jeff Howe. The attempt made by the researchers is to represent the concerns of this network of people and their relationship with the requester and with each other.
Statistics about Turkers
The following statistical information has been collected by the researchers:
- 80% of the work on AMT (Amazon Mechanical Turk) is done by the most active 20% of the Turkers.
- 56% of Turkers are US nationals and another 36% are Indians.
- The primary motive is money, though the low wages on AMT tell a paradoxical tale.
- Unfair Rejection of Work - Turkers find it highly frustrating when they feel their HITs are unfairly rejected, as it not only deprives them of the earnings they should have made for the time and effort spent on the task but also damages their reputation and their future chances of work.
- Slow Payment - Many Turkers turk for fast money, and slow payment tends to defeat the purpose.
- Low Wage - Some HITs end up paying less than the official minimum wage.
- Poorly Designed Tasks by the Requester - These result in misunderstandings of the task and in rejections that could have been avoided.
It has been felt that the design of AMT largely supports the needs of Requesters over those of Turkers. A suggested solution is a plugin to rate the Requesters, similar to the services provided by Turkopticon.
Reason for Turking:
Money remains the primary reason. Some Turkers turk to make ends meet, while others do so to supplement their primary income. Some state that Turking is fun and gives them joy, though it is debatable whether they would stick with Turking if it were not monetarily rewarding, however little it pays.
Earnings of a Turker
Annual earnings are highly variable, from as low as $50 to as high as $15,000, depending on the number of hours put in by the Turker. Fair pay according to Turkers (on Turkopticon) is largely pegged to the current US minimum wage ($7.25 per hour). But even $6/hour is not such a low rate for AMT in general, and some HITs pay a rate closer to $1 per hour. The assumed reason Turkers accept such a variable and minimal wage is that in many cases their primary source of income pays much less than the minimum wage, and Turking helps them earn their bread and butter.
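To make the comparison with the minimum wage concrete, the short sketch below converts a per-HIT reward and completion time into an effective hourly rate. The function name and the example reward and duration are illustrative assumptions, not figures from the study; only the $7.25 minimum wage comes from the text above.

```python
US_MINIMUM_WAGE = 7.25  # dollars per hour, as cited in the text above


def effective_hourly_rate(reward_usd: float, seconds_per_hit: float) -> float:
    """Convert a per-HIT reward and completion time to dollars per hour."""
    if seconds_per_hit <= 0:
        raise ValueError("seconds_per_hit must be positive")
    return reward_usd * 3600.0 / seconds_per_hit


# Hypothetical HIT (an assumption): 5 cents for a task that takes 3 minutes.
rate = effective_hourly_rate(0.05, 180)
print(f"${rate:.2f}/hour, below minimum wage: {rate < US_MINIMUM_WAGE}")
```

A worker can run the same arithmetic on any HIT before accepting it, which is roughly what the "Generosity" rating on Turkopticon asks reviewers to judge.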
Relationships are more than the sum process of doing HITs and receiving pay. Many posts show that direct, open, polite, and respectful communication is highly valued, and an engaging requester is appreciated. Turkers are often thought to have a significant number of untrustworthy members among them. Though these views are heavily skewed toward the negative, some Turkers do resort to cheating and deception; mturkforum.com is misused to post answers, cheats, scripts, and workarounds. But a work ethic is prevalent among Turkers: there are instances where Turkers accept responsibility for HITs that were rightly rejected.
Worker perspective: Turkopticon
- As AMT became popular, less attention was paid to the ethics and values of crowd work.
- The concept grew out of a tactical project intended to raise questions about the ethics of human computation; the authors argue that responsibility in crowdsourcing research also means intervening in AMT by building a technology used by its workers.
- The ethical challenges and issues faced by workers, and the ethical issues we face as researchers, are produced in the encounters between us, the workers, and Turkopticon.
- Turkopticon is an activist system that allows workers to publicize and evaluate their relationships with employers.
- Turkopticon allows workers to create and use reviews of employers when choosing employers on Amazon Mechanical Turk.
- Explaining AMT as a micro-labor marketplace draws attention to pricing mechanisms, how workers choose tasks, and how transactions are managed.
- Explaining it as a crowdsourcing platform draws attention to the dynamics of mass collaboration among workers, the aggregation of inputs, and the evaluation of the crowdsourced outputs.
- Explaining it as a source of human computation resources, however, is consistent with how both the computing research community and Amazon’s own marketing frame the system.
- AMT defines this type of information work by setting workers up as resources that can be directly integrated into existing computer systems as “human computation.”
- Amazon legally defines the workers as contractors subject to laws designed for freelancers and consultants.
- Employers can literally access workers through AMT's APIs.
- AMT brings together crowds of workers as a form of infrastructure, rendering employees into reliable sources of computation.
What is infrastructure?
- The standard working of a system, where there is minimal self-introspection and thus less fault tolerance.
When is infrastructure?
- This question arises when the system needs to check when it might break down and take care of its vulnerabilities.
Who is infrastructure?
- The situation where the system takes into consideration its human components: the programmers, managers, and start-up hackers who integrate human computation into their technologies.
- The paper explains the basic features of AMT and shows how the design prioritizes the needs of employers.
- Employers define HITs on AMT by creating web-based forms that specify an information task and allow workers to input a response.
- The employer then defines criteria that candidate workers must meet to work on the task.
- This filter approach to choosing workers, as compared to more individualized evaluation and selection, allows employers to request work from thousands of temporary workers in a matter of hours.
- Once a worker submits work, the employer can choose whether to pay for it. This discretion allows employers to reject work that does not meet their needs, but also enables wage theft.
- Because AMT’s participation agreement grants employers full intellectual property rights over submissions regardless of rejection, workers have no legal recourse against employers who reject work and then go on to use it.
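The filter-style worker selection and the employer's discretion over payment described above can be sketched as follows. This is a minimal illustration, not the MTurk API: the `Worker` and `Criteria` types, the threshold values, and the `settle` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    worker_id: str
    approval_rate: float   # fraction of past submissions approved, 0.0 to 1.0
    hits_completed: int


@dataclass
class Criteria:
    min_approval_rate: float
    min_hits_completed: int


def eligible(worker: Worker, criteria: Criteria) -> bool:
    """Filter-style selection: a worker either meets every threshold or not."""
    return (worker.approval_rate >= criteria.min_approval_rate
            and worker.hits_completed >= criteria.min_hits_completed)


def settle(reward_usd: float, approved: bool) -> float:
    """The employer alone decides approval; a rejected submission earns nothing."""
    return reward_usd if approved else 0.0


# Hypothetical workers and thresholds, chosen only for the demonstration.
workers = [
    Worker("w1", 0.98, 1200),
    Worker("w2", 0.80, 5000),
    Worker("w3", 0.99, 10),
]
criteria = Criteria(min_approval_rate=0.95, min_hits_completed=100)
pool = [w.worker_id for w in workers if eligible(w, criteria)]
print(pool)
```

Note that `settle` takes no input from the worker at all, which is the asymmetry the paper criticizes: a rejection costs the worker both the payment and part of the approval rate that later filters depend on.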
Bill of Rights
- To provoke workers’ imaginations about the infrastructural possibilities, we placed a task onto AMT asking workers to articulate a “Worker’s Bill of Rights” from their perspective, with hopes of generating interaction between workers and broader publics concerned with crowdsourcing.
- Workers paid per task provided short answers to open-ended questions based on our past experiences questioning workers in the platform.
- Asking a provocative question drew stronger, more detailed responses oriented towards concerns of crowdsourcing ethics.
- The core issues raised by workers were:
- Their work was regularly rejected unfairly or arbitrarily
- They demanded faster payment
- A "minimum wage" or "minimum payment" per HIT
- "Fair" compensation generally
- Dissatisfaction with employers' and Amazon's lack of response to their concerns
- The critical reason workers were dissatisfied with Amazon was that, since Amazon collects money for task volume, they feel it has little reason to prioritize worker needs in a market with a labor surplus.
- There were few shared values and priorities that could guide the development of an infrastructure of mutual aid.
- Motivated by responses to the “Bill of Rights,” we designed and built Turkopticon
- The system allows workers to make their relationships with employers visible and call those employers to account
- They also build up a commons together with other contributors.
Standardizing Requester Reputations
- Because the AMT model often has workers doing HITs from a large number of employers in a session, the authors collected ratings on several distinct qualities rather than a single aggregate rating in the style of product review sites, leaving workers discretion in evaluating the ratings.
- Communicability: How responsive has this requester been to communications or concerns you have raised?
- Generosity: How well has this requester paid for the amount of time their HITs take?
- Fairness: How fair has this requester been in approving or rejecting your work?
- Promptness: How promptly has this requester approved your work and paid?
Development
- The authors created a task listing prominent requesters and solicited initial reviews, for which they compensated workers. These initial reviews seeded the database for new users installing Turkopticon.
- Reputation without Retribution: Maintain accountability and anonymity
- Turkopticon attempts to prevent employers from retaliating against workers writing reviews by obfuscating workers’ email addresses.
- Workers' fear of retribution for writing critical reviews is in tension with the need for reputation among users of the system.
- Readers of reviews can make judgments about the credibility of workers by evaluating other contributions by the user and making their own decision about whether to engage the employer.
- Selected a first cohort of moderators by calculating the most prolific reviewers on the site, and inviting them to act as reviewers on
- Relied on primarily social moderation, by a small number of moderators, for several reasons.
- Automated approaches are difficult to implement in practice because they cannot account for community-specific and emergent norms
- Requesters could easily make an account and begin flagging negative reviews they have received, or even pay Turk workers to down vote their reviews
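As a rough illustration of the rating scheme described under Standardizing Requester Reputations, the sketch below averages each of the four qualities separately per requester and deliberately produces no single overall score. The review record layout, the 1 to 5 scale, and the possibility of leaving a quality unrated are assumptions made for the example, not details confirmed by the summary above.

```python
from collections import defaultdict
from statistics import mean

ATTRIBUTES = ("communicability", "generosity", "fairness", "promptness")


def summarize(reviews):
    """Average each attribute separately per requester; no overall score."""
    scores = defaultdict(lambda: defaultdict(list))
    for review in reviews:
        for attr in ATTRIBUTES:
            value = review.get(attr)
            if value is not None:  # assumption: an attribute may be left unrated
                scores[review["requester"]][attr].append(value)
    return {
        requester: {attr: round(mean(vals), 2) for attr, vals in by_attr.items()}
        for requester, by_attr in scores.items()
    }


# Hypothetical reviews on an assumed 1-5 scale.
reviews = [
    {"requester": "R1", "communicability": 4, "generosity": 2,
     "fairness": 5, "promptness": 3},
    {"requester": "R1", "generosity": 4, "fairness": 3},
]
summary = summarize(reviews)
print(summary["R1"])
```

Keeping the four averages apart leaves the final judgment to the reader of the reviews, in line with the stated aim of giving workers discretion in evaluating the ratings.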
Maintenance and Repair
- Though HCI has conventionally been concerned with the design, deployment, and evaluation of technological artifacts, the social and technical life of Turkopticon, like any technology, depends on ongoing maintenance and repair
- We enlist moderators in discussions of website policy and interaction design, and alter and repair the technology in response to their requests and observations.
- This work of maintenance and upgrading, undertaken with the participation of workers, does more than offer insight into needs and requirements.
- This work strengthens ties and builds solidarity among workers collaborating on the practical, shared, and political circumstances they face as crowdworkers.
- Turkopticon employs tactical quantification to enable worker interaction and employer accountability
- Standardizing ratings into quantified buckets was instead a compromise we made to our own values
- There needs to be a quantification of sorts that takes into account the social standards of various workers and requesters and the weight of the existing infrastructural norms that torqued our design decisions
- Bringing together workers from across the world as a public to engage in shared inquiry and democratic interchange would require speaking across cultures, cutting across vastly different circumstances.
- By creating infrastructures for mutual aid, the discussions become more issue centered as we bolster the social interchange and interdependency.
Conclusion: Owing to its practical embeddedness, Turkopticon has drawn sustained attention to ethical questions in crowdsourcing over the course of its operation. Through the design of layered infrastructures, we can support complex and overlapping publics that open up questions about possible futures and once again steer and shift the existing practices and infrastructures of our technological present.
Requester perspective: Crowdsourcing User Studies with Mechanical Turk
Importance: User studies are vital to the success of virtually any design endeavor.
- User evaluations may include methods such as surveys, usability tests, rapid prototyping, cognitive walkthroughs, quantitative ratings, and performance measures.
- An important factor in planning user evaluation is the economics of collecting user input.
- It is often not possible to acquire user input that is both low-cost and timely enough to impact development. The high costs of sampling additional users lead practitioners to trade off the number of participants with monetary and time costs.
- These factors have led to new ways for practitioners to collect input from users on the Web, such as tools for user surveys and online experiments.
- In this article we investigate a different paradigm for collecting user input: the micro-task market.
- We define a micro-task market as a system in which small tasks (typically on the order of minutes or even seconds) are entered into a common system in which users can select and complete them for some reward which can be monetary or non-monetary (e.g., reputation).
- Tasks typically require little time and effort, and users are paid a very small amount upon completion (often on the order of a few cents).
- Adapting this system for use as a research and design platform presents serious challenges.
- Success of the system appears to be driven by the low participation costs for accepting and completing simple, short tasks. In contrast, paradigms for user evaluation traditionally employ far fewer users with more complex tasks, which incur higher participation costs.
- Mechanical Turk is best suited for tasks in which there is a bona fide answer, as otherwise users would be able to “game” the system and provide nonsense answers in order to decrease their time spent and thus increase their rate of pay. However, when collecting user ratings and opinions there is often no single definite answer, making it difficult to identify answers provided by malicious users.
- The diversity and unknown nature of the Mechanical Turk user base is both a benefit and a drawback
- In the experiment, Mechanical Turk users rated a set of 14 Wikipedia articles, and their ratings were compared to those of an expert group of Wikipedia administrators from a previous experiment, who are highly experienced Wikipedia users with a strong track record of participation.
- In addition, users were required to fill out a free-form text box describing what improvements they thought the article needed, which served as a check on the authenticity of their ratings.
- The results showed that rather than widespread gaming, a small group of users were trying to take advantage of the system multiple times.
- The rating task was altered. In the new rating task, users were required to complete four questions that had verifiable, quantitative answers before rating the quality of the article.
- Questions were selected to remain quantitative and verifiable yet require users to attend to similar criteria as the Wikipedia featured article criteria, and as what Wikipedia administrators claimed they used when rating an article.
- In addition to the improved match to expert ratings, there were dramatically fewer responses that appeared invalid.
- Experiment 1: a marginal correlation of Turkers' quality ratings with those of expert admins, along with a high proportion of suspect ratings.
- Experiment 2: a better match to expert ratings, a dramatic decrease in suspect responses, and an increase in time-on-task.
- Because the Turker population is drawn from a wide range of users, they represent a more novice perspective and likely weight different criteria in making quality judgments than the highly expert admin population.
- Judgments from a varied crowd population may be even more useful than those of a limited pool of experts.
- First, it is extremely important to have explicitly verifiable questions as part of the task.
- Second, it is advantageous to design the task such that completing it accurately and in good faith requires as much or less effort than non-obvious random or malicious completion.
- Third, it is useful to have multiple ways to detect suspect responses.
- In this study we examined a single user task using Mechanical Turk, finding that even for a subjective task the use of task-relevant, verifiable questions led to consistent answers that matched expert judgments.
- These results suggest that micro-task markets may be useful for other types of user study tasks that combine objective and subjective information gathering.
- Some of these are common to online experimentation, there is no easy way for experimenters to fully control the
- Absence of robust support for participant assignment
- Work needs to be done to understand the kinds of experiments that are well-suited to user testing via micro-task markets.
- Hundreds of users can be recruited for highly interactive tasks for marginal costs within a time frame of days or even minutes. However, special care must be taken in the design of the task, especially for user measurements that are subjective or qualitative.
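The design recommendations above (verifiable questions, plus multiple ways to detect suspect responses) can be sketched as a simple screening step. The question names, the answer key, and every threshold below are hypothetical choices for illustration; the study's actual questions and cut-offs are not given in this summary.

```python
def is_suspect(response, answer_key, max_wrong=0, min_seconds=30,
               min_comment_words=3):
    """Flag a response using several independent signals."""
    wrong = sum(1 for question, expected in answer_key.items()
                if response["answers"].get(question) != expected)
    too_fast = response["seconds"] < min_seconds
    thin_comment = len(response["comment"].split()) < min_comment_words
    return wrong > max_wrong or too_fast or thin_comment


# Hypothetical verifiable questions about one article and their true answers.
answer_key = {"num_references": 12, "num_images": 3, "num_sections": 7}

responses = [
    {"worker": "w1",
     "answers": {"num_references": 12, "num_images": 3, "num_sections": 7},
     "seconds": 240,
     "comment": "Needs more citations in the history section"},
    {"worker": "w2",
     "answers": {"num_references": 1, "num_images": 1, "num_sections": 1},
     "seconds": 12,
     "comment": "good"},
]
kept = [r["worker"] for r in responses if not is_suspect(r, answer_key)]
print(kept)
```

Because answering the verifiable questions correctly already requires reading the article, a worker gains little by skipping the subjective rating, which is the effort balance the second recommendation asks for.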
Requester perspective: The Need for Standardization in Crowdsourcing
Comparison of crowdsourcing with free markets and standardized manufacturing.
- In modern day manufacturing, processes are highly standardized.
- A large number of goods can be made with a small set of orthogonal processes.
- The author points out that the lack of standardization causes the following problems:
- More development work on the part of the requester, i.e. coding UIs and validation programs.
- The requester has to figure out the price on their own.
- Workers have to bear a higher cognitive load in understanding the UI specific to each employer.
- Differing quality requirements for each employer.
- Standardization has a number of advantages:
- Can create reusable blocks which reduce effort needed by the requester.
- Tasks can potentially be standardized, which can let turkers concentrate on the task and not worry about the reputation of the employer.
- Since tasks are standardized and undifferentiated, automated pricing techniques can be used (free market equilibrium as well as sophisticated algorithms).
- After the building blocks of standard tasks are built, composite tasks composed of more than one “block” can be built. These have the following advantages:
- Multiple “simple” tasks can be run in parallel, creating redundancy and increasing reliability.
- Workflows can be priced according to the complexity of the tasks.
- Research in multicomputer task scheduling can be used to queue tasks and optimize task allocation.
- In the author’s opinion, platforms should reduce externalities
- Provide incentive to requesters to provide feedback.
- Fight fraud.
- Platforms should cooperate with competitors to set standards for crowdsourcing. Such standards should raise wages and make turking a more attractive option for many.
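The point above about running several identical simple tasks in parallel for redundancy is commonly realized with a majority vote over the collected answers. The sketch below shows one such aggregation; majority voting and the agreement threshold are choices made here for illustration, not a method prescribed by the author.

```python
from collections import Counter


def majority_label(labels, min_agreement=0.5):
    """Return the most common label if its share exceeds min_agreement, else None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) > min_agreement else None


# Three workers label the same item; two agree, so their label wins.
print(majority_label(["cat", "cat", "dog"]))
# Two workers disagree; no label clears the threshold.
print(majority_label(["cat", "dog"]))
```

When no label clears the threshold, a requester might post the task again to more workers, trading extra cost for reliability.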
Both perspectives: A Plea to Amazon: Fix Mechanical Turk
A plea to Amazon to fix MTurk
Four major shortcomings were identified:
A better interface for task submission:
- Currently developers use command line tools provided by Amazon to create and manage tasks. For many tasks, requesters need to hire a full developer to manage crowdsourcing.
- A user-friendly interface could lower the barrier to entry, particularly for small tasks. Although simplified tools for creating simple tasks now exist, a system for easily authoring any task remains distant.
A reputation system for workers:
- The current Amazon rating system is susceptible to fraud. Workers often find ways to game the current measures, “Number of completed HITs” and “Approval rate”.
- Requesters can find value in knowing the educational qualifications of the workers. For example, a request requiring arithmetic calculation would benefit from users with at least high school qualifications.
- Requesters can benefit from knowing the tasks which the worker has previously completed. If the worker has previous experience in essay writing, then tasks could be allocated to these workers first.
- Workers do not perform uniformly well across all categories of tasks. By assigning ratings for each category, requesters can gain more information about a prospective worker.
A trustworthiness rating for requesters:
- Turker forums like TurkerNation are full of threads about requesters and their trustworthiness. Tools like Turkopticon help workers actively establish the trustworthiness of requesters. Parameters that help workers judge requesters include:
- Rejection rate
- Appeal rate
- Volume of posted work
- Speed of payment
A better user interface for workers:
- Since requesters have to design their own workflow and UI for their tasks, workers have to learn each UI. Some standardization would be beneficial in this matter.
- Another problem is the inability of workers to search for tasks by category or by requester. In such an environment, the expected time to completion for a task increases drastically.
Turkit: Algorithmic human computation using imperative programming style
Iterative tasks versus parallel tasks
- MTurk is capable of doing several similar tasks in parallel as well as iterative sequential tasks such as iterating on an image description.
- Iteration is helpful because, instead of having different workers do the same task and picking the best solution, a worker can review and add to the work done by the previous Turker, leading to refined results at lower cost.
- The merit of iterative tasks over parallel recursive tasks is also presented in the model of mClerk. Iteration can be built by composing primitive human computation tasks such as soliciting content, voting, and improving content.
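The iterative pattern described above, where each worker improves on the previous result and a vote decides whether to keep the change, can be sketched as a small loop. This is written in Python rather than TurKit's own scripting language, and the `improve` and `vote` callables stand in for real human tasks; both stand-ins below are assumptions made so the example runs.

```python
def iterate(initial, improve, vote, rounds):
    """Keep a running best text; each round one worker improves it and
    voters decide whether the new version replaces the old one."""
    best = initial
    for _ in range(rounds):
        candidate = improve(best)
        if vote(best, candidate) == candidate:
            best = candidate
    return best


# Stand-ins for human workers (assumptions for a runnable demo).
def fake_improve(text):
    return text + " with more detail"


def fake_vote(old, new):
    # Pretend voters always prefer the longer description.
    return new if len(new) > len(old) else old


result = iterate("A photo of a dog", fake_improve, fake_vote, rounds=2)
print(result)
```

In a real deployment each call to `improve` and `vote` would post a HIT and wait for workers, so every round costs money and time; that is the reason the results of these steps are worth recording rather than recomputing.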
Multi-phase task decomposition
Complex human computation algorithms where workers build on each other's work can be designed conveniently with TurKit. Applications: blurry text recognition, iterative writing.
- Crash-and-rerun programming is the backbone of TurKit Script.
- It is a method for allowing a script to be re-executed without re-running costly side-effecting functions. Users write imperative programs that can crash (forgetting their state) and be rerun on request or automatically on polling.
- The server preserves state between requests in a database. Instead of storing the current state, TurKit stores the trace leading to the current state of the program, which makes it possible to modify programs between reruns to fix bugs.
- In order to avoid the cost of rerunning human computation steps, we look up the previous result in the database. The programmer has control over whether a previously executed line is retrieved from the database or evaluated afresh. This control is primarily embodied in the TurKit Script language feature once.
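The crash-and-rerun idea above can be approximated in a few lines. The sketch below is a Python imitation, not TurKit's actual implementation or API: the `Trace` class, its in-memory list standing in for the database, and the `post_hit` and `collect_answer` functions are all hypothetical. It shows only the core behaviour of `once`, replaying recorded results on a rerun and executing only the steps that have not completed yet.

```python
class Crash(Exception):
    """Raised to abandon the current run; the script is simply rerun later."""


class Trace:
    def __init__(self):
        self.log = []    # recorded (function name, result) pairs, in call order
        self.cursor = 0  # position in the log during the current run

    def start_run(self):
        self.cursor = 0

    def once(self, fn):
        """Replay the recorded result if this step already ran, else run and record it."""
        if self.cursor < len(self.log):
            name, value = self.log[self.cursor]
            if name != fn.__name__:
                raise RuntimeError("script changed before this step; trace no longer matches")
        else:
            value = fn()  # may raise Crash, in which case nothing is recorded
            self.log.append((fn.__name__, value))
        self.cursor += 1
        return value


calls = {"post": 0, "collect": 0}
answers_ready = False


def post_hit():
    calls["post"] += 1  # costly side effect: must happen only once
    return "hit-1"


def collect_answer():
    calls["collect"] += 1
    if not answers_ready:
        raise Crash("no answer yet")
    return "a blurry word"


def script(trace):
    trace.start_run()
    hit_id = trace.once(post_hit)
    answer = trace.once(collect_answer)
    return hit_id, answer


trace = Trace()
try:
    script(trace)          # first run crashes while waiting for the worker
except Crash:
    pass
answers_ready = True       # a worker has now completed the HIT
result = script(trace)     # rerun: post_hit is replayed, not executed again
print(result, calls)
```

The name check inside `once` mirrors the point that the recorded trace must still match the script: steps after the last recorded one can be edited freely between reruns, while changing an earlier step invalidates the trace.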
Crashing versus blocking
- Crashing is better than blocking because the program does not have to store all the states and wait for the blocking function to return.
- Storing all the states can be cumbersome in tasks involving many workers and tasks.
- Crashing stores selective states and reruns other steps much like polling.
- Blocking is not very useful when parallel tasks are being assigned.
- TurKit provides a thin wrapper around the basic MTurk features, and also provides crucial higher-level calls that are not part of the MTurk API. The most important of these functions is waitForHIT, which allows a script to wait until a HIT is completed.
- TurKit Script also provides several generally useful functions: prompt, vote, and sort. Supporting common subroutines helps make writing human computation algorithms easier.
- The prompt function shows a string of text to a Turker and returns their response. It takes an optional argument specifying a number of responses to be returned as an array.
- Additionally, TurKit supports fork and join features for more easily implementing parallel algorithms.
Once function call
- Once accepts a function as an argument and calls it. If the function returns without crashing, the return value is recorded in the database and returned to the caller. Subsequent runs return the recorded value without calling the function again. Once also records which function was passed to it, so that it can ensure that the same function is passed again on subsequent runs of the program. The following types of calls should be wrapped in once:
- Non-deterministic calls, such as random functions, should be run with once so that they give the same value on rerun, rendering the whole program deterministic.
- Costly calls that involve human computation should also be wrapped in once, so that their results are stored in the database for later runs.
- If functions have side effects, then it may be important to wrap them in once if invoking the side effect multiple times will cause problems. For instance, approving results from a HIT multiple times causes an error from MTurk.
- A crash-and-rerun program needs to rerun all previous iterations of a loop every time it re-executes, and eventually the space required to store this list of actions in the database will be too large. In some cases it is better to poll inside the script rather than rerun it. But polling inside a blocking script, as opposed to rerunning the entire program, does not allow the script to be modified between reruns. One can think of an approach where the entire program does not have to be rerun but the script can still be changed between reruns.
Browsing Blogs
- Turkers have strong community forums where they discuss everything from popular HITs to fixing water coolers. These forums are a learning platform for beginners. Even experienced turkers turn to the forums for information about requesters and types of HITs.
- Requesters often ask turkers about technical know-how of designing a task.
- Requesters use these forums to analyse the needs and efficiency of the turkers.
- Turkers decide reputation of a requester based on reviews on the forums.
- Turkers outside of USA are willing to work but it is difficult for them to get approved.
- Neither turkers nor requesters are happy with the MTurk interface, and standardization could help.
Need synthesis
The needs of the workers are summarized below:
- Workers need better task interfaces that are more intuitive and easy to use.
Evidence: Requestor-Question Blood Rayen lists various interface features that would help turkers, such as:
- Surveys should have a progress bar showing the remaining time.
- All qualifications for filling in any part of the survey need to be listed beforehand.
- Turkers should be able to finish their HITs with minimum scrolling.
Interpretation: We need to standardize the interface used by turkers for tasks.
- Workers need to be appreciated in forms other than monetary benefits. They feel better for having done something of importance.
Evidence: necoya says he would prefer remarks such as "Good Job!" in the feedback.
Interpretation: Turkers appreciate a closing sentiment acknowledging their contributions.
- Workers need an easy and fast search mechanism for HITs, as many turkers spend a lot of their time searching for appropriate HITs. This would be especially useful for newcomers. Mechanical Turk still lacks such functionality; a lot of APIs are coming up for this purpose, but there is a lot of scope in the area.
- Workers want to differentiate spam tasks from authentic ones.
- Workers need to easily find tasks that suit their skill set.
- Workers need to find tasks with greater hit and completion rate.
- Workers need tasks that they can complete with minimum disclosure of their personal information.
Interpretation: Workers can speed up their performance if they can efficiently search for tasks.
The needs of the requesters are summarized below:
- Requesters need to be able to post tasks that require the worker to download software. This would widen the scope of possible tasks, but MTurk currently prohibits it.
Evidence: As discussed in the hangout with requesters.
Interpretation: Requesters feel that the bad people are already asking turkers to download malware and the restriction affects only those with good will.
- Requesters need more assurance of completion of the task.
Evidence: As discussed in the hangout.
Interpretation: There is no guarantee that if a turker accepts a HIT he/she will complete it correctly. At times such unpredictability can lead to losses.
- Requesters need an easy-to-use interface for designing tasks, one that helps them write more complex human computation algorithms without much expertise.
- Requesters need to set the time and the payment for a task appropriately. If the time assigned is too short, it can lead to unnecessary rejection of the task. Similarly, appropriate pay improves the chances of quality participation.