Crowdresearch: Milestone 2 RATH
As our team went through this week's exercise, we kept in mind that we were not looking merely to observe, but to do needfinding in support of developing a new system. We went into the needfinding process with an eye to our particular interest areas: innovation, communication, anonymity, governance, and scalability.
- 1 Attend a Panel to Hear from Workers and Requesters
- 2 Reading Others' Insights
- 2.1 Worker perspective: Being a Turker
- 2.2 Worker perspective: Turkopticon
- 2.3 Requester perspective: Crowdsourcing User Studies with Mechanical Turk
- 2.4 Requester perspective: The Need for Standardization in Crowdsourcing
- 2.5 Both perspectives: A Plea to Amazon: Fix Mechanical Turk
- 3 Do Needfinding by Browsing MTurk-related forums, blogs, Reddit, etc
- 4 Synthesize the Needs You Found
- 5 Ongoing Thoughts/Questions
Attend a Panel to Hear from Workers and Requesters
Live Panel Observations
- Turkers' motivations centered on money as well as a sense of community.
- The money made on a day of Turking can vary significantly.
- Requesters are not required to pay minimum wage; however, some choose to pay it anyway (e.g., a UC Berkeley research student).
- oDesk's compensation logistics are more sophisticated (escrow, upfront payment, etc.), and wages appeared to be higher because tasks were less granular and involved evaluation and assessment.
- Many Turkers don't last long on the site; there is a drop-off after a short period. Those who stay tend to be involved in the chat rooms.
- The gap between novice and experienced Turkers came into clear focus, as did the distinction between being a skilled worker and being a skilled Turker.
- The requesters admitted that pricing was arbitrary and shaped by local minimum-wage guidelines.
- The platform is still more of an auction than a market.
- Rates tend to be set by requestors, but there have been instances of Turkers setting prices based on collective action.
- Requestors sometimes use local "going rate" as determinant for payment rather than skill.
- Turkers will invite requesters to chat rooms to teach them how to post properly
- Turkers will email requesters about complaints, advice, and questions about the HITs they post
- Workers post good HITs they find in the chat rooms
- Communication, transparency, and collaboration are still wanting, though corrective actions are driven more by Turkers than by requesters. The system is still very asymmetrical.
- Amazon provides no training for requesters or Turkers.
- Requesters often do not know how to post properly.
- Workers use the forums and scripts extensively to get the best HITs possible. They often use more than one computer screen to help them see everything at once.
- oDesk involves more evolved tasks that are less granular and require more problem solving abilities.
- Some Turkers try to cheat (copy and paste) on HITs.
- One commercial requester cited defining a worker group, corrective and purposeful communication, and breaking tasks into smaller units as keys to success.
- The plug-in tools that have been developed around MTurk are still quite cumbersome, but quite effective if you know how to use them.
- While no correlation between task duration and accuracy was given, there was universal agreement that the more granular the task, the better for the requester.
- Requesters often do not list the intent behind their HIT.
- The ranking system as it currently stands does not reward or encourage honest assessment of work
- Turkers tend to have other jobs but contribute a considerable amount of time to MTurk (30 hours a week)
- Some requesters had never rejected a HIT, even when the work was not acceptable.
- oDesk validates credentials before granting entry to the system, which improves transparency but creates an inherent barrier to entry; oDesk workers also struggle to gain ranking within the system.
- Turkers are able to shame requesters and lower their ratings through forums and the wider internet.
- Some requesters choose to go through all of the Turker responses to make sure that there are no fraudulent ones while other requesters trust the Turker responses and generally accept them as they are.
- oDesk profiles align workers and requesters by skill, so the two sides can have content- or subject-based conversations.
- One requester never rejected work because negotiating with Turkers was a time-consuming hassle.
- So much weight is given to reputation, yet it comprises little quantitative data, is shaped largely by opinion, and reveals little about the Turker's actual skills and capabilities.
Reading Others' Insights
Worker perspective: Being a Turker
- People turk for money. Many people do it because they need the money--either because they do not have another job or their other job does not pay enough.
- Turkers rarely make as much as $15k a year (roughly a year of 40-hour weeks at minimum wage), but some do!
- Turkers set daily money goals for themselves
- Turkers commiserate with each other about money and requester issues in the forums
- The largest area of Turker Nation is devoted to the “Requester Hall of Fame/Shame”
- Turkers use collective action to push requesters toward fairer treatment and pay
- Turkers say that they don’t want the government to get involved or regulate AMT
- Requesters sometimes seem to reject Turkers' work for no reason at all, which makes Turkers very angry. Requesters can also permanently block Turkers; if a requester blocks too many people, Amazon may suspend the requester's account
Worker perspective: Turkopticon
- Workers wanted faster payment, fair or increased pay, and more fairness in the evaluation and rejection of work, and were frustrated by Amazon's lack of response to their concerns.
- Generally, workers feel that the platform itself was developed for requesters, and they would like to undo that asymmetry.
- Workers are looking for better mechanisms for dispute resolution.
- Turkopticon is an activist approach to redress worker grievances rather than to address flaws inherent in the system.
- Dahn Tamir stated "You can not spend time exchanging email. The time you spent looking at the e-mail costs more than you paid them. This has to function on autopilot as an algorithmic system...and integrated with your business processes."
- Requesters need a signal-to-noise ratio (or compensation-to-communication ratio) that allows MTurk to remain an economically feasible option.
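This break-even logic can be made concrete with a back-of-the-envelope calculation. The function and every number below are our own illustrative assumptions, not figures reported by the panel:

```python
def communication_break_even(hit_reward, requester_hourly_cost,
                             minutes_per_message):
    """How many worker emails per HIT a requester can absorb before
    handling the email costs more than the HIT payment itself.
    All inputs are illustrative assumptions, not panel figures."""
    cost_per_message = requester_hourly_cost * minutes_per_message / 60.0
    return hit_reward / cost_per_message

# A $0.10 HIT, requester time worth $60/hour, and 2 minutes to read
# and answer each worker email:
msgs = communication_break_even(0.10, 60.0, 2.0)
# Each message costs $2.00, so break-even is 0.05 messages per HIT:
# even one email per 20 HITs wipes out the payment.
```

Under these assumed numbers, the communication channel has to stay almost entirely silent for the economics to work, which is exactly Tamir's "autopilot" point.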
Requester perspective: Crowdsourcing User Studies with Mechanical Turk
The article was written in 2008 and therefore is more of a historical snapshot of MTurk at that time. We would be very interested to conduct these same experiments again now that the MTurk platform and community have matured.
- Workers were efficient in well defined micro tasks.
- MTurk does not support assigning tasks to specific workers.
- A well defined task/experiment was well executed in MTurk
- Ecological validity cannot be guaranteed.
- Only basic qualification/disqualification mechanisms are available.
Requester perspective: The Need for Standardization in Crowdsourcing
A disorganized market allows workers and requesters to have a lot of flexibility. But is it scalable to real world needs?
The criticisms of MTurk are recurrent and still unanswered:
- The difficulty of pricing work
- The difficulty of predicting completion times and ensuring quality
- The inadequacy of the way that workers can search for tasks
- Workers need to learn the intricacies of the interface for each separate employer.
- Workers need to adapt to the different quality requirements of each employer.
- Every employer has to implement from scratch the “best practices” for each type of work.
- For example, there are multiple UIs for labeling images or for transcribing audio. Long-term employers learn from their mistakes and fix the design problems, while newcomers have to learn the lessons of bad design the hard way.
- Every employer needs to price its work unit without knowing the conditions of the market and this price cannot fluctuate without removing and reposting the tasks.
- In the “curated garden” approach, MicroTask, UTest, and others recruit and train workers and set standardized prices. It costs money to do this, but it is more scalable and attracts fewer scammers.
Problems potentially solved by standardization:
- reusability of task interface/design
- Turkers wouldn't have to worry about requester reputation
- easier pricing
In sum, basic, standardized work units traded in highly liquid, high-volume markets can serve as a catalyst for companies to adopt crowdsourcing. Standardization can strengthen network effects, provide the basis for better reputation systems, facilitate pricing, and ease the development of more complicated tasks composed of arbitrary combinations of small work units.
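As a thought experiment, the "arbitrary combination of small work units" idea can be sketched as a data model. Every name, task kind, and price here is our own illustration, not an existing platform API:

```python
from dataclasses import dataclass, field

@dataclass
class WorkUnit:
    """A standardized, priced unit of crowd work (illustrative only)."""
    kind: str          # e.g. "label_image", "transcribe_audio"
    payload: dict      # the task input
    price: float       # standardized market price for this kind

@dataclass
class CompositeTask:
    """A larger task built from an arbitrary combination of units."""
    units: list = field(default_factory=list)

    def add(self, unit: WorkUnit) -> "CompositeTask":
        self.units.append(unit)
        return self

    @property
    def total_price(self) -> float:
        # With standardized unit prices, composite pricing is trivial.
        return sum(u.price for u in self.units)

# Compose a transcription-plus-verification task from standard units:
task = (CompositeTask()
        .add(WorkUnit("transcribe_audio", {"url": "clip.mp3"}, 0.25))
        .add(WorkUnit("verify_transcript", {"ref": "clip.mp3"}, 0.05)))
```

The point of the sketch is that once unit kinds and prices are standardized, pricing and composing complex tasks becomes mechanical rather than a per-requester design problem.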
Both perspectives: A Plea to Amazon: Fix Mechanical Turk
The writer investigated the platform after its beta release; the post itself was written four years ago. We contacted the writer to ask whether, from his point of view, there have been major changes to the platform since he wrote the article.
- Workers struggle in their relationships with requesters in terms of payment and workflow.
- The user interface is not friendly to workers.
- Requesters face huge challenges with the user interface and workflow. This difficulty in starting and executing tasks has led to a poor distribution of requesters and task types. It is also hard to post tasks and to evaluate or rate workers.
- Requesters lack basic knowledge about workers, their profiles, and their completeness, which makes it difficult to set wage rates according to background and qualification; without a reputation system, only bad workers would remain in such a market.
- Generally, there seem to have been no big improvements on the requester side between 2008 and 2010 (when the article was written); the only major change was the introduction of a UI for submitting batch tasks.
- There is no communication of any kind between workers and requesters before the process starts.
- Workers are motivated to make money via MTurk in "downtime" or while doing other things.
- "With diligence and accuracy, you can make a good amount of money there at night while you're watching t.v."
- "I love amazon mechanical Turk on my free time In the last month I made over $100 and the task are very easy to complete you can watch tv and complete tasks at the same time."
- Workers need to know that compensation will occur in a timely fashion.
- "I am over 18, a US citizen, have filed taxes in the past, and faxed info to prove my identity but they still denied me from having an account. I have contacted customer service multiple times, but they have been very unhelpful. That prevented me from withdrawing my funds as well as continuing to work on mTurk." (Source: http://www.bbb.org/western-washington/business-reviews/computer-business-services/amazon-mechanical-turk-in-seattle-wa-22732711/complaints#breakdown)
- Requesters need access to advanced technical options for creating dynamic HITs.
- This is problematic because it seems reasonable at first, but then you’ll be two or three hours into designing your QuestionForm and find out that the incredibly restrictive QuestionForm XSD schema is incredibly restrictive. https://www.peoplepattern.com/mechanical-turk/
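The usual escape hatch from the QuestionForm schema is the ExternalQuestion payload, where the requester hosts the task form and MTurk embeds it in an iframe. A minimal sketch of building that XML (the namespace is the published 2006-07-14 schema; the form URL is a hypothetical example):

```python
from xml.sax.saxutils import escape

# The published ExternalQuestion schema namespace (2006-07-14).
EXTERNAL_QUESTION_XMLNS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)

def external_question(url: str, frame_height: int = 600) -> str:
    """Build the XML payload MTurk expects in CreateHIT's Question
    field when the task form is hosted outside of QuestionForm."""
    return (
        f'<ExternalQuestion xmlns="{EXTERNAL_QUESTION_XMLNS}">'
        f"<ExternalURL>{escape(url)}</ExternalURL>"
        f"<FrameHeight>{frame_height}</FrameHeight>"
        f"</ExternalQuestion>"
    )

# Hypothetical form URL, for illustration only:
payload = external_question("https://example.com/my-hit-form")
```

This sidesteps the restrictive XSD at the cost of hosting and maintaining the form yourself, which is part of why requesters find dynamic HITs so much work.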
Synthesize the Needs You Found
NEED: New workers need to learn how to be successful quickly.
- EVIDENCE: Many MTurk workers drop out shortly after joining.
- INTERPRETATION: The complexity and opacity of self-education prevent new MTurk workers from achieving the early successes that would motivate them to stay.
NEED: Workers need to understand what is expected of them in a given task.
- EVIDENCE: Turkers e-mail requesters questions about the HITs they post, and complain of unfair rejection of HITs.
- INTERPRETATION: HITs are sometimes unclear, especially from new requesters. Even when the worker does what is asked, an ambiguous HIT may produce work that does not meet the requester's original goal.
NEED: Workers need to feel that they are fairly compensated in a timely manner.
- EVIDENCE: Workers have been known to set prices based on collective action. Workers also have gone so far as to file complaints with the Better Business Bureau for challenges with payment/identity verification
- INTERPRETATION: The compensation challenges are complex, spanning a spectrum from getting an account approved and kept active to determining whether compensation for a given HIT is "fair", and if so by which standard (geography-based, skill-based, time-based, etc.).
NEED: Workers need to contribute to the ongoing development of a relatively static tool.
- EVIDENCE: Workers are consistently developing plug-ins and work arounds to the current system.
- INTERPRETATION: In the "Plea to Amazon" article, as well as many other articles and postings, workers not only did not settle for the existing UI but lobbied for system change. When those requests went unanswered, they found ways to evolve the tools and community themselves to create deeper usability.
NEED: Workers need to feel confident that their ranking accurately reflects their efforts and skills
- EVIDENCE: Workers feel that they are penalized with unfair HIT rejections either subjectively or due to a poorly designed HIT.
- INTERPRETATION: While some requesters seemingly reject HITs unfairly, others do NOT reject subpar HITs because they do not find it a valuable use of their time. Reputation rankings are therefore inaccurate in both directions.
NEED: Requesters need to maintain a positive benefit-to-cost ratio.
- EVIDENCE: Dahn Tamir stated "You can not spend time exchanging email. The time you spent looking at the e-mail costs more than you paid them. This has to function on autopilot as an algorithmic system...and integrated with your business processes."
- INTERPRETATION: While it is understandable that workers want open communication with requesters, at a certain level of task granularity communication becomes too costly.
NEED: Requesters need to understand the desired deliverable of their own tasks.
- EVIDENCE: Workers' HITs have been rejected because, though they fulfilled the requested deliverable, the requester's HIT was ambiguous.
- INTERPRETATION: There is a steep learning curve and confusing documentation for requesters. There are multiple questions even on www.stackoverflow.com about how to complete various technical aspects of a request.
NEED: Requesters need access to a deeper and more accurate view of workers skills and abilities.
- EVIDENCE: Requesters have stated that when determining compensation they often err on the low side because they do not know the skill level of the worker taking the HIT.
- INTERPRETATION: The reputation system undermines the worker not only by underestimating their skill and ability but also by preventing the requester from making informed compensation decisions.
NEED: Requesters need to be able to fully express the HIT.
- EVIDENCE: Technical limitations of the requester UI make producing an effective HIT challenging.
- INTERPRETATION: Even the best-intentioned requesters can be hampered by the system's UI, which produces a less-than-ideal HIT; this in turn compromises the result and both the worker's and the requester's ability to have a successful exchange.
We think it worth noting that the above needs already build in certain assumptions and viewpoints. In discussing this, we invite dialogue on the following bird's-eye view of needfinding.
- Role of Big Data
Restructuring/reformatting the ranking system and building a tool balanced to accommodate honest assessment of Turkers and requesters alike would be an enormous Big Data undertaking, yet with a simple UI it would look like a duck: placid on top, furiously paddling below. This could conceivably be the communication conduit needed to better align Turkers with roles (improving need searching), establish a quantitative measure of fairness and balance, and provide the platform for more competitive pricing, as HITs could be more 'sophisticated' (requiring subject-matter expertise or proficiency beyond basic human intelligence). What would happen if we looked at each challenge and filtered it through "Is there an analytic that could mitigate this challenge?"
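As one small example of such an analytic, a Bayesian (shrunken) average would discount reputation scores backed by little data, directly addressing the "so little quantitative data" concern raised in the panel. The prior values here are illustrative assumptions, not figures from any platform:

```python
def bayesian_reputation(ratings, prior_mean=3.0, prior_weight=10):
    """Shrink a raw average rating toward a global prior so that a
    perfect score backed by two reviews does not outrank a strong
    score backed by two hundred. All numbers are illustrative."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

# Two perfect reviews get pulled hard toward the prior...
new = bayesian_reputation([5, 5])
# ...while a long, strong track record barely moves.
established = bayesian_reputation([5, 4, 5, 5, 4] * 40)
```

The design choice is simply to make sample size part of the score, so that opinion-driven outliers with thin evidence cannot dominate the ranking.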
- Governance vs. Free Market
All of the needs listed above will shift when we begin to analyze the most fundamental assumptions of the system we are looking to build. Are we looking to build a platform that is managed and "governed" through an ever more complex rules engine, or are we looking to create a system that provides enough analytics and dynamic learning to create a free market that self-governs by market forces? For example, if we reveal a need relating to anonymity, in the former case it is highly relevant and fundamental to rules development; in the latter it is irrelevant, as anonymous workers/requesters will sort out whether there is a market for each of them.
How do we trace these needs back to the most vital aspects in order to have the right "DNA" in the seed of the project? Can we grow a "tree" of a system that truly allows for scalability? Can we build an innovative solution that serves a range of technologies, task complexities, and industries? What assumptions and premises would be required?