Winter Milestone 1: Mark Whiting

From crowdresearch

Experience the life of a Worker on Mechanical Turk

I've worked on Mechanical Turk a few times, just to test it out. In total I've now earned about $5 with it, spread over the last year or two.

In each case, I found the majority of tasks incredibly uninteresting, and as a result I often had a hard time focusing on them and took longer to complete them, because I would much rather have been doing something more interesting. For example, while transcribing text from a scan of a table into a web form, I found myself drifting from the task about once every 2 minutes, so a task designed to take about 10 minutes ended up taking closer to 35. I only earned a few cents for the task, so it was by no means worth doing if it was going to take me that long.

Other task types seem to be a lot more interesting but can suffer from issues with implementation. For example, some surveys I tried were relatively interesting and earned reasonable amounts of money for the time invested, but would often have small user experience issues such as a button not working as expected.

In summary, I find working on Mechanical Turk quite unpleasant, due both to tasks I have little interest in and to poor design or implementation on the part of the requester. There will always be tasks that are not interesting, and at some price there will always be people willing to do them, but I think it would be very good for us to make it hard for requesters to send out tasks that have implementation or design problems of any sort. More rigorous checking stages would help with this, and I hope they are enough to remove the issue entirely.

Experience the life of a Requester on Mechanical Turk

I've run studies on Mechanical Turk before, so for this task I just wanted to do a very simple survey and ask a few direct questions about how workers use the platform. I asked the following questions:

  1. How likely are you to recommend using Amazon Mechanical Turk to your friends? (on a 4 point scale: very unlikely, unlikely, likely, very likely)
  2. What do you enjoy about using Amazon Mechanical Turk? (list and explain as many things as you can think of)
  3. What do you NOT enjoy about using Amazon Mechanical Turk? (list and explain as many things as you can think of)
  4. Which HITs do you like most, and how do you find them? (please list and explain if there is more than one answer)
  5. What would you tell requesters if you could?

I also limited the study to workers who had more than 5,000 approved HITs.
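This kind of restriction can be expressed as one of MTurk's built-in qualification requirements. As a sketch (the reward, title, and region below are placeholders, not my actual study's settings), the requirement is a small dictionary passed to the HIT-creation call:

```python
# Sketch: build the built-in "Number of HITs Approved" qualification
# requirement used to restrict a Mechanical Turk HIT to experienced workers.
# The QualificationTypeId below is MTurk's system NumberHITsApproved type.

def approved_hits_requirement(min_hits: int = 5000) -> dict:
    return {
        "QualificationTypeId": "00000000000000000040",  # NumberHITsApproved
        "Comparator": "GreaterThan",
        "IntegerValues": [min_hits],
        "ActionsGuarded": "Accept",  # workers below the bar cannot accept the HIT
    }

# With boto3 (assumed available), the requirement would be passed to
# create_hit, along the lines of:
#   import boto3
#   mturk = boto3.client("mturk", region_name="us-east-1")
#   mturk.create_hit(..., QualificationRequirements=[approved_hits_requirement()])
print(approved_hits_requirement())
```

The MTurk web interface exposes the same filter as the "HIT Approval Rate / Number of HITs Approved" qualification, so no code is strictly needed; the API form is just more repeatable.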

At the time of this writing, n=29, with an average work duration of 415 seconds (sd 269). Only 5 respondents would not recommend the service, and 1 did not respond. The average recommendation rating was 3.48/4 (sd 0.88).
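Summary statistics like these are quick to reproduce from the attached results file. As a minimal sketch (the ratings below are made-up placeholders, not the actual survey responses), the mean and sample standard deviation of the 4-point scale come straight from the standard library:

```python
# Sketch: summarize Likert-style responses (mean and sample sd of a 1-4
# recommendation scale). The ratings list is a hypothetical placeholder,
# not the data from Mark Whiting Milestone 1 Results.csv.
from statistics import mean, stdev

def summarize(scores):
    """Return (n, mean, sample standard deviation) for a list of ratings."""
    return len(scores), round(mean(scores), 2), round(stdev(scores), 2)

ratings = [4, 4, 3, 2, 4, 3, 4, 1]  # hypothetical 4-point responses
n, avg, sd = summarize(ratings)
print(f"n={n}, mean={avg}, sd={sd}")
```

In practice the same two lines of `statistics` calls would be applied to the rating column parsed out of the CSV.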

In a high-level review of the textual responses, participants were very happy to have the flexibility to work in their spare time, and several mentioned they would happily do it full time if they could earn a little more. Some respondents said they enjoyed the opportunity to express their opinions and participate in research. On the negative side, respondents mentioned low wages and the lack of mobile support. The lack of good tools for finding HITs and poor HIT design also came up frequently as problems.

In both the positive and negative responses issues relating to internationalization were frequently implied.

Respondents' preferred HIT types varied widely. At this time I have not conducted any correlation analysis between preferred HIT types and responses to the other questions. Many people like surveys as a way to make money reasonably quickly, and many described interesting mechanisms that help them find good HITs. However, there was no significant agreement between participants on those mechanisms.

Lastly, in recommending how requesters could improve their HITs, many respondents mentioned faster approval, a better hourly wage, and better-quality instructions.

In conclusion, much of what this very quick study shows is similar to what we already know about the community, and reflects many of the issues found by other studies, including those we read for this Milestone. I was really excited to see that so many respondents were positive about their experience, despite the inherent issues with MTurk, and I was pleasantly surprised by the number of people who claimed to use it as their primary source of income.

In reviewing this experience, my biggest takeaway is that, for simple surveys, MTurk could plug into existing statistical packages so that data analysis could be conducted and insight gained much faster. Additionally, the requester interface has many little problems that could easily be fixed, just as the worker interface does. All in all, I have had only good experiences with workers on MTurk, and I think making a tool like this that does a better job will be a great service to humanity.

File:Mark Whiting Milestone 1 Results.csv

Explore alternative crowd-labor markets

I didn't do this at this time because Mechanical Turk worked fine for me.



MobileWorks

MobileWorks is a short paper about a quick study that reiterates important arguments about the future of work.

The two points I liked most about this project were: 1) the expected-utility-based pricing model, which prices work based on the expected quality of work from a given worker, estimated from their prior tasks, and 2) the seemingly high-quality use of very low fidelity communication systems – dumb phones. I think the first point is a valuable way to manage pricing in many different kinds of work and could usefully be adopted in other domains. It seems better than applying skills checks that are a bit arbitrary and sometimes take pointless amounts of time. The second point reminds us of the importance of building systems with a wide range of interface options.
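The paper's exact pricing formula isn't reproduced here; as one hedged illustration of the general idea, a worker's expected quality could be estimated from their prior approved and rejected tasks (the mean of a Beta posterior over their accuracy) and used to scale the offered reward:

```python
# Illustrative sketch only: a simple expected-utility pricing rule, NOT the
# actual formula from the MobileWorks paper. Expected quality is the mean of
# a Beta(1 + approved, 1 + rejected) posterior over the worker's accuracy,
# starting from a uniform Beta(1, 1) prior.

def expected_quality(approved: int, rejected: int) -> float:
    return (approved + 1) / (approved + rejected + 2)

def offered_reward(base_reward_cents: int, approved: int, rejected: int) -> int:
    """Scale a base reward by the worker's expected quality."""
    return round(base_reward_cents * expected_quality(approved, rejected))

# A worker with 95 approved and 5 rejected prior tasks gets a reward scaled
# by an expected quality of about 0.94.
print(offered_reward(100, 95, 5))
```

The appeal of a rule like this is that it degrades gracefully: a brand-new worker with no history gets priced at the prior mean (0.5) rather than being excluded by an arbitrary skills check.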

To me, this study is just the beginning, and many more projects of this type should be conducted to see how well we can use the features shown here to meaningfully improve the lives of workers and those who cannot yet work. That said, I think there are also some directly obvious opportunities for improvement. Providing a single low fidelity interface is great, but, arguably, enabling users to step up and perform faster work on other platforms is also critical. By giving users this option, the researchers could learn more about what kinds of interfaces and contexts workers find most compelling, and gain valuable insight about when and where people work best. Additionally, the authors could have performed a more statistically rigorous analysis of the data they collected, specifically discussing how well the pricing mechanism worked, as well as things like the correlation between high earners, task speed, and pricing expectations.

All in all, I enjoyed the paper a lot and I think it offers a great start, but I think it leaves a lot of stones unturned.


Daemo

I think Daemo is really nicely designed; that's why we are all here!

My favorite aspect of the design of Daemo and Boomerang is that the ratings have been designed to be incentive compatible. I think everything should be incentive compatible, and I am really happy to see that approach has been used here. To extend this, I think there may be additional aspects of algorithmic game theory that could be effectively used to strengthen the ranking and allocation algorithms in Daemo and over time, as I think about those, I will try to contribute what I conclude.

I don't see many significant weaknesses in this approach, although I do think making strategic decisions about how to host and manage Daemo, including the self-governance approach, is quite important. I see Daemo as a solution to many problems in one shot, and I wonder whether dealing with its industrial implications is perhaps a separate project for a company to focus on at a later time. I'm not firm on this stance; it's just something that comes to mind.

Flash Teams

I found this paper really interesting and very thorough!

The proposed system seems to be a great stepping stone toward a more automated work-management scheme, allowing companies with complex tasks to send those tasks directly to a large, scalable workforce and expect reasonably good results. To me, this addresses one of the common challenges with crowdsourcing: although many papers have demonstrated complex work, it has generally come at the expense of specifically developed tools to manage that work. Flash Teams and the tools used to run them offer a great alternative to this approach.

That said, I think there are a few important challenges that remain: 1) professional management offers significant value in some situations, and that should be better represented here (either computationally or via human intervention); 2) complex task design, though greatly facilitated by the provided tools, remains tricky for tasks where managerial optimization is possible. On the latter point, consider designing a car with Flash Teams. GM and Tesla have both developed new car platforms, but their operations and management approaches are very different. GM has a reliable mechanism that always achieves a similar level of quality and takes a very predictable amount of time. Tesla, on the other hand, has an untested, much more frantic operational approach, but it has achieved incredibly strong results so far. So, if you were thinking about which company to mimic with your Flash Team car, how would you decide?

A car is an extreme case and would require thousands of such decisions, so I think it proves the point well: although Flash Teams is a great step in the right direction, we need to incorporate best practices and good learning to achieve optimal automated management of tasks and do really complex work at scale.

My MTurk (half) Workday

Jeff Bigham writes briefly about his experience in trying to do a half day's work on Amazon Mechanical Turk. His conclusions highlight a few seemingly important points:

  • Make sure you advertise your hourly wage in the HIT title.
  • Give incremental feedback about payment whenever possible.
  • Try very hard to avoid usability problems that requesters can control.

And he goes on to say, "we need to make certain going forward with research that we’re not fixing issues with the site that Amazon or requesters could fix overnight if they were incentivized to do so."

To me this highlights many of both the strengths and weaknesses of Amazon Mechanical Turk. On the negative side, it has many problems due to poor design: requesters need to work hard to make sure their HITs are good and well received, and workers are incentivized to cut corners where possible. On the positive side, so many workers have spent so much time using the tool that a strong community has emerged, and it might actually not take much for Amazon to fix many of the ongoing issues.

Milestone Contributors