Winter Milestone 1 nalinc

From crowdresearch

This page is intended to serve for Winter Milestone 1 submission by user @nalinc[1].

Do not edit this directly


Name: nalinc, Type: individual

Experience the life of a Worker on Mechanical Turk

I tried my hand at the Amazon Mechanical Turk developer environment, workersandbox[2], because of its similarity to AMT and its restriction-free access across nations. It is a simulated environment that allows users to test their applications and perform HITs [Human Intelligence Tasks]. Being a testing platform, it does not allow transfer of funds outside the sandbox. I completed a total of 21 tasks and managed to earn 2.10 USD* by 2:00 PM IST on 14th January 2015.

Below is a screenshot of the dashboard from my worker account at MTurk.

Nalinc workerAccount@MTurkSandbox.png

*Current total is: 27 tasks.

I kept doing a few more tasks posted by my peers at Milestone 1 Sandbox HITs[3].

Following are a few of the tasks I attempted during the process:

  • Find the Hotel Name Address for Given URL
  • Find the correct(valid) url
  • Find the website address for restaurant
  • Tagging of an Image

What did you like?

  • I really liked the way they allow users to see HITs without logging in.
  • Lots of tasks were available

What did you dislike?

  • AMT is a very restricted platform, especially for non-Americans.
  • The signup process is tedious and takes 48 hours for activation.
  • No mechanism for worker feedback.
  • Some tasks were unclear and a few tasks were poorly designed.
  • A few tasks were available only under certain qualifications. (Though this can be counted as a plus point for requesters.)

Experience the life of a Requester on Mechanical Turk

Again, like the workersandbox, I tried Amazon Mechanical Turk's developer environment, requestersandbox[], to put myself in the shoes of a requester. I also did some minor investigation into microworkers[4] and performed task authoring in Daemo. On the MTurk sandbox, I published the following tasks and aimed to have my HITs done by at least 15 different people.

1) Provide descriptions of popular opensource projects[5]

  • Qualification required: None
  • Number of Assignments per HIT: 15
  • Reward per Assignment: $0.10
  • Input csv file: File:Softwares.csv

2) Judge the sentiment expressed toward: random user tweets[6]

  • Qualification required: Worker should be a Master
  • Number of Assignments per HIT: 5
  • Reward per Assignment: $0.15
  • Input csv file: File:Sentiment.csv

What did you like?

  • The task-authoring interface was nice and provided various categories of tasks [viz. survey, data collection, image tagging, etc.].
  • The overall interface was okay, and the WYSIWYG editor for building tasks was commendable.
  • Auto Approval Delay and timeout are easy to manage for individual tasks.
  • Task availability can be restricted to a limited set of workers (based on qualification/location/accuracy %).

What did you dislike?

  • Tasks created from templates via the task-authoring interface were not modifiable; I created a task but couldn't control its qualification.
  • As a requester, there was no way for me to ensure the quality of submitted work.
  • I was not able to block submission of incomplete forms. [See the attached CSV; a few fields are empty.]
  • For fields requiring textual input, I was not able to limit the content to a minimum/maximum number of words.
  • No concrete option to control quality of work other than filters (setting a minimum accuracy % or qualification).
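One common workaround for this quality-control gap is to collect redundant assignments (as in task 1 above, with 15 assignments per HIT) and aggregate them by majority vote after downloading the batch results. Here is a minimal sketch; the column names (`HITId`, `Answer.sentiment`) are illustrative stand-ins for whatever headers the downloaded CSV actually uses:

```python
import csv
from collections import Counter
from io import StringIO

def majority_vote(rows, key_col, answer_col):
    """Group assignment rows by HIT and keep the most common answer.

    rows: iterable of dicts (e.g. csv.DictReader over the batch-results file).
    key_col / answer_col: column names -- assumed here, check your own CSV headers.
    """
    by_hit = {}
    for row in rows:
        by_hit.setdefault(row[key_col], []).append(row[answer_col])
    return {hit: Counter(answers).most_common(1)[0][0]
            for hit, answers in by_hit.items()}

# Tiny demo with an in-memory CSV standing in for the downloaded results.
sample = """HITId,Answer.sentiment
h1,positive
h1,positive
h1,negative
h2,neutral
h2,neutral
"""
results = majority_vote(csv.DictReader(StringIO(sample)),
                        "HITId", "Answer.sentiment")
print(results)  # {'h1': 'positive', 'h2': 'neutral'}
```

This only filters disagreement after the fact; it does not replace the missing input-validation features noted above.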

CSV file generated when you download the HIT results: File:Batch 119356 batch results.csv

Explore alternative crowd-labor markets

Because of location restrictions, my account registration was rejected on most crowd platforms, including MTurk. However, I did explore Daemo task authoring locally. A comparison of Daemo with other platforms is described later.

Research Engineering (Test Flight)

Being among the folks who cannot wait to get their hands dirty with code, I set up Daemo locally well before the milestones were announced. Though I've already posted this on #engineering-deliver, here's a screenshot of Daemo running locally on my Ubuntu 14.04 LTS (Trusty Tahr).




MobileWorks

MobileWorks is a novel effort by Prayag et al. which brings the existing crowdsourcing market to workers living at the bottom of the economic pyramid by enabling them to complete small OCR tasks on mobile devices.

What do you like about the system / what are its strengths?

  • Broadens participation in microtask markets by marginalized workers.
  • Provides a low-cost, accurate, and efficient data-digitization solution to organizations using simple OCR tasks.
  • Distributes documents among multiple workers by chopping single scanned sheets into small pieces of one or two words. These smaller pieces are then put together to create a digitized copy of the document.
  • Quality is maintained using multiple entry: each task is distributed to two workers, and re-issued until two of the answers match.
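The multiple-entry scheme in the last bullet can be sketched as a simple agreement loop. This is my reading of the mechanism, not MobileWorks' actual code; the cap of five workers is an assumption:

```python
from collections import Counter

def multiple_entry(worker_answers, max_workers=5):
    """Sketch of MobileWorks-style 'multiple entry' quality control:
    issue the task to workers one at a time and accept an answer as
    soon as two workers agree; give up after max_workers attempts.

    worker_answers: iterator yielding one worker's answer per call.
    """
    seen = Counter()
    for _ in range(max_workers):
        answer = next(worker_answers)
        seen[answer] += 1
        if seen[answer] == 2:   # two matching entries -> accept
            return answer
    return None                 # no agreement reached

# Demo: the third worker confirms the first worker's transcription.
answers = iter(["cat", "cot", "cat"])
print(multiple_entry(answers))  # cat
```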

What do you think can be improved about the system?

  • They should open their market to more kinds of tasks, such as audio transcription, same-language subtitling, and local-language translation.
  • Absence of a feedback mechanism.
  • No mechanism to ensure/establish trust among workers/requesters.
  • Currently they operate only on the mobile platform and focus on a very small user group.


Daemo

What do you like about the system / what are its strengths?

  • Daemo is a great attempt at addressing the major issues with most crowdsourcing platforms.
  • It tries to establish trust between workers and requesters.
  • Fixes the breakdown in trust: on typical crowdsourcing platforms, reputation scores are mostly inflated due to social pressure, to avoid retaliation, or to help someone continue working on the platform. Daemo mitigates this concern by introducing Boomerang, a reputation system that differs from traditional rate-and-leave systems by 'boomeranging' the accuracy of a rating decision back to directly impact the rater. For requesters, it drives the likelihood of working with the same worker again by giving early/normal/late access to future tasks. For workers, it decides the ranking of tasks in their task feed, where tasks from highly rated requesters appear at the top and those from poorly rated requesters appear at the bottom.
  • Fixes the breakdown in task authorship: this feature tries to match the mental model of the worker to that of the requester. What most current crowdsourcing platforms ignore is that requesters are usually domain experts, not designers who understand the psychological aspects of workers and user behavior. This leads them to believe their task is high quality when it may in fact be quite the opposite and miss major edge cases. The issue is tackled by prototype tasks, which require all tasks to go through a feedback iteration from workers before launching to the marketplace. A small set of workers tries a subset of the overall tasks and provides useful feedback, pointing out any possible ambiguity or missing link. This ultimately builds a common mental model of expectations and improves the overall quality of results.
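The worker-side effect of Boomerang described above amounts to sorting each worker's task feed by requester rating. A toy sketch, assuming made-up task records and a hypothetical `requester_rating` field (not Daemo's actual schema):

```python
def rank_task_feed(tasks):
    """Order a worker's feed so that tasks from highly rated
    requesters come first, as Boomerang proposes."""
    return sorted(tasks, key=lambda t: t["requester_rating"], reverse=True)

feed = rank_task_feed([
    {"title": "Tag images", "requester_rating": 2.1},
    {"title": "Judge tweet sentiment", "requester_rating": 4.8},
    {"title": "Describe projects", "requester_rating": 3.5},
])
print([t["title"] for t in feed])
# ['Judge tweet sentiment', 'Describe projects', 'Tag images']
```

The incentive is indirect: requesters who rate carelessly end up with their tasks buried lower in every feed.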

What do you think can be improved about the system?

  • User experience #1: I believe Daemo should focus a little more on the user experience. For instance, I really liked the way AMT shows available HITs publicly. This boosts users' confidence to join the platform, making them think, "Whoa, can I really earn by doing such tasks? Let's sign up!" I know Daemo is still in beta, but having such things early in a product's lifecycle would avoid unnecessary hassle when the platform actually grows big.
  • User experience #2: Why do we have a single login for both workers and requesters? In practice, I am either interested in doing some tasks or in getting my work done by someone else. If we still need to keep a single login, we could let the user choose (while logging in) whether to see the 'worker' dashboard or the 'requester' one. Maybe a toggle button that switches roles? In a nutshell, we should build the interface to target a specific group (worker or requester, but not both). Separate signups are completely optional, and the objective can be achieved either way (single/multiple signups). Though MicroWorkers operates likewise, Upwork and AMT stick with multiple logins. All in all, there should be separation of concerns and semantics to ease usability. A general rule of thumb: don't bombard users with tons of functionality so that they get confused within their own role; target specific groups and features to avoid confusion.
  • User experience #3: It would be great to allow 'tagging' of tasks while authoring them, so workers can later filter them per their needs/taste. Microworkers, for instance, categorizes tasks as [most paying, latest, best paying, time to rate (TTR)], and AMT sorts tasks by [creation date, expiry date, available HITs (most/least), title, reward amount, time allotted]. We could also consider categorizing tasks by type: [data collection, categorization, image moderation, sentiment, survey, image tagging, transcription from A/V or image, writing, or other].
  • The prototyping process has additional operational costs, both in money and in time while the iteration happens. Also, as mentioned in the white paper, workers usually look for large batches of tasks, and prototype tasks can be small in nature. To fix this, it has been proposed to give early access to tasks to workers who perform the prototype tasks. I propose an alternative badge system, like the ones on platforms such as Stack Overflow: instead of compensating reviewers with money, workers could be given points/badges that increase their status within the community. It is evident from various platforms [Quora/Stack Overflow] that users are motivated not just by monetary benefits but also by their reputation among peers.
  • There is no mention of the criteria for deciding who among the available workers will perform the 'prototype tasks'. Are they also made available differently to different workers, e.g. the way tasks are launched to the marketplace based on worker ratings?
  • The paper mentions that Daemo makes it mandatory for all tasks to go through a feedback iteration before launching to the marketplace. However, this approach is not practical when the HIT is composed of just a single task (like filling out a survey form).
  • The focus is laid more on fixing the upstream problem by iterating on task design before it reaches the workers, which somewhat ignores the important aspect of recovering from incorrect work. Incorrect-task recovery is important for ensuring high-quality results. What if the requester is not available to rate every individual? The white paper proposes rating a subset (6-10) of workers, but that doesn't seem promising.
  • Open-crowd governance: Being a regular user of Stack Overflow (apologies for the third reference), I am really impressed by the way they manage open governance. They organize regular elections where normal users can nominate other normal users as community moderators. These community moderators are just like normal users but have the power to report/flag a question which 1) seems noobish/incomplete, or 2) is inappropriate for the platform. We could have a similar thing in Daemo and involve workers, requesters, and other researchers in reporting incomplete/unclear/ambiguous tasks. Let's say 'n' such flags put the HIT back into the feedback loop?
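The flag-threshold rule suggested in the last bullet could be sketched as follows; the field names (`flags`, `status`) and the threshold value are entirely hypothetical:

```python
def process_flags(hit, n_threshold=3):
    """Once a HIT collects n flags from moderators, send it back
    into the prototype feedback loop instead of leaving it live."""
    if hit["flags"] >= n_threshold:
        hit["status"] = "feedback_loop"
    return hit

hit = {"title": "Ambiguous survey", "flags": 3, "status": "live"}
print(process_flags(hit)["status"])  # feedback_loop
```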

Flash Teams

What do you like about the system / what are its strengths?

  • A promising framework for dynamically assembling and managing teams of experts.
  • This work seems to provide a structured, automated, and easy work-management schema, allowing organizations to send complex tasks directly to a large group of workers.
  • Unlike most crowdsourcing platforms, which focus on microtasks, it pays attention to domain-specific issues and creativity.
  • Rather than aiming for redundant, independent judgments, flash teams envision a future of crowdsourcing with dynamic collaborations of diverse and interdependent participants.

What do you think can be improved about the system?

  • While the overall concept seems good in theory, it needs to be thoroughly tested in real-life scenarios.
  • Their approach seems to work in a controlled/test environment (with workers from oDesk), but more work needs to be done to check scalability.

Milestone Contributors

Author: Nalin Chhibber @nalinc