Qualitative Analysis RQDA
- 1 Overview
- 2 Method
- 2.1 TurkOpticon Reviews
- 2.2 Data Sampling Procedure
- 2.3 RQDA
- 2.4 How might Requesters manipulate tasks as a response?
- 2.5 Turker thoughts about pay on Survey Tasks
- 3 Discussion
- 4 Files
This page presents exploratory findings from worker reviews collected from TurkOpticon. The reviews point to a potential payment expectation for survey tasks and show how workers and requesters relate within and outside of Turk.
The method chosen for this analysis was emergent, because the data source changed constantly as workers entered reviews into TurkOpticon and because the researcher wanted the research to produce a grounded theoretical model. As a result, the findings are heavily framed by the information structure of the web site, since workers enter knowledge in the form defined by the TurkOpticon designers.
TurkOpticon is a requester monitoring tool created and maintained by Lilly Irani at the University of California - San Diego and others. It originates from the Turker Bill of Rights, created in 2008. The web site hosts reviews of requesters and their tasks, written by workers, that express ratings and user experiences.
TurkOpticon Rating Scheme
Turkers can rate requesters on 5-point scales along 4 dimensions: "comm", "pay", "fair", and "fast". TurkOpticon describes the measures as:
- communicativity ("comm"): How responsive has this requester been to communications or concerns you have raised?
- generosity ("pay"): How well has this requester paid for the amount of time their HITs take?
- fairness ("fair"): How fair has this requester been in approving or rejecting your work?
- promptness ("fast"): How promptly has this requester approved your work and paid?
Data Sampling Procedure
Data were collected through convenience-random sampling. The researcher collected 2 full pages of reviews that were present on the site at the time of the visit. TurkOpticon makes data collection difficult because reviews stream onto the web site in real time and the pages update with changing data. To handle this, the researcher kept the web site open without refreshing it in order to maintain a consistent data set. The reviews reach the researcher in effectively random order, since TurkOpticon workers from all over the world enter their experiences at times of their own choosing.
RQDA is the R Qualitative Data Analysis package. The package enables researchers to enter data into a database and codify the data at 4 levels: codes, code categories, cases, and annotations. The most basic level is the code, which is attached to a data source. To give structure to the codes, code categories can be composed from several lower-level codes. Only codes and code categories were used in this study.
Data were collected by copying and pasting review text into RQDA; each review was then coded with its respective ratings.
For example, the TurkOpticon review sample would have the text beginning from "Category Validation..." to "...opinion, unacceptable." The sample would then be named, saved as an individual file, and codified with "Fair 1", "Fast 3", "Pay 1", "Comm 4", "Rejected", and "TaskValidation". Any other potential codings were included to create as comprehensive a list as possible for theory development. Other possible codings include "Error" and "ActivityWorker". Each code has a specific definition; "ActivityWorker", for instance, is any activity expressed or implied as being undertaken by a worker. The result can become a list such as the Table Coding Key and be used to look for cross-patterns that indicate potential areas of coding interactions.
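One way to look for such cross-patterns is to count how often two codes appear on the same review. The sketch below is only an illustration in base R, not the procedure used in the study: the `codings` data frame is hypothetical, review 1 reuses the codes from the example above, and reviews 2 and 3 are invented.

```r
# Hypothetical long-format coding table: one row per (review, code) pair.
# Review 1 reuses the codes from the example; reviews 2 and 3 are invented.
codings <- data.frame(
  review = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3),
  code   = c("Fair 1", "Fast 3", "Pay 1", "Comm 4", "Rejected", "TaskValidation",
             "Pay 1", "Rejected", "Error",
             "Error", "ActivityWorker"),
  stringsAsFactors = FALSE
)

# Review-by-code incidence matrix (1 if the review carries the code, else 0).
incidence <- unclass(table(codings$review, codings$code))

# Code-by-code co-occurrence counts: entry [a, b] is the number of reviews
# carrying both code a and code b; the diagonal holds each code's frequency.
cooccurrence <- crossprod(incidence)

cooccurrence["Pay 1", "Rejected"]  # 2: reviews 1 and 2 carry both codes
```

Large off-diagonal counts flag code pairs that may be worth grouping into a code category.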
RQDA produces a page of results attached to each code -- below is such an example.
Review [index]   Clipping
27[85:302]       The HIT I was doing expired before I could finish, so I contacted the requester to assist me. They created another HIT for me in order to compensate me for my completed survey. I am happy with my experience with them.
66[175:200]      Would love 1000 of these.
67[328:401]      But how often does one get a nickle for writing a list of "filthy words"?
Pages such as these enable a rapid overview of clippings from full reviews to understand how certain codes connect with one another. A more comprehensive overview of conceptual relationships comes from code categories, which string together lower-level codings into higher-level concepts.
For example, one such category might be "Actions created by Errors", which might contain "ActivityWorker","ActivityRequester", and "Error". The category enables a construction of information that begins to connect the puzzle hidden within the data.
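As a rough sketch of how a code category groups lower-level codes, the category named above can be held as a named list in base R and matched against the codes on a review. The `review_codes` vector is hypothetical, and this is not how RQDA stores categories internally.

```r
# The code category from the example above, stored as a named list.
code_categories <- list(
  "Actions created by Errors" = c("ActivityWorker", "ActivityRequester", "Error")
)

# Hypothetical set of codes attached to a single review.
review_codes <- c("Pay 1", "Error", "ActivityWorker")

# For each category, keep the review's codes that belong to it.
matched <- lapply(code_categories,
                  function(members) intersect(review_codes, members))
matched
```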
For this project, the researcher was interested in the question, "What do Turkers experience outside of Turk?" Codings such as "ActivitiesWorkers", "ActivitiesRequesters", "Errors" and "Warnings" were isolated to identify events that workers explained in their reviews posted at TurkOpticon. The result is a hairball diagram that attempts to capture the experiences workers may have and to establish a prototypical chain of causality implied within the collection. However, the truth of this model is the same as for all models, as once explained by Jay Forrester: "All models are false, some are useful."
Hairball Diagram: What happens to Turkers outside of Turk?
The power of the hairball diagram is its ability to help develop realistic and potentially very detailed scenarios. It begins with the most generic and general relationship found in MTurk, the worker-requester relationship. Both must use a computer of some sort and connect in MTurk to earn income and complete work. This assumption serves as the backbone of the diagram at its center.
As we can see in the diagram, nodes were added to identify the informational spaces that Turkers use to complete work or contact the requester.
The lower half of the hairball diagram introduces the two major spaces that Turkers reported in the reviews: a 3rd Party Site [Sources: 31,33] and E-mail. E-mail is considered generic since it is a ubiquitous form of communication. Turkers connecting to a 3rd Party Site reported at least 5 ways they might be diverted from reaching the web site, such as:
- hitting a firewall 
- getting a nondescript error message 
- losing the Internet connection 
- hitting a 404 screen 
- and tapping a broken link [62,33]
At this point, workers who hit such a diversion have reported or implied either ending the task or writing an e-mail [1,15]. Here, 10 scenarios have been identified for workers going to a 3rd Party Site. For example, a worker may hit a firewall and then either end the task or send an e-mail to the requester to create a workaround. The Scenario Development Example below gives a couple more; scenarios do not have to be as linear as the one just described.
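One reading of the count of 10 is that each of the 5 diversions can be followed by either of the 2 worker responses. Under that assumption (the source does not spell out how the scenarios were counted), the combinations can be enumerated in base R; the short labels are paraphrases of the list above.

```r
# The 5 diversions listed above (labels paraphrased).
diversions <- c("firewall", "nondescript error message",
                "lost Internet connection", "404 screen", "broken link")

# The 2 worker responses reported or implied in the reviews.
responses <- c("end the task", "write an e-mail to the requester")

# Every diversion paired with every response.
scenarios <- expand.grid(diversion = diversions, response = responses,
                         stringsAsFactors = FALSE)

nrow(scenarios)  # 10
```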
Scenario Development Example
---Begin Task Submission---                                      ---Evidence---
1. Worker submits work                                           GENERIC
2.1 Requester mass rejection parameter kicks in                  GENERIC
3.1 Requester team screens rejected tasks                        [Account 46]
2.2 Requester sends verification email to worker (UNKNOWN)       [Account 56]
2.3 Requester sends automated email to worker                    [Account 62]
2.3.1 includes a task ticket confirmation for payment            [Account 17]
4.1 Requester team submits results report to Worker              [Account 46]
5.1 Requester team posts to worker review page                   [Account 46]
---Begin Generic Email Response---
1. Worker writes email to requester                              GENERIC
2. Requester responds to email quickly                           GENERIC
---something happens---
2.2 Requester does not receive email                             GENERIC
2.3 Requester marks worker's email as "spam"                     [Account 17]
NOTE: 17 is vengeful worker. "make sure I was paid my 20 cents". Might have acted in a way to have pushed requester to mark emails as "spam".
How might Requesters manipulate tasks as a response?
These strategies are controls that a requester can adjust, toward a goal unknown to the worker, across similar tasks posted sequentially. Workers monitor requesters for these changes.
---CONTROL---                                        ---ACCOUNT(CASE)---
1. Increase/Decrease Pay                             17
2. Introduce Test Screeners before task              30
2.1 Announced/Unannounced
2.2 Paid/Unpaid
3. Task Qualification Constraints In/Decrease        GENERIC
4. New Task Attempt Recreation                       27
5. Control/Block Emails                              17
5.1 Mark all email communications as spam
5.2 Mark partial emails as spam
5.3 Mark none
6. Avoid posting more tasks                          GENERIC
7. Partition Task Quantities                         27
Turker thoughts about pay on Survey Tasks
Data: TurkOpticon 5 Votes v. All Others for Survey Tasks
|Welch Two Sample T-Test|Student's T-Test|
|p = 0.005428|p = 0.003642|
data:  dat$Tasks.w..5s and dat$All.Others
t = 3.3403, df = 12.791, p-value = 0.005428
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 1.976024 9.246208
sample estimates:
mean of x mean of y
10.863968  5.252852
| |Tasks w/ 5s|All Others|
|Mean|10.86396774|5.252851782|
|SD|4.807581499|2.259549664|
|N|10|10|
|P|0.003642357| |
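Because the raw data frame `dat` is not reproduced on this page, the reported statistics can only be checked from the summary table. The sketch below recomputes both tests in base R from the means, standard deviations, and sample sizes given above. The variable names are illustrative, and it assumes the "P" row is the two-sided Student's t-test p-value.

```r
# Summary statistics copied from the table above.
m1 <- 10.86396774; s1 <- 4.807581499; n1 <- 10   # Tasks w/ 5s
m2 <- 5.252851782; s2 <- 2.259549664; n2 <- 10   # All Others

# Welch's t-test: unequal variances, Welch-Satterthwaite degrees of freedom.
v1 <- s1^2 / n1
v2 <- s2^2 / n2
se_welch <- sqrt(v1 + v2)
t_welch  <- (m1 - m2) / se_welch
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)
ci_welch <- (m1 - m2) + c(-1, 1) * qt(0.975, df_welch) * se_welch

# Student's t-test: pooled variance, n1 + n2 - 2 degrees of freedom.
sp2        <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
t_student  <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
df_student <- n1 + n2 - 2
p_student  <- 2 * pt(-abs(t_student), df_student)

round(c(t = t_welch, df = df_welch, p = p_welch), 6)
round(c(t = t_student, df = df_student, p = p_student), 6)
```

With equal group sizes the two tests share the same t statistic and differ only in degrees of freedom, which is why the two p-values are close.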
The data used in this analysis are presented below, along with some guidance for examining the data yourself. First, it is useful to generate the current coding key table in a separate text file, since the key helps in understanding what questions can be asked of an RQDA file. Use the following code to generate it:
tableCode <- x[match(unique(x$codename), x$codename), ]
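The line above assumes a data frame `x` with a `codename` column and does not itself write a file. A self-contained sketch of the whole step follows. The contents of `x` here are hypothetical stand-ins; in a real project `x` would presumably be the project's coding table (for example, the data frame returned by RQDA's `getCodingTable()`), which is an assumption since the source does not show how `x` was built. The `write.table` call is likewise an assumed way of producing the text file.

```r
# Hypothetical stand-in for the project's coding table.
x <- data.frame(
  codename = c("Pay 1", "Rejected", "Pay 1", "Error", "Rejected"),
  filename = c("review01", "review01", "review02", "review02", "review03"),
  stringsAsFactors = FALSE
)

# Keep the first row for each distinct code name (the line from the text).
tableCode <- x[match(unique(x$codename), x$codename), ]

# Assumed export step: write the key as a tab-separated text file
# in the current working directory.
write.table(tableCode, file = "ProjectCodeKey.txt",
            sep = "\t", quote = FALSE, row.names = FALSE)
```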
The text file will be created and saved into the working directory under the name ProjectCodeKey.txt. The code key table for the first analysis is provided here. For the purposes of this project, two file sets were created during sampling.