Introducing Crowd Research Initiative and Recap - Winter 16

The Stanford Crowd Research program has been running since March 2015. During this time, more than 500 people have participated through two open calls. This is the third call, launching in November 2015. Together, we're building the next generation of crowdsourcing platform. To help you get on board and up to speed with the rest of us, we've created this document. Learn what we've done so far, why we're doing it, how we're doing it, and more.

Oh, and check out our live system, built by people like you - daemo.stanford.edu (we call it Daemo) - or read our UIST paper. Enjoy!

What is crowdsourcing?

Let's start with the most basic question in this project: "What is crowdsourcing?" Crowdsourcing is the process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, especially from an online community, rather than from traditional employees or suppliers. A great example of crowdsourcing is Wikipedia - a website we all use to learn about specific topics. Wikipedia is built by people like us (the crowd), who have contributed their knowledge and time to create the world's largest encyclopedia. None of these contributors were hired in the traditional sense; the entire effort was crowdsourced. Remember, the crowd is us - it can be anyone willing to contribute to a given task or goal. Watch this quick video to learn more, or read about research in this space from slide 33 onward.

What is a crowdsourcing platform?

In order to harness the crowd's potential, we need to reach the crowd. A crowdsourcing platform is an internet marketplace that enables individuals and businesses (known as requesters) to coordinate the use of human intelligence to perform needed tasks. On such platforms, the crowd members are often termed workers. Platforms like Upwork cater to larger projects such as website development, while platforms like Amazon Mechanical Turk help get microtasks done, such as labeling images or filling in surveys. Other platforms include:

  • TaskRabbit - to get the crowd to help you with physical-world needs, like shopping and delivery.
  • Zooniverse - to get the crowd to help explore space in ways not possible even with the world's supercomputers.
  • Gigwalk - to get the crowd to help you find local information from anywhere in the world.

Why do we need a new platform?

Today's platforms are notoriously bad at ensuring high-quality results, producing fair wages and respect for workers, and making it easy to author effective tasks. What might we create if we knew that our children would become crowd workers? Our goal is to reconsider the design of crowdsourcing platforms (and to train you to become awesome researchers while we do it). Most current research focuses on improving the output of a variety of crowdsourcing platforms. We, however, want to change the platform itself so that the output improves by design.

What do workers and requesters have to say?

During the first few weeks, we asked participants to step into the shoes of a worker and a requester and share their experiences. Participants were encouraged to explore a variety of platforms, such as Mechanical Turk and Clickworker, and then do some needfinding. This exercise helped us identify the problems and areas for improvement.

As a worker

  • Very difficult to find work. We kept seeing the same tasks that nobody was doing, over and over; it becomes super repetitive.
  • Concerns about wages.
  • Need to be treated fairly and respectfully, and have a voice in the platform
  • People with no experience earned much less than people with Turking experience. On AMT, the high-paying tasks were gated behind qualifications. Workers need to be able to expose their skills so they can get work they are qualified for and advance those skills.
  • Scams are quite common.
  • Suggestion: Check out Turkopticon, Reddit’s /r/mturk and /r/HITsworthturkingfor

As a requester

  • Among the people who were new to the platform: signing up was a bear, and wrangling CSVs was not much better
  • It’s hard to trust the results: had to go back and inspect every single response
  • Need to get their HITs completed (quickly / correctly)
  • Need to have workers who have the appropriate skills and demographics do their tasks and trust them
  • Need to be able to easily generate good tasks
  • Need to be able to price their tasks appropriately
  • Seemed like AMT was designed to be most friendly to requesters
  • Suggestion: Check out Panos' blog and Reddit /r/mturk requester issues

What are the factors at play?

There are two main factors at play: trust and power.

  • How do I trust who you say you are? How do I trust that the results I get are results that will be good? How do I trust that you’ll respect me as a worker, and pay me accordingly?
  • Who has the power to post work? To edit other peoples’ posted work? To return results to the requester? Can I, as a worker, send it back myself, or does someone else need to vet it?

These factors inspired a bunch of research questions and ideas to address them, some of which are listed below (please see this slideshow for details):

  • Task clarity: How might workers+requesters work together to produce higher-quality task descriptions?
  • Data and Results: How might workers+requesters work together to produce higher-quality results?
  • Disputes: How might workers+requesters resolve disputes fairly?
  • Empathy: How might we build more empathy between workers and requesters?
  • Transparency: How might we make payment clear and transparent?
  • Reputation: How might we make reputation scores accurate and trustworthy?

Suggestion: Learn about how to prototype and do storytelling here. You might want to check some of the past submissions below.

What foundation and feature ideas did we discuss?

First, a reminder: we aim to emphasize a conceptual point of view. A string of features is not a research contribution. A point of view gives us a single angle that informs our decisions and features. Before we talk about the emerging foundation and feature ideas, let's try to understand the difference between them in this project's context:

  • Foundation: A new high-level approach to organizing our crowd platform to improve trust and power. For example: Workers organize themselves into collectives
  • Features: Ideas which improve the strength of any platform but aren’t holistic or don’t give it a high-level purpose. For example: Task recommender systems

Some of the emerging ideas we've been exploring are (please see this slideshow for more details):

Our Core Foundation Research Agenda

  • Input + output moderation: Before tasks get posted to the system, they get looked at by a panel of reviewers and either edited or passed
  • Import finance concepts/External quality ratings: Reputation, like a credit score: worker ratings of requesters place them into A, B (good), C (fair), or D (poor) tiers (see the sketch after this list)
  • Open governance: Workers and requesters share power over the platform through annual votes (representative democracy?). See top ideas in this space.
  • Micro+macrotask market: Maintain the submission approach from microtask markets, which is focused on two to hundreds of replications, but find ways to make it accessible to both microtask and expert work
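
As a rough illustration of the credit-score idea above, here is a minimal sketch in Python (the project's main language). The thresholds, the function name, and the 1-5 star scale are assumptions for illustration only, not part of Daemo.

    # Hypothetical sketch: bucket a requester's average worker rating
    # (assumed 1-5 stars) into letter grades. Thresholds are illustrative.
    def letter_grade(ratings):
        """Map a list of 1-5 star ratings to a credit-score-style grade."""
        if not ratings:
            return None  # no rating history yet
        avg = sum(ratings) / len(ratings)
        if avg >= 4.5:
            return "A"
        if avg >= 3.5:
            return "B"  # good
        if avg >= 2.5:
            return "C"  # fair
        return "D"      # poor

    print(letter_grade([5, 4, 5]))  # -> A
    print(letter_grade([3, 3, 2]))  # -> C

Workers could then screen the task feed by a requester's grade, much like lenders screen on credit scores.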

Others of interest:

  • Mobile crowd tasks: mClerk: tasks are designed so that people can complete them on their phones
  • Tiers and mentors: Categorize workers into tiers based on experience; e.g., entry-level workers receive aid to establish and familiarize themselves with the platform
  • Price+quality mechanisms: Combine price and quality together
  • Empathy and community: Deploy people within the system to help react to conflicts by creating a sense of community

Suggestion: See this to learn more about challenges in the active foundation ideas. Also see this slideshow or Milestone 9 to see some of the suggestions towards addressing these challenges.

Features

  • Recommendation systems
  • How do we pay? Bitcoin?
  • Language transducers to simplify complex writing for a local (international) audience as well as check that it's fair
  • Culturally-specific adaptations
  • Link issues + bug reports directly to HITs so everyone can see
  • More...

What are the active foundation ideas being pursued?

After weeks of brainstorming and prototyping, we ended up focusing on the following:

Prototyping tasks

Requesters are domain experts and not designers, leading them to believe that their task authorship is high-quality — when it may in fact be quite the opposite. When this occurs, even hard work may result in rejections because workers and requesters had no chance to reach a shared mental model. So, we are investigating prototype tasks, which means that all tasks go through a feedback iteration from workers before launching to the marketplace. Inspired by best practice in the user-centered design process and crowdsourcing, prototype tasks pay 3–5 workers to complete a small percentage of the overall work and provide feedback on how to improve the task interface or clarify it. Requesters use the feedback, and any differences between their expected results and the prototype task results, to iterate. In doing so, requesters increase the probability that their tasks deliver the desired mental model and thus produce work that matches the requester's expectations. The end result is fewer work rejections.
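
To make the loop concrete, here is a minimal sketch in Python of the prototype-task phase described above. The data layout, constants, and function names are hypothetical placeholders, not Daemo's actual API.

    # Hypothetical sketch of the prototype-task check: a small pilot batch
    # of workers completes part of the task and leaves feedback; the task
    # is revised until the pilot comes back clean, then launched fully.
    PILOT_WORKERS = 4        # pay 3-5 workers for the pilot
    PILOT_FRACTION = 0.05    # pilot covers a small share of the work

    def needs_revision(pilot_results, expected):
        """Flag a task for revision if pilot workers left feedback or
        their answers diverge from the requester's expected answers."""
        feedback = [r["feedback"] for r in pilot_results if r.get("feedback")]
        mismatches = [r for r in pilot_results
                      if r["item"] in expected and r["answer"] != expected[r["item"]]]
        return bool(feedback or mismatches), feedback

    # Example pilot: one response carries clarifying feedback.
    pilot_results = [
        {"item": "img_01", "answer": "cat", "feedback": None},
        {"item": "img_02", "answer": "dog",
         "feedback": "Unclear whether cartoon animals count."},
    ]
    expected = {"img_01": "cat", "img_02": "dog"}

    revise, feedback = needs_revision(pilot_results, expected)
    if revise:
        print("Iterate on the task interface first:", feedback)
    else:
        print("Pilot is clean; launch to the full marketplace.")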

Boomerang reputation

In crowdsourcing marketplaces, both workers and requesters must rely on reputation scores such as acceptance rates or star ratings to make decisions (e.g., accepting work output and hiring). However, these reputation scores are often significantly inflated due to social pressure, leading to "The Yelp Problem": everybody has 4.5 stars. The result is that a worker or requester may agree to work with a highly rated partner, only to find out that the partner is in fact fairly mediocre. We are thus designing Boomerang, a reputation system for Daemo that promotes objective and accurate ratings. To do so, Boomerang returns each rating decision back to directly impact the rater. In particular, giving someone a high rating increases the likelihood of working with that individual again, while giving a low rating reduces that likelihood. So, giving a mediocre partner an artificially high rating means that you will likely be seeing that partner again in future tasks.
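
Here is a minimal sketch in Python of that feedback loop: the ratings you hand out determine whose work is surfaced to you first next time. The ordering rule and names are assumptions for illustration, not Daemo's implementation.

    # Hypothetical sketch of Boomerang's incentive: the ratings a requester
    # gives workers reorder which workers are surfaced for that requester's
    # future tasks, so inflated ratings come back to the rater.
    def rank_workers_for_requester(candidates, my_past_ratings):
        """Order candidates so workers this requester rated highly come
        first; unrated workers get a neutral prior."""
        NEUTRAL = 3.0
        return sorted(candidates,
                      key=lambda w: my_past_ratings.get(w, NEUTRAL),
                      reverse=True)

    my_past_ratings = {"worker_a": 5.0, "worker_b": 2.0, "worker_c": 4.5}
    queue = rank_workers_for_requester(
        ["worker_a", "worker_b", "worker_c", "worker_d"], my_past_ratings)
    print(queue)  # -> ['worker_a', 'worker_c', 'worker_d', 'worker_b']

The same idea applies in reverse for workers rating requesters: rate a requester highly and, under this sketch, their tasks float to the top of your own feed.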

Collective governance

From class-action lawsuits against crowdsourcing platforms to unilateral moves by other platforms to raise fees, neither workers nor requesters feel represented by current platforms. We seek to address this power imbalance and mitigate trust issues by introducing a representative democracy governance model within Daemo via the election of a Leadership Board. This community-elected Leadership Board (comprising two workers, two requesters, and three platform designers) is empowered to make policy decisions for the platform. Including all vested parties in platform governance provides an opportunity for idea transfer, transparent communication, and engagement.

What engineering efforts are going on?

For the past many weeks we have been building Daemo. It started with this basic Infrastructure, which is able to support these foundation and feature ideas. Our plan is to address these ideas and challenging research questions through engineering efforts. To help you get started, we have created this GettingStarted page and made this video as a tour/bootcamp. Also, feel free to explore our GitHub page and see all open issues.

If you want to dive right in, watch the bootcamp video to understand the GettingStarted page and start coding: watch. Currently, we're working with Django, AngularJS, Python, and PostgreSQL, and we'd love it if you're willing to learn these to make contributions.
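
If you're curious what the Django side looks like before watching the bootcamp, here is a minimal, hypothetical model sketch of the kinds of entities a microtask platform has to track. The class and field names are illustrative assumptions, not copied from Daemo's actual schema; see the GettingStarted page and the GitHub repository for the real code.

    # Hypothetical Django models for a microtask marketplace; names and
    # fields are illustrative only (consult Daemo's repository for the
    # real schema).
    from django.conf import settings
    from django.db import models

    class Project(models.Model):
        """A requester's batch of related tasks."""
        requester = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
        name = models.CharField(max_length=128)
        price_per_task = models.DecimalField(max_digits=6, decimal_places=2)

    class Task(models.Model):
        """A single unit of work within a project."""
        project = models.ForeignKey(Project, on_delete=models.CASCADE)
        data = models.TextField()  # e.g. JSON payload fed into the task template

    class TaskWorker(models.Model):
        """One worker's assignment to (and result for) one task."""
        task = models.ForeignKey(Task, on_delete=models.CASCADE)
        worker = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
        result = models.TextField(blank=True)
        status = models.CharField(max_length=16, default='in_progress')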

Share your GitHub ID: submit the ID to get access.

What have you achieved in the past six months?

  • ACM UIST work-in-progress accepted
  • Successfully completed crowd tasks on behalf of Microsoft
  • Alpha version of the system almost ready (daemo.stanford.edu)
  • Two participants were offered full-time RA positions at Stanford University - you can be too!
  • Semi-finalists at the Knight News Challenge 2015 (top 45 of 1,000+), application under review

What is the plan of action?

As part of this program, we want to create a new marketplace that we're all proud to support and use. It's a chance for you to learn with us as we experiment with new forms of research at scale. We have published a short paper, a full paper is under submission, and we have released an alpha version of the Daemo platform. Now, we want to make it bigger and better.

We plan to continue researching in this space; research will always be integral to our plan and to Daemo's future. We want the outcomes of our research to reach the world and make a huge impact together. When we write a paper, you'll be a co-author. When we launch Daemo, you'll have a huge say. You can also request a recommendation letter from Prof. Michael Bernstein of Stanford CS. In short, we want you to own this project and work together to achieve this ambitious goal. We want to design a new future, one that follows a user-centered research trajectory.

General format

We will work in teams toward weekly milestones of your choice, give feedback on each other's milestones, and take the best ideas forward. Each week, we will use the results from our efforts so far to decide on a milestone that we'll pursue for the next week. Collaborate with your team or form new ones to execute the milestone. Even if you're applying alone, you can quickly form goal-based dynamic teams.

Peer feedback

After submitting your team's milestone, you'll have about 12 hours to give feedback on a few peers' submissions. We will use this feedback to highlight the highest-rated submissions, invite teams to join the Hangout on Air, and guide our next steps. However, this does not happen on a weekly basis; it is only required when we need to synthesize a lot of ideas.

Weekly rhythm

  • Monday 8pm PST: team meeting + milestone opens
  • Sunday 11:59pm PST: milestone closes
  • Occasional peer feedback on milestones after milestone closes
  • Occasional meetings to brainstorm, held on an as-needed basis

Platforms

Resources

  • Relevant Work - used to save relevant articles, links and papers
  • Forums - discussions about this project on external sites. Note that all official announcements and communications will occur via Slack and email.
  • Resources - used to index all platforms and resources as we evolve
  • Infrastructure and GettingStarted
  • Archives - older meetings, slides and milestones
  • Prof. Scott Klemmer's HCI course on Coursera

Selected past submissions

Research and engineering

As part of this project, we will address some open-ended and challenging research questions. To test or evaluate many of these questions, we might have to engineer prototypes. Prototypes can range from lo-fi (like a paper drawing) to fully developed ones. You are encouraged to form goal-oriented teams with people of varied skills and accomplish goals together. In human-computer interaction, research and engineering go hand in hand with usability, interface, and interaction design. Currently, we're working with Django, AngularJS, Python, and PostgreSQL, and we'd love it if you're willing to learn these to make contributions. We're lucky to have participants from a range of skill sets.

How can we contact you?

Find us on Slack! @rajanvaish or @geza or @ranjaykrishna or @michaelbernstein. Alternatively, you can email us at crowdresearch@cs.stanford.edu.