Difference between revisions of "GettingStarted"

From crowdresearch
Revision as of 16:03, 22 May 2015

Watch the bootcamp video to understand this page and to get started with coding: watch

Crowdsourcing Research

One of the goals of computer scientists & researchers is to discover innovative ideas and bring them to the real world. We aim to build systems that help harness the collective intelligence of thousands of people around the globe.

For the last few months, we have been developing research ideas around four core foundations:

  1. Micro+macrotask market: Could the same marketplace scale from 2 to N people not just labeling images, but also Photoshopping my vacation photos or mixing my new song?
  2. Input and output transducers: Tasks get vetted or improved by people on the platform immediately after getting submitted, and before workers are exposed to them. Results are likewise vetted and tweaked.
  3. External quality ratings: Metaphor of credit ratings: rather than people just rating each other, an (external?) authority or algorithm is responsible for assigning credit ratings (A, B, C, etc.).
  4. Open governance: Leadership is shared by requesters, workers, (researchers?). Policy changes can be worked out by this group.

Getting Started with the Research Foundations

  • Crowdsourcing Research watch: starting at 49:55, Professor Bernstein gives an excellent overview of the current state of crowdsourcing research. Slides: pdf
  • Foundations Hangout watch & Slides pdf
  • Synthesis of the ideas Hangout watch & slides
  • Previous Milestones

The Hybrid Approach: Research & Production-Level Code Go Together

Research & Engineering: How to set up the environment?

Fig 1. System Architecture

Local environment

  1. Please go through the README.

Current folder structure:

  1. Backend (crowdsourcing): serializers, validators, viewsets, models, views, tests
  2. Front end (staticfiles): css, js (Angular services, controllers, routes, configurations), templates (html)
  3. Admin: csp

System Architecture

  1. Fig 1 shows an overview of the architecture. For more information, see
  2. Data Models
  3. Front End
  4. Git Strategy
  5. Vision
  6. Active Branches: Develop, Staging, Production

If you have questions

  1. Check existing FAQs.
  2. If you don't find an answer: add your question to the FAQ list, ping the #infra channel on Slack, and, if needed, escalate the issue to the DRIs.

What to work on?

Working on the existing issues, FOCUS: Core Foundations

  1. Take a look at the Open Issues
    Fig 2. Release Cycle & Tags
  2. Helpful search tags: Unassigned Open & Critical, Need Backup.
  3. Choose the issue you would like to work on.
  4. To request a new issue or feature, see the section below.

Creating New Issue/Feature Requests, FOCUS: Core Foundations

  1. To create a new issue, task, or feature request, add it to GitHub Issues (Fig 2).
  2. The labels in Fig 2 highlight the various tags that need to be associated with the issue.
  3. Description: Write a clear description of the new request, explaining how it contributes to the core research foundations. Then add the tags below.
  4. Add the tags Feature Request and Please Prioritize.
  5. Add one of the category tags 1 to 9 describing your request.
  6. Assign the issue to yourself, and in the description add the DRI handles for your category so that they are notified immediately:
#   Category                DRI handles (add these to the description)
1   DESIGN                  @neilthemathguy
2   FRONT END ENGINEERING   @nistala, @neilthemathguy, @dmorina
3   SYSTEMS                 @dmorina, @elsabakiu, @neilthemathguy, @ksetyadi
4   DATA                    @dmorina, @elsabakiu, @neilthemathguy, @ksetyadi
5   DEPLOYMENT              @ksetyadi, @dmorina, @neilthemathguy
6   SECURITY                @ksetyadi, @dmorina, @elsabakiu, @neilthemathguy
7   ANALYTICS               @neilthemathguy
8   TESTING                 @swapagarwal, @dmorina, @ksetyadi, @neilthemathguy
9   OTHERS                  @neilthemathguy
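You can keep the category table above as a small lookup, so the right handles always end up in the issue description. The sketch below is a hypothetical helper: the name dri_mention_line and the "cc" line format are assumptions, not part of the project's tooling. The handles are copied from the table and will go stale if the DRIs change.

```python
# Category number -> (category name, DRI handles), copied from the table above.
DRI_TABLE = {
    1: ("DESIGN", ["@neilthemathguy"]),
    2: ("FRONT END ENGINEERING", ["@nistala", "@neilthemathguy", "@dmorina"]),
    3: ("SYSTEMS", ["@dmorina", "@elsabakiu", "@neilthemathguy", "@ksetyadi"]),
    4: ("DATA", ["@dmorina", "@elsabakiu", "@neilthemathguy", "@ksetyadi"]),
    5: ("DEPLOYMENT", ["@ksetyadi", "@dmorina", "@neilthemathguy"]),
    6: ("SECURITY", ["@ksetyadi", "@dmorina", "@elsabakiu", "@neilthemathguy"]),
    7: ("ANALYTICS", ["@neilthemathguy"]),
    8: ("TESTING", ["@swapagarwal", "@dmorina", "@ksetyadi", "@neilthemathguy"]),
    9: ("OTHERS", ["@neilthemathguy"]),
}


def dri_mention_line(category: int) -> str:
    """Return a line to paste into the issue description so the DRIs get notified."""
    if category not in DRI_TABLE:
        raise ValueError(f"category must be 1-9, got {category}")
    name, handles = DRI_TABLE[category]
    return f"Category {category} ({name}): cc " + ", ".join(handles)


print(dri_mention_line(5))  # Category 5 (DEPLOYMENT): cc @ksetyadi, @dmorina, @neilthemathguy
```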

Creating New Feature Requests, FOCUS: Generic (Outside the Core Foundations)

Please follow the same process as above, and add the additional tags Nice to Have and Others.

Once the issue is assigned to you, please acknowledge it.

What are the categories?

Design Dashboard Dashboard Class Class Class

What is the release cycle?

The release cycle:

  1. Development: Saturday, Sunday, Monday, Tuesday, Wednesday
  2. Staging (ready for production): Thursday
  3. Released to production: Friday
  4. Demo: Saturday
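The cycle above can be encoded as a weekday lookup. For example, you can use it to tell which phase a given date falls in and when the next production release is. This is a minimal sketch that assumes the schedule is the same every week. The names phases_for and next_release are hypothetical. Saturday maps to both Demo and Development because the list has it as the demo day and also as the first development day.

```python
import datetime

# The weekly cycle listed above, keyed by datetime.date.weekday() (Monday == 0).
# Saturday is both Demo day and the first development day of the next cycle.
CYCLE = {
    0: ["Development"],          # Monday
    1: ["Development"],          # Tuesday
    2: ["Development"],          # Wednesday (pull requests are raised)
    3: ["Staging"],              # Thursday: ready for production
    4: ["Production release"],   # Friday
    5: ["Demo", "Development"],  # Saturday
    6: ["Development"],          # Sunday
}


def phases_for(day: datetime.date) -> list[str]:
    """Return the release-cycle phase(s) that fall on the given date."""
    return CYCLE[day.weekday()]


def next_release(day: datetime.date) -> datetime.date:
    """Return the first Friday on or after the given date (production release day)."""
    days_ahead = (4 - day.weekday()) % 7  # Friday is weekday 4
    return day + datetime.timedelta(days=days_ahead)


today = datetime.date(2015, 5, 22)  # date of this revision, a Friday
print(phases_for(today), next_release(today))  # ['Production release'] 2015-05-22
```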

How to submit the work

  1. Finish development and testing.
  2. Create a branch named with the FEATURE NAME and tag it with the ACTIVE TAG.
  3. Add the GitHub issue number to the request; this helps cross-reference the release.
  4. Create the PULL REQUEST.
  5. Update the description of the issue.
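Steps 2 and 3 can be made mechanical with a small helper. It derives the branch name from the feature name and puts the issue number into the pull request description. The branch format feature/<issue>-<slug> is a hypothetical convention, not one the project prescribes. GitHub does turn a #<number> mention into a link to that issue, and that link is what provides the cross-reference. The issue number 42 below is only a placeholder.

```python
import re


def branch_name(feature_name: str, issue_number: int) -> str:
    """Build a branch name such as 'feature/42-contributor-profile-page' (hypothetical convention)."""
    # Lowercase the feature name and collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", feature_name.lower()).strip("-")
    if not slug:
        raise ValueError("feature name must contain at least one letter or digit")
    return f"feature/{issue_number}-{slug}"


def pull_request_body(summary: str, issue_number: int) -> str:
    """Build a PR description; GitHub renders '#<number>' as a link to that issue."""
    return f"{summary}\n\nIssue: #{issue_number}"


print(branch_name("Contributor Profile Page", 42))  # feature/42-contributor-profile-page
print(pull_request_body("Adds my profile page.", 42))
```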

Timeline for each week

  1. We follow a weekly milestone schedule.
  2. Development for each release takes 5 days: Saturday, Sunday, Monday, Tuesday, Wednesday.
  3. The pull request should be raised on Wednesday.
  4. DRIs should finish the merge by Thursday/Friday.
  5. Saturday is DEMO day.

I Need Help

  1. Ping the DRIs immediately.
  2. Add the Help Needed tag to the GitHub issue you are working on.
  3. If you need more resources or backup, add the Need Backup Resource tag to the GitHub issue you are working on.

Coding Guidelines

Collaboration Guidelines

  • Being Respectful & Sensitive: We are a growing community of young researchers with different skills and backgrounds. Please respect your colleagues and community members. Let's bring everyone along and make sure no one is left behind.
  • Constructive Feedback: We agree to disagree in a respectful manner. As suggested by the TAs, negative comments are discouraged: if you dislike some aspect of a submission, suggest an improvement.
  • Mentor, Help, & Make New Friends: Share your experience with the community, and pass your wisdom and knowledge on to others.
  • Strictly Prohibited: Collaboration is highly encouraged; however, please do not share your GitHub account with someone else or ask them to check in or develop code under your name. Please list the names of collaborators in the release and give appropriate credit to the person who came up with the original idea. If you have any questions about this, don't hesitate to talk to the DRIs or TAs. We believe every computer scientist should be comfortable using these tools; if you need any help setting up the environment, please reach out to the DRIs or the community.
  • Responsibility: This platform is the result of collaborative efforts, and our goal is to make the world a better place. You can own the part of the system you are working on and take responsibility for productionizing and maintaining it.

Example: Hello World Tutorial

Create your profile page and update it every week with your accomplishments.
Accomplishments: Add the names of the features you are working on for the current and past weeks. List the production releases you have completed. Add the names of the people you worked with, and give appropriate credit for others' ideas and contributions.

Your first contribution

First, go through the setup. To get started, we have introduced a simple tutorial to add yourself to the list of contributors on our dev site. It introduces you to angular.js and to the pull request process on GitHub, and it covers a subset of the tutorial at https://guides.github.com/activities/hello-world/, which you should go through to familiarize yourself with git. A sample pull request showing the PR you will create at the end of this tutorial is available at https://github.com/crowdresearch/crowdsource-platform/pull/40

Creating a new branch

Create a new branch after checking out the main repo using the following commands.

  1. cd into the root folder of your clone of the crowdresearch/crowdsource-platform repo.
  2. git fetch --all
  3. git checkout develop2
  4. git branch [branch-name]
  5. git checkout [branch-name]
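The steps above can be scripted. The sketch assumes git is on your PATH and that you run it from the repository root. By default it only prints the commands, and it executes them only when dry_run=False. It fetches with git fetch --all, which fetches from every remote. It also folds steps 4 and 5 into git checkout -b, which creates the branch and switches to it in one command. The helper names are hypothetical.

```python
import subprocess
from typing import List


def new_branch_commands(branch: str, base: str = "develop2") -> List[List[str]]:
    """Return the git commands for creating a feature branch off the base branch."""
    return [
        ["git", "fetch", "--all"],          # update remote-tracking branches for every remote
        ["git", "checkout", base],          # switch to the base branch
        ["git", "checkout", "-b", branch],  # create the new branch and switch to it
    ]


def create_branch(branch: str, base: str = "develop2", dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also run it and stop on the first failure."""
    for cmd in new_branch_commands(branch, base):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if git exits with a non-zero status.
            subprocess.run(cmd, check=True)


create_branch("yourname-profile")  # dry run: only prints the three commands
```

If you already had a local develop2 from an earlier session, running git pull after checking it out makes your new branch start from the latest code.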

Now you're all set to start writing some code!

Writing code

Frontend code is under the /staticfiles folder.

  1. Go to: https://github.com/crowdresearch/crowdsource-platform/blob/develop2/staticfiles/js/crowdsource.routes.js#L71
  2. Create a new URL route for your own profile page below the existing routes.
  3. Upload your profile image to: https://github.com/crowdresearch/crowdsource-platform/tree/develop2/staticfiles/images/contributors. We recommend uploading an image that is 200x200 px.
  4. Create a new template at /staticfiles/templates/contributors/<yourname>.html
  5. You can design this page however you'd like.
  6. Now, add yourself to the list of contributors by going to: https://github.com/crowdresearch/crowdsource-platform/blob/develop2/staticfiles/templates/contributors/home.html
  7. Copy the following code snippet and add it after the existing contributors. Edit it to include your name, your URL, and your image name.
          <!-- One contributor card: replace the placeholders in angle brackets -->
          <div class="col-sm-2 col-xs-4 thumb">
            <a class="contributor-thumb" href="/contributors/<your url>">
              <img class="thumbnail img-responsive" src="/static/images/contributors/<image_name>.png" />
              <span><Your Name></span>
            </a>
          </div>

Pushing changes to your branch

Now you need to add your changes. Go back to your terminal and cd into the root crowdsource-platform folder.

  1. git status - This should show you the changes you've made and the files you've edited.
  2. git add staticfiles/templates/contributors/home.html
  3. git add staticfiles/templates/contributors/<your_name_file>.html
  4. git add staticfiles/js/crowdsource.routes.js
  5. git add staticfiles/images/contributors/<image_name>.png
  6. git commit -m "Adding myself to contributor's page"
  7. git push origin [branch-name]

Creating a Pull Request

We use GitHub pull requests to merge code into the main codebase. Your code is reviewed by the DRIs (don't worry, we're nice :)) and, once it is approved, you can merge it.

  1. Go to https://github.com/crowdresearch/crowdsource-platform/compare
  2. Change compare to your [branch-name] from above.
  3. Click Create pull request.
  4. You should see a list of your changes.
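You can also open the compare page directly by URL. GitHub compare URLs take the form compare/<base>...<head>, and adding ?expand=1 opens the pull request form straight away. The sketch assumes your pull request targets develop2, the branch you created yours from. compare_url is a hypothetical helper.

```python
from urllib.parse import quote

REPO_URL = "https://github.com/crowdresearch/crowdsource-platform"


def compare_url(branch: str, base: str = "develop2", open_form: bool = True) -> str:
    """Return the GitHub compare URL for base...branch, optionally opening the PR form."""
    # Keep '/' (branch names like feature/x) and ':' (fork heads like user:branch) unescaped.
    head = quote(branch, safe="/:")
    url = f"{REPO_URL}/compare/{quote(base, safe='/:')}...{head}"
    return url + "?expand=1" if open_form else url


print(compare_url("yourname-profile"))
```

If you pushed to your own fork instead of the main repository, GitHub expects the head as <your-username>:<branch-name>.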

Now notify your DRIs on the #infra Slack channel by pasting a link to your pull request.

System Architecture Workflow


  • Nginx is used as a reverse proxy and serves the static files.
  • Gunicorn serves the WSGI application, in our case the Django apps.
  • REST API: Django apps are a good way to modularize the code. After completing the main web application, we will work on a REST API with OAuth2 authentication. This API will be used by mobile and desktop clients, and other applications can be derived from it as the project progresses.
  • WebSockets: We will need WebSockets for live communication between the client apps and between the users themselves; we will start with Tornado if it plays well with Django.
  • Gunicorn can run multiple web workers, and we will use Redis to handle sessions for WebSockets and so on.
  • In this architecture, new features are easy to add. One way is to group a feature into its own module and include its URLs in the project's URLconf (urls.py); this way you can implement any feature and simply plug it into the existing application.
  • Another way is to extend the current code, which can be done in three simple steps:
    • Create your HTML templates.
    • Add class-based views in views.py or in other files.
    • Import the views in urls.py and define your URL mappings there; this does not affect the existing features.
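The plug-in idea above can be illustrated without Django. Each feature module owns its own route table, and the project-level table mounts it under a prefix, so adding a feature never edits existing routes. This is a stdlib-only analogy. In the real project the same job is done by Django's include() in urls.py, whose resolver differs in detail. The contributor routes and handlers below are hypothetical.

```python
import re
from typing import Callable, Dict, List, Tuple

# A handler receives the named groups captured from the URL.
Handler = Callable[[Dict[str, str]], str]
Route = Tuple[str, Handler]


def include(prefix: str, routes: List[Route]) -> List[Route]:
    """Mount a module's routes under a prefix (loosely analogous to Django's include())."""
    return [(prefix + pattern, handler) for pattern, handler in routes]


def resolve(routes: List[Route], path: str) -> str:
    """Call the handler of the first pattern that matches the whole path."""
    for pattern, handler in routes:
        match = re.fullmatch(pattern, path)
        if match:
            return handler(match.groupdict())
    return "404 Not Found"


# Existing application routes.
core_routes: List[Route] = [
    (r"/", lambda kwargs: "home"),
]

# A hypothetical feature module that owns its own routes.
contributor_routes: List[Route] = [
    (r"/", lambda kwargs: "contributor list"),
    (r"/(?P<name>[\w-]+)/", lambda kwargs: f"profile of {kwargs['name']}"),
]

# Plugging the feature in is one line; core_routes is not modified.
urlpatterns = core_routes + include(r"/contributors", contributor_routes)

print(resolve(urlpatterns, "/contributors/ada/"))  # profile of ada
```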


  • The client makes a request from the web using AngularJS ngResource, or from a native app built with PhoneGap.
  • The request is a REST API call to the Heroku-hosted Django server.
  • Requests prefixed with /api/<call> are routed via Gunicorn to the Django API server running Django REST framework.
  • Multiple instances of the API server will be provisioned on different nodes to scale with traffic; requests are assigned round-robin, moving on to the next server until a free one accepts the request.
  • Django talks to the database coordinator, which itself talks only to the master database.
  • Reads are served from the slave databases and writes go to the master, which syncs them to the slaves; coordinating this will be the job of the PG coordinator. In the future, the data tier can be scaled with pgpool-II, a middleware that sits between PostgreSQL servers and PostgreSQL database clients, and its watchdog feature can be used to make it highly available.
  • Data is sent back up the chain as an HTTP response from the REST API, and the client updates its view without a page refresh, which also allows for a smooth native mobile interface. Heroku provides this setup natively, but it can be reproduced on AWS, GCE, Rackspace, or any other cloud provider to allow for maximum scaling of the application.
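Two steps in this flow can be sketched in a few lines: picking an API server round-robin while skipping busy ones, and sending reads to slaves (replicas) and writes to the master. This illustrates the routing logic only. It assumes a simple busy flag per server and treats any statement that starts with SELECT as a read. In the deployed system this work would be done by the load balancer and by the PG coordinator or pgpool-II, not by application code. All class names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class ApiServer:
    name: str
    busy: bool = False  # hypothetical availability flag


class RoundRobinBalancer:
    """Hand each request to the next server in turn, skipping busy ones."""

    def __init__(self, servers: List[ApiServer]):
        self.servers = servers
        self._next = 0  # index of the server to try first on the next request

    def pick(self) -> ApiServer:
        # Try every server at most once for this request.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if not server.busy:
                return server
        raise RuntimeError("no free API server")


class ReadWriteRouter:
    """Send writes to the master and rotate reads across the replicas."""

    def __init__(self, master: str, replicas: List[str]):
        self.master = master
        # With no replicas, reads fall back to the master.
        self._replicas = itertools.cycle(replicas or [master])

    def route(self, sql: str) -> str:
        # Simplification: only statements starting with SELECT count as reads
        # (a real router would also send e.g. SELECT ... FOR UPDATE to the master).
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._replicas) if is_read else self.master


servers = [ApiServer("api-1"), ApiServer("api-2", busy=True), ApiServer("api-3")]
balancer = RoundRobinBalancer(servers)
print([balancer.pick().name for _ in range(4)])  # ['api-1', 'api-3', 'api-1', 'api-3']

db = ReadWriteRouter("master", ["replica-1", "replica-2"])
print(db.route("SELECT * FROM task"), db.route("UPDATE task SET status = 2"))
```

Retrying the next server instead of failing on a busy one is what the "round robin until a free server is found" step describes. The read/write split keeps the master free for writes while the replicas absorb read traffic.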

Current Data Model

Fig 3. Data Model Architecture