Tutorial: Exploring cancer mutation data portals
3.0 years ago by Malachi Griffith, Washington University School of Medicine, St. Louis, USA

This tutorial describes examples of data portals (visual interfaces, APIs, etc.) that allow a user to mine publicly available cancer sequence data for somatic and/or germline mutations. These resources allow the user to assess how often specific mutations recur within cancer subtypes, as well as their sequence identity, predicted functional consequence, etc. Example questions one might ask of such resources:

  • What are the most significantly mutated genes in a particular cancer type?
  • Which mutations tend to co-occur with, or be mutually exclusive of, each other in a tumor?
  • What positions or domains within the amino acid sequence of a gene are most frequently mutated? That is, where are the mutation 'hotspots'?
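
To make these questions concrete, here is a minimal Python sketch of how each one might be answered once mutation calls have been downloaded from a portal. Everything in it is an assumption for illustration: the record layout (sample, gene, amino acid position), the function names, and the toy data are invented and do not come from any specific portal or API. Note also that it ranks genes by simple recurrence (number of mutated samples); calling a gene "significantly" mutated requires a background mutation rate model, which this sketch does not attempt.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (sample_id, gene, amino_acid_position).
# Invented for illustration only; not real portal output.
MUTATIONS = [
    ("S1", "KRAS", 12),
    ("S1", "TP53", 175),
    ("S2", "KRAS", 12),
    ("S3", "KRAS", 13),
    ("S3", "TP53", 273),
    ("S4", "EGFR", 858),
    ("S5", "EGFR", 858),
    ("S5", "TP53", 175),
]


def samples_by_gene(mutations):
    """Map each gene to the set of samples with at least one mutation in it."""
    by_gene = defaultdict(set)
    for sample, gene, _pos in mutations:
        by_gene[gene].add(sample)
    return dict(by_gene)


def gene_recurrence(mutations):
    """List (gene, mutated sample count), most recurrent first, ties by name."""
    by_gene = samples_by_gene(mutations)
    counts = [(gene, len(samples)) for gene, samples in by_gene.items()]
    return sorted(counts, key=lambda item: (-item[1], item[0]))


def hotspots(mutations, gene, min_samples=2):
    """Return {position: sample count} for positions hit in >= min_samples samples."""
    # Deduplicate so a sample counts once per position.
    seen = {(s, p) for s, g, p in mutations if g == gene}
    counts = Counter(pos for _sample, pos in seen)
    return {pos: n for pos, n in counts.items() if n >= min_samples}


def pair_overlap(mutations):
    """For each gene pair, count samples mutated in both, only the first, only the second."""
    by_gene = samples_by_gene(mutations)
    result = {}
    for a, b in combinations(sorted(by_gene), 2):
        sa, sb = by_gene[a], by_gene[b]
        result[(a, b)] = {
            "both": len(sa & sb),
            "only_a": len(sa - sb),
            "only_b": len(sb - sa),
        }
    return result


if __name__ == "__main__":
    print(gene_recurrence(MUTATIONS))
    print(hotspots(MUTATIONS, "KRAS"))
    print(pair_overlap(MUTATIONS))
```

A gene pair with a high "both" count relative to the "only" counts hints at co-occurrence, and a pair with "both" near zero hints at mutual exclusivity. With real cohorts these counts would normally feed a statistical test (for example Fisher's exact test) rather than being read directly.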

Some relevant posts:

Here are some resources that I already know about and have used:

The first four are fantastic resources along the lines I am looking for. Please comment below if I am missing others. For example, there may be others that are less well known or that are more focused on a specific question.

Relevant reviews, primary articles, open-source software projects, etc. would also be welcome. I'm most interested in resources that provide a platform for performing complex queries of the raw data, along with summaries, visualizations, etc. I will try to update this tutorial with examples and feedback from the community.

Finally, here are some related resources that we have created ourselves to complement some of those resources listed above:


