This tutorial describes data portals (visual interfaces, APIs, etc.) that let a user mine publicly available cancer sequencing data for somatic and/or germline mutations. These resources let the user assess how often specific mutations recur within cancer subtypes, the exact sequence change involved, its predicted functional consequence, and so on. Example questions one might ask of such resources:
- What are the most significantly mutated genes in a particular cancer type?
- What mutations tend to co-occur or are mutually exclusive with each other in a tumor?
- What positions or domains within a gene's amino acid sequence are most frequently mutated? In other words, where are the mutation 'hotspots'?
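To make these questions concrete, here is a minimal Python sketch that works on a toy, MAF-like list of `(sample, gene, protein_position)` records. The records and all function names (`gene_frequencies`, `hotspots`, `pair_tests`) are hypothetical and for illustration only; in practice the records would be parsed from a MAF file downloaded from one of the portals. Raw recurrence is not the same as statistical significance: tools such as MutSigCV also model the background mutation rate, which this sketch does not attempt. Co-occurrence and mutual exclusivity are assessed here with a plain one-sided Fisher's exact test per gene pair, which ignores differences in per-sample mutation burden.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import comb

# Hypothetical toy records in the spirit of a MAF file:
# (sample_id, gene_symbol, protein_position). Not real data.
mutations = [
    ("S1", "TP53", 175), ("S1", "PIK3CA", 1047),
    ("S2", "TP53", 248), ("S3", "TP53", 175),
    ("S3", "KRAS", 12), ("S4", "PIK3CA", 1047),
    ("S4", "KRAS", 12), ("S5", "KRAS", 13),
]
# The cohort must include samples with no mutations in these genes (here S6).
all_samples = {"S1", "S2", "S3", "S4", "S5", "S6"}


def samples_per_gene(records):
    """Map each gene to the set of samples with at least one mutation in it."""
    genes = defaultdict(set)
    for sample, gene, _ in records:
        genes[gene].add(sample)
    return genes


def gene_frequencies(records):
    """Number of distinct mutated samples per gene (raw recurrence, not significance)."""
    return Counter({g: len(s) for g, s in samples_per_gene(records).items()})


def hotspots(records, gene):
    """Count mutations at each protein position of one gene."""
    return Counter(pos for _, g, pos in records if g == gene)


def fisher_one_sided(a, b, c, d, greater=True):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]], where
    a = both genes mutated, b = only gene 1, c = only gene 2, d = neither.
    greater=True tests co-occurrence; greater=False tests mutual exclusivity."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    ks = range(a, hi + 1) if greater else range(lo, a + 1)
    # Sum hypergeometric probabilities of k samples mutated in both genes
    return sum(comb(col1, k) * comb(n - col1, row1 - k) for k in ks) / comb(n, row1)


def pair_tests(records, samples):
    """Return {(gene1, gene2): (p_cooccurrence, p_exclusivity)} for every gene pair."""
    genes = samples_per_gene(records)
    n = len(samples)
    results = {}
    for g1, g2 in combinations(sorted(genes), 2):
        s1, s2 = genes[g1], genes[g2]
        a, b, c = len(s1 & s2), len(s1 - s2), len(s2 - s1)
        d = n - a - b - c
        results[(g1, g2)] = (fisher_one_sided(a, b, c, d, True),
                             fisher_one_sided(a, b, c, d, False))
    return results
```

With the toy data, `hotspots(mutations, "TP53")` gives `Counter({175: 2, 248: 1})`, flagging position 175 as the recurrent site, and `gene_frequencies(mutations)` reports TP53 and KRAS as each mutated in 3 samples.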
Some relevant posts:
- Looking For Frequencies Of Mutations In Set Of Genes In Different Cancers
- Mutated Gene Sequence Database For Breast Cancer
- List Of Known Cancer Resources
- Help On Getting Data From Tcga
- Database Of Tumor Suppressors And/Or Oncogenes
Here are some resources that I already know about and have used:
- Genomic Data Commons
- ICGC Data Portal
- UCSC Cancer Browser
- Cancer Genomics Workbench
- TCGA Data Portal
- Broad Institute Tumor Portal
- OASIS Genomics
- St. Jude Pediatric Cancer Data Portal
The first four are fantastic resources along the lines of what I am looking for. Please comment below if I am missing any others. For example, there may be resources that are less well known or that focus on a more specific question.
Relevant reviews, primary articles, open-source software projects, etc. are also welcome. I am most interested in resources that provide a platform for performing complex queries on the raw data, along with summaries and visualizations. I will try to update this tutorial with examples and feedback from the community.
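Several of these portals also offer programmatic access. As one hedged example, the Python sketch below builds a query against the Genomic Data Commons `files` endpoint for masked somatic mutation (MAF) files from a single project. The endpoint URL, the field names (`cases.project.project_id`, `data_type`), and the `"Masked Somatic Mutation"` value reflect my understanding of the GDC API and should be checked against its current documentation; `build_maf_query` and `fetch` are hypothetical helper names, not part of any official client.

```python
import json
import urllib.parse
import urllib.request

# Assumed GDC API endpoint for file queries; verify against current GDC docs.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_maf_query(project_id, size=10):
    """Build query parameters for masked somatic mutation files in one project.
    Field names and the data_type value are assumptions to check against the docs."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in", "content": {"field": "cases.project.project_id",
                                     "value": [project_id]}},
            {"op": "in", "content": {"field": "data_type",
                                     "value": ["Masked Somatic Mutation"]}},
        ],
    }
    return {
        "filters": json.dumps(filters),  # filters are sent as a JSON string
        "fields": "file_id,file_name",
        "format": "JSON",
        "size": str(size),
    }


def fetch(params, timeout=30):
    """Send a GET request with the given parameters and decode the JSON response."""
    url = GDC_FILES_ENDPOINT + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Assuming the response layout described in the GDC API documentation, `fetch(build_maf_query("TCGA-BRCA"))["data"]["hits"]` should return a list of matching file records, whose `file_id` values can then be used to download the files.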
Finally, here are some related resources that we have created ourselves to complement some of the resources listed above: