This tutorial describes examples of data portals (visual interfaces, APIs, etc.) that allow a user to mine publicly available cancer sequence data for somatic and/or germline mutations. These resources allow the user to assess the recurrence of specific mutations within cancer subtypes, their sequence identity, predicted functional consequence, etc. Example questions one might ask of such resources:
- What are the most significantly mutated genes in a particular cancer type?
- What mutations tend to co-occur or are mutually exclusive with each other in a tumor?
- What positions or domains within the amino acid sequence of a gene are most frequently mutated, i.e. where are the mutation 'hotspots'?
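To make these questions concrete, here is a minimal sketch of how each one reduces to simple counting once you have a table of mutation calls. Everything in it is an assumption for illustration: the toy records, the tuple layout, and the function names are hypothetical and not taken from any of the portals. Note that raw recurrence is not the same as statistical significance; a real analysis would also need a background mutation rate model and a proper test for co-occurrence or mutual exclusivity.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (sample_id, gene, amino_acid_position).
# Real data would come from a portal export, e.g. a MAF file.
MUTATIONS = [
    ("S1", "KRAS", 12),
    ("S1", "TP53", 175),
    ("S2", "KRAS", 12),
    ("S2", "TP53", 273),
    ("S3", "KRAS", 13),
    ("S4", "BRAF", 600),
    ("S5", "BRAF", 600),
    ("S5", "TP53", 175),
]


def samples_per_gene(mutations):
    """Map each gene to the set of samples with at least one mutation in it."""
    by_gene = defaultdict(set)
    for sample, gene, _pos in mutations:
        by_gene[gene].add(sample)
    return by_gene


def gene_recurrence(mutations):
    """Count mutated samples per gene, most frequent first."""
    by_gene = samples_per_gene(mutations)
    counts = Counter({gene: len(samples) for gene, samples in by_gene.items()})
    return counts.most_common()


def pair_overlap(mutations):
    """For each gene pair, count the samples mutated in both genes.

    A count of zero hints at mutual exclusivity, a high count at
    co-occurrence; neither is a significance test.
    """
    by_gene = samples_per_gene(mutations)
    return {
        (a, b): len(by_gene[a] & by_gene[b])
        for a, b in combinations(sorted(by_gene), 2)
    }


def hotspots(mutations, gene):
    """Count mutations per amino acid position within one gene."""
    positions = Counter(pos for _s, g, pos in mutations if g == gene)
    return positions.most_common()
```

On the toy data, `gene_recurrence(MUTATIONS)` ranks genes by the number of mutated samples, `pair_overlap(MUTATIONS)` gives the shared-sample count for every gene pair, and `hotspots(MUTATIONS, "KRAS")` lists the most frequently hit positions in one gene. The portals listed further down answer the same questions at scale and with proper statistics.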
Some relevant posts:
- Looking For Frequncies Of Mutations In Set Of Genes In Different Cancers
- Mutated Gene Sequence Database For Breast Cancer
- List Of Known Cancer Resources
- Help On Getting Data From Tcga
- Database Of Tumor Suppressors And/Or Oncogenes
Here are some resources that I already know about and have used:
- COSMIC - Cancer Mutation Census
- Cancer Hotspots
- Genomic Data Commons
- ICGC Data Portal
- Kids First Data Portal
- UCSC Cancer Browser
- Cancer Genomics Workbench
- TCGA Data Portal
- BROAD, Tumor Portal
- OASIS Genomics
- St. Jude Pediatric Cancer Data Portal
The first several are fantastic resources along the lines I am looking for. Please comment below if I am missing any others. For example, there may be others that are less well known or that are more focused on a specific question.
Relevant reviews, primary articles, open-source software projects, etc. would also be welcome. I'm most interested in resources that create a platform for performing complex queries of the raw data, provide summaries and visualizations, etc. I will try to update this tutorial with examples and feedback from the community.
Here are some related resources that we have created ourselves to complement some of those resources listed above:
A nice introductory tutorial (video) on Cancer Variant Knowledgebases: "Introduction to Publicly Available Knowledgebases to Aid Interpretations of Genomic Findings in Oncology".
thanks very very very much!!
You are most welcome. I just updated this post to include the Genomic Data Commons. This is a great resource for accessing the raw data, variant call files, etc.
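For anyone who wants to script against the Genomic Data Commons rather than use the web portal, here is a rough sketch of building a file search request for its REST API. Treat the details as assumptions to check against the GDC API documentation: the helper names are my own, and the field names (`cases.project.project_id`, `data_format`, `access`) and the example project ID are what I believe the API accepts. The code only constructs the request; it does not send it.

```python
import json
import urllib.request

# Public GDC files endpoint (assumed; verify against the GDC API docs).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_file_query(project_id, data_format="MAF", size=10):
    """Build the JSON payload for a GDC files search (hypothetical helper)."""
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_format", "value": [data_format]}},
            # Restrict to open-access files, which need no authentication token.
            {"op": "in",
             "content": {"field": "access", "value": ["open"]}},
        ],
    }
    return {
        "filters": filters,
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": size,
    }


def build_request(payload):
    """Wrap the payload in a POST request object without sending it."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GDC_FILES_ENDPOINT,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually run the query, pass `build_request(build_gdc_file_query("TCGA-BRCA"))` to `urllib.request.urlopen` and parse the JSON response; the matching files should be listed under `data` and then `hits`.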
Thank you very much!
Are you solely interested in tools that just visualize/download publicly available data sets? What about resources where you can actually submit your own mutations and annotate, analyze, and visualize?
I find Firebrowse to be a very useful place to access TCGA data.