TCGA driver mutation data
3.5 years ago
aksam ▴ 10

I would like to download driver mutation data for TCGA patients (in particular lung cancer, but ideally pan-cancer).

For example, I would like to find the proportion of patients with lung adenocarcinoma who have driver mutations in KRAS, EGFR, TP53, etc.

I came across this paper - 'Comprehensive Characterization of Cancer Driver Genes and Mutations' (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6029450/) where they produced a database of 9423 exomes annotated with various putative drivers. Does anyone know how to access their dataset? I can't seem to find any instructions.

If not, which source would you recommend for driver mutation data, and why?

Thanks in advance

genome • 1.9k views
3.5 years ago

There are a number of algorithms that try to identify driver mutations in cancer mutation data, using diverse approaches. Check out a recent one: MutPanning (v2.0), from the Dana-Farber Cancer Institute and the Broad Institute. The MutPanning paper also includes a benchmark of sorts.


My understanding is that the question refers to driver mutations. MutPanning is a gene-based method. Identifying which specific mutations within those genes are actually drivers is a much harder task, as driver genes contain a mixture of passenger and driver mutations.


Many clinical interpretation guidelines state clearly that a missense mutation in a known disease gene is not, in and of itself, sufficient evidence to label it oncogenic/pathogenic.


Thank you for this. It may be useful as a complementary resource to Collin's - will take a look

3.5 years ago
Collin ▴ 1000

I'm one of the first authors of that paper.

The data is available on the Genomic Data Commons page for our paper (https://gdc.cancer.gov/about-data/publications/pancan-driver). Please see the file described as "Mutation Scores and tool aggregation" (Mutation.CTAT.3D.Scores.txt). It contains scores for all missense mutations (~750k mutations).

To get the filtered dataset, you only need to filter on the flag columns for the three approaches: CTAT-population ("New_Linear (functional) flag"), CTAT-cancer ("New_Linear (cancer-focused) flag"), and structural clustering ("New_3D mutational hotspot flag"). By convention, a value of "1" means that approach flags the mutation as a potential driver. The figure of 3,437 mutations comes from keeping any mutation flagged by at least two of the approaches. The raw scores for CTAT-cancer and CTAT-population are found in columns "eigenscore (cancer)" and "eigenscore (functional)", respectively.
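For anyone who wants to reproduce that filter, here is a minimal sketch using only the Python standard library. The three flag column names are taken from the description above; the tab delimiter, the function names, and the toy "gene" column in the demo are assumptions of mine, so check them against the header of the downloaded file.

```python
import csv
import io

# Flag columns named in the post above; "1" marks a putative driver.
FLAG_COLUMNS = [
    "New_Linear (functional) flag",
    "New_Linear (cancer-focused) flag",
    "New_3D mutational hotspot flag",
]


def is_flagged(value):
    """Return True if a flag cell holds 1 (accepts "1" or "1.0"); blanks/NA count as unflagged."""
    try:
        return float(value) == 1.0
    except (TypeError, ValueError):
        return False


def count_flags(row):
    """Number of approaches that flag this mutation."""
    return sum(1 for col in FLAG_COLUMNS if is_flagged(row.get(col)))


def filter_putative_drivers(handle, min_approaches=2, delimiter="\t"):
    """Keep rows flagged by at least `min_approaches` of the three approaches.

    Assumption: the file is delimited text with a header row (tab by default).
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader if count_flags(row) >= min_approaches]


# Toy in-memory table for illustration only (not real data).
toy = "\t".join(["gene"] + FLAG_COLUMNS) + "\nG1\t1\t1\t0\nG2\t1\t0\t0\n"
drivers = filter_putative_drivers(io.StringIO(toy))
print([row["gene"] for row in drivers])
```

With the real file you would pass an open file handle, e.g. `open("Mutation.CTAT.3D.Scores.txt", newline="")`, instead of the toy string.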


For loss-of-function mutations in tumor suppressors, you might look at the genes annotated as tumor suppressors in Table S1. Most variant annotation databases regard frameshift indels, nonsense mutations, and essential splice-site, stop-loss, or start-loss mutations as likely oncogenic in tumor suppressor genes.
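A rough sketch of that rule, assuming the mutations are in a MAF-style tab-delimited file with the standard "Hugo_Symbol" and "Variant_Classification" columns. The mapping of the categories above onto MAF classification values is my own reading, and the tumor suppressor set below is only a placeholder to be replaced with the genes from Table S1.

```python
import csv
import io

# MAF Variant_Classification values taken here to correspond to frameshift,
# nonsense, essential splice-site, stop-loss and start-loss (an assumption).
LOF_CLASSES = {
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "Nonsense_Mutation",
    "Splice_Site",
    "Nonstop_Mutation",
    "Translation_Start_Site",
}

# Placeholder only: substitute the tumor suppressors annotated in Table S1.
TUMOR_SUPPRESSORS = {"TP53"}


def likely_lof_in_tsg(row, tumor_suppressors=TUMOR_SUPPRESSORS):
    """True if the row is a loss-of-function class in a listed tumor suppressor."""
    return (
        row["Hugo_Symbol"] in tumor_suppressors
        and row["Variant_Classification"] in LOF_CLASSES
    )


def select_lof(handle, tumor_suppressors=TUMOR_SUPPRESSORS):
    """Read a MAF-style table, skipping '#' comment lines, and keep matching rows."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    return [row for row in reader if likely_lof_in_tsg(row, tumor_suppressors)]


# Toy rows for illustration only (not real data).
toy_maf = (
    "Hugo_Symbol\tVariant_Classification\n"
    "TP53\tNonsense_Mutation\n"
    "TP53\tMissense_Mutation\n"
)
print(len(select_lof(io.StringIO(toy_maf))))
```

Missense mutations are deliberately excluded here, since those are what the flag columns in the scores file are meant to handle.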


Lastly, if you also want to predict driver missense mutations in new tumor samples outside of the TCGA, you could try CHASMplus (https://pubmed.ncbi.nlm.nih.gov/31202631/). Its results were highly consistent with our results from the TCGA PanCanAtlas study, but it substantially simplifies the scoring process (available via OpenCRAVAT, https://opencravat.org/).


This is great - I didn't know that resource page for TCGA existed. This resource/paper is very useful because, as you say, the leap from variants within genes to annotation of 'driver' is difficult - thank you!


Glad to help. Hopefully this can also help anybody else that had the same question as you.


Insightful, Collin. Thanks.
