High discrepancy between mutation frequency values
9 weeks ago
Ana ▴ 10

Hi, I have allele frequency data from gnomAD and TCGA, from which I have calculated the mutation frequency. However, when I compare the two in a scatterplot, the values show little to no correlation.

I would like to know if anybody has advice on why this is happening or how I could improve these results.

Thanks in advance!

MF AF gnomAD TCGA • 2.4k views
9 weeks ago
Aleksandra ▴ 190

This discrepancy is not a technical artifact; it follows from a conceptual mismatch between the two data sources. gnomAD catalogues population-level frequencies of inherited germline variants, whereas TCGA mutation frequencies describe somatic mutations acquired in tumors, a set enriched for variants implicated in oncogenesis. There is therefore no statistical or biological reason to expect the two frequency distributions to correlate. The usual approach is to use gnomAD as an annotation filter: remove common germline polymorphisms from the TCGA calls, which leaves the rare, potentially pathogenic somatic events for downstream analysis.
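To make the filtering step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the function name `filter_common_germline`, the `(chrom, pos, ref, alt)` variant key, the made-up coordinates and frequencies, and the 0.001 cutoff (pick a threshold that suits your own analysis). It is not the gnomAD or TCGA tooling itself, just the idea of dropping tumor calls that are common in the population.

```python
from typing import Dict, List, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele)
Variant = Tuple[str, int, str, str]


def filter_common_germline(
    tumor_variants: List[Variant],
    gnomad_af: Dict[Variant, float],
    max_af: float = 0.001,  # assumed cutoff, not a recommended value
) -> List[Variant]:
    """Keep tumor variants that are absent from gnomAD or rarer than max_af."""
    kept = []
    for variant in tumor_variants:
        # A variant missing from the gnomAD lookup is treated as AF 0.
        af = gnomad_af.get(variant, 0.0)
        if af < max_af:
            kept.append(variant)
    return kept


# Toy data with invented coordinates and frequencies.
tumor_variants = [
    ("7", 100, "A", "T"),   # not in gnomAD
    ("12", 200, "C", "T"),  # common in gnomAD
    ("17", 300, "G", "A"),  # very rare in gnomAD
]
gnomad_af = {
    ("12", 200, "C", "T"): 0.12,
    ("17", 300, "G", "A"): 0.00002,
}

candidates = filter_common_germline(tumor_variants, gnomad_af)
print(candidates)
```

With the toy data, the common variant on chromosome 12 is dropped and the other two are kept. In practice you would build the `gnomad_af` lookup from your annotated variant file rather than by hand.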


Thank you so much for the clarification! I'll keep it in mind from now on.
