Is it possible to detect SNVs and InDels in low-coverage WGS (5-10X) data?
3.2 years ago

Hello, Biostars community,

What I know about low-coverage WGS (or shallow WGS) is that it is an economical technique for detecting genomic copy-number changes in tumor diagnosis or prenatal diagnosis. However, can somatic single-nucleotide variants (SNVs) also be detected at such low coverage (5-10X)?

I have a rough idea that low coverage could greatly reduce the F1 score.

However, is there, or could there be, a pipeline or methodology that sacrifices sensitivity for acceptable specificity, or vice versa, to suit specific aims such as tumor diagnosis? I searched the literature exhaustively but found nothing.
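
The sensitivity/specificity trade-off being asked about can be illustrated with a deliberately naive caller that reports a site when at least `min_alt` reads carry the same non-reference base. Everything here is an assumption for illustration, not any published pipeline: independent reads, a uniform per-base error rate of 0.1% split evenly over the three alternative bases, a fixed depth of 8x, a true variant allele fraction (VAF) of 0.25 (for example, a clonal heterozygous SNV at 50% tumor purity), and roughly 3e9 candidate sites. Real false positives are dominated by mapping artifacts, germline variants and context-specific errors, so the counts are only an order-of-magnitude guide.

```python
from math import comb

GENOME_SITES = 3_000_000_000  # assumption: roughly the size of the human genome


def binom_sf(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p); returns 0 when k > n."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def tradeoff(depth: int, vaf: float, error_rate: float, min_alt: int) -> tuple[float, float]:
    """Sensitivity and per-site false-positive probability of a naive
    'call if >= min_alt alt reads' rule at a single site with fixed depth."""
    sensitivity = binom_sf(min_alt, depth, vaf)
    # Errors are assumed to be spread evenly over the 3 other bases; the union
    # bound over those 3 bases approximates P(any one of them reaches min_alt).
    fp_per_site = min(1.0, 3 * binom_sf(min_alt, depth, error_rate / 3))
    return sensitivity, fp_per_site


for k in range(1, 6):
    sens, fp = tradeoff(depth=8, vaf=0.25, error_rate=0.001, min_alt=k)
    print(f"min_alt={k}: sensitivity={sens:.2f}, expected false calls ~{fp * GENOME_SITES:,.0f}")
```

With these assumed inputs, raising `min_alt` from 1 to 3 cuts the expected false calls from tens of millions to roughly twenty, while sensitivity drops from about 0.90 to about 0.32.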

Could anyone point me to literature I may have missed, or simply convince me that SNV and indel calling in 5-10X WGS data is theoretically infeasible?

Thank you,

Wang

mutation low WGS lcWGS pipeline somatic coverage calling

https://www.biorxiv.org/content/10.1101/2021.07.19.452658v1

Absolute copy number fitting from shallow whole genome sequencing data

Thank you so much for your kind reply. But I was asking about SNV and indel calling in lcWGS, not CNV calling.

Ah, sorry, I read 'CNV' instead of 'SNV'.

3.2 years ago

Look at the power calculations in Figure 2 of the 2013 MuTect paper and Figures 1 and 2 of the 2019 MuTect2 paper. In summary, you need at least 14x depth to detect clonal somatic single-nucleotide variants (SNVs), i.e. those present in 90-100% of tumor cells, and possibly more for indels. Somatic mutations are usually heterozygous, so only about 50% of the reads at the site would support them. On top of that, tumor samples are usually a mixture of normal and malignant cells (tumor purity below 100%), so the actual fraction of variant-supporting reads will be even lower. If you want to find subclonal mutations that may become clonal months later through treatment resistance, you have to detect even lower variant allele fractions. That is why it is common practice to sequence cancer exomes at ~250x depth and targeted cancer panels (~500 genes) at 750-1000x depth. I suspect this is also why you don't find any papers even attempting to find somatic variants with shallow WGS.
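
To make the argument concrete, here is a simplified power sketch. It is not MuTect's model (which thresholds a likelihood-based LOD score rather than a fixed read count) and it will not reproduce the papers' exact numbers. The assumptions are: per-site depth is Poisson around the mean coverage; the variant is heterozygous in a diploid region with no copy-number change, so expected VAF = purity * cancer cell fraction (ccf) / 2; and a hypothetical caller needs at least `min_alt` = 3 supporting reads. Sequencing errors and mapping artifacts are ignored.

```python
from math import exp, factorial


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    return 1.0 - sum(exp(-lam) * lam**i / factorial(i) for i in range(k))


def detection_power(mean_depth: float, purity: float, ccf: float = 1.0, min_alt: int = 3) -> float:
    """Probability that a heterozygous somatic SNV gets >= min_alt supporting reads.

    Assumptions (simplified, not MuTect's model): diploid region without
    copy-number change, so expected VAF = purity * ccf / 2; per-site depth is
    Poisson(mean_depth); each read independently carries the variant with
    probability VAF, which makes the alt-read count Poisson(mean_depth * VAF).
    """
    vaf = purity * ccf / 2
    return poisson_sf(min_alt, mean_depth * vaf)


for depth in (5, 10, 14, 30, 60):
    for purity in (1.0, 0.5):
        power = detection_power(depth, purity)
        print(f"{depth:>3}x, purity {purity:.0%}: power {power:.2f}")
```

Under these assumptions, a pure, clonal tumor sequenced at 5x gives only about 0.46 power per variant, the same as 10x at 50% purity; subclonal variants (ccf < 1) fare worse still.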

Thank you, Kandoth, for sharing the original papers. Power calculations like these are exactly what I was looking for but carelessly overlooked.

Now I can convince others that SNV detection in lcWGS is impractical on a theoretical basis, instead of just saying "I feel that" or "I guess".

Best Regards,

Wang

3.2 years ago

Yes, it's possible, and it is widely used in population-level analyses where multi-sample SNP calling is carried out. You can then look at imputation.

But you should also tell us something about your study samples and intentions.

Lit: https://scholar.google.com/scholar?as_ylo=2017&q=low+coverage+snp+calling&hl=en&as_sdt=0,5

This one in particular might be good:

https://www.nature.com/articles/s41588-020-00756-0
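
For germline variants, multi-sample calling and imputation work at low coverage because each sample's weak read evidence can be combined with what is known about the variant in the population. A minimal sketch of that idea, assuming a diploid site, a symmetric per-read error rate of 1%, and a Hardy-Weinberg prior built from a known population allele frequency. Real tools also exploit haplotype and linkage information, which this ignores, and the whole idea only helps for variants shared across individuals.

```python
from math import comb


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> dict[str, float]:
    """Binomial likelihood of the read counts under genotypes 0/0, 0/1, 1/1,
    with a symmetric per-read error rate (simplifying assumption)."""
    n = ref_reads + alt_reads
    likelihoods = {}
    for gt, alt_frac in (("0/0", error), ("0/1", 0.5), ("1/1", 1 - error)):
        likelihoods[gt] = comb(n, alt_reads) * alt_frac**alt_reads * (1 - alt_frac) ** ref_reads
    return likelihoods


def posterior(likelihoods: dict[str, float], allele_freq: float) -> dict[str, float]:
    """Combine likelihoods with a Hardy-Weinberg prior from a population allele frequency."""
    q = allele_freq
    prior = {"0/0": (1 - q) ** 2, "0/1": 2 * q * (1 - q), "1/1": q**2}
    joint = {gt: likelihoods[gt] * prior[gt] for gt in likelihoods}
    total = sum(joint.values())
    return {gt: value / total for gt, value in joint.items()}


gl = genotype_likelihoods(ref_reads=3, alt_reads=1)  # hypothetical 4x site
for q in (0.01, 0.3):
    post = posterior(gl, q)
    print(q, {gt: round(p, 3) for gt, p in post.items()})
```

With these made-up counts (3 reference reads, 1 alternative read), the single-sample likelihood favours 0/1, but the posterior favours 0/0 when the allele is rare (frequency 0.01) and 0/1 when it is common (0.3).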

The OP is asking about somatic variants. Unlike germline variants, these are seen in different subsets (subclones) of the cells being sequenced and are usually sporadic across the genome. We cannot use imputation to infer them.

I was asking about somatic variant detection, but thank you for sharing the paper. It's worth reading.
