Question

Is it possible to detect SNVs and InDels in low coverage WGS (5-10X) data?


3.2 years ago

wangziwei0010 ▴ 30

Hello, Biostar community,

As far as I know, low-coverage WGS (or shallow WGS) is an economical technique for detecting genomic copy number changes in tumor diagnosis and prenatal diagnosis. However, can somatic single-nucleotide variants (SNVs) also be detected at such low coverage (5-10X)?

I have a rough idea that low coverage could greatly reduce the F1 score.

However, is there, or could there be, a pipeline or methodology that sacrifices sensitivity for acceptable specificity, or vice versa, to suit specific aims such as tumor diagnosis? I searched the literature exhaustively but found nothing.
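To make the trade-off concrete: the F1 score is the harmonic mean of precision and recall (sensitivity), so it uses precision rather than specificity proper. A caller tuned to be strict at low depth can keep precision high and still get a poor F1 if it misses most true variants. A minimal Python sketch; the counts below are made up purely for illustration and do not come from any benchmark:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for a call set compared against a truth set."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0  # recall is the same as sensitivity
    # Harmonic mean: dominated by whichever of precision/recall is lower.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts: a strict filter yields few false positives but many misses.
p, r, f = precision_recall_f1(tp=40, fp=2, fn=60)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

With these numbers precision is about 0.95, but recall is only 0.40, which pulls F1 down to about 0.56.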

Could anyone point me to literature I may have missed, or convince me that SNV and indel calling in 5-10X WGS data is simply infeasible?

Thank you,

Wang

mutation low WGS lcWGS pipeline somatic coverage calling

updated 3.2 years ago by Cyriac Kandoth 6.1k • written 3.2 years ago by wangziwei0010 ▴ 30


https://www.biorxiv.org/content/10.1101/2021.07.19.452658v1

Absolute copy number fitting from shallow whole genome sequencing data

3.2 years ago by Pierre Lindenbaum 164k


Thank you so much for your kind reply, but I was asking about SNV and indel calling in lcWGS, not CNV calling.

3.2 years ago by wangziwei0010 ▴ 30


Ah, sorry, I read 'CNV' instead of 'SNV'.

3.2 years ago by Pierre Lindenbaum 164k

score 3 · Answer 1 · 2021-09-01

Look at the power calculations in Figure 2 of the 2013 MuTect paper and Figures 1 and 2 of the 2019 MuTect2 paper. In summary, you need at least 14x depth to detect clonal somatic single-nucleotide variants (SNVs), i.e. those present in 90-100% of tumor cells, and possibly more for indels. Somatic mutations are usually heterozygous, so only about 50% of reads support them. On top of that, tumor samples are usually a mixture of normal and malignant cells (low tumor purity), so the actual fraction of variant-supporting reads is even lower. If you want to find subclonal mutations, which may become clonal months later through treatment resistance, that fraction drops lower still and you need even more depth. This is why it is common practice to sequence cancer exomes at 250x depth and targeted cancer panels (~500 genes) at 750-1000x depth, and I suspect it is why you don't find any papers even attempting to find somatic variants with shallow WGS.
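The dilution argument above can be sketched numerically. This is a simplified model, not MuTect's own power calculation (which is based on a log-odds score that also models sequencing error): here the expected variant allele fraction comes from purity, cancer cell fraction and copy number, and "detection" is assumed to mean seeing at least a minimum number of variant reads, with read counts drawn from a binomial distribution. The 3-read threshold and the diploid defaults are illustrative assumptions.

```python
from math import comb


def expected_vaf(purity: float, cancer_cell_fraction: float = 1.0,
                 mutant_copies: int = 1, tumor_ploidy: int = 2) -> float:
    """Expected variant allele fraction in a tumor/normal cell mixture.

    Assumes normal cells are diploid and carry no mutant copy.
    """
    tumor_alleles = purity * tumor_ploidy
    normal_alleles = (1 - purity) * 2
    mutant_alleles = purity * cancer_cell_fraction * mutant_copies
    return mutant_alleles / (tumor_alleles + normal_alleles)


def detection_power(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """P(at least min_alt_reads variant reads) with alt reads ~ Binomial(depth, vaf).

    min_alt_reads is a hypothetical calling threshold, not a MuTect parameter.
    """
    p_below = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
                  for k in range(min(min_alt_reads, depth + 1)))
    return 1.0 - p_below


# Clonal heterozygous SNVs in a pure vs. a 50%-pure tumor, plus a subclone
# present in 30% of tumor cells, across a range of depths.
for purity, ccf in ((1.0, 1.0), (0.5, 1.0), (0.5, 0.3)):
    vaf = expected_vaf(purity, ccf)
    for depth in (5, 10, 30, 100):
        print(f"purity={purity:.1f} CCF={ccf:.1f} VAF={vaf:.3f} "
              f"depth={depth:>3}x power={detection_power(depth, vaf):.2f}")
```

Under these assumptions, a clonal heterozygous SNV in a 50%-pure tumor (VAF 0.25) reaches three supporting reads only about 47% of the time at 10x and about 10% of the time at 5x.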

score 0 · Answer 2 · 2021-09-01


3.2 years ago

colindaven 6.8k

Yes, it's possible, and it is widely used in population-level analyses where multi-sample SNP calling is carried out. You can also look at imputation afterwards.
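A rough illustration of why multi-sample germline calling tolerates low depth: instead of hard genotype calls, each sample contributes genotype likelihoods, and information pooled across the population (here just an allele frequency used as a Hardy-Weinberg prior) fills in what a 3x site cannot resolve alone. This is a simplified biallelic sketch with an assumed 1% error rate; real callers and imputation tools use richer models, including haplotype and LD information. It also only works because the variant is shared across samples.

```python
from math import comb


def genotype_likelihoods(ref_reads: int, alt_reads: int,
                         error: float = 0.01) -> dict[str, float]:
    """P(observed reads | genotype) for a biallelic diploid site.

    Assumed model: each read independently shows the alt allele with
    probability `error` (0/0), 0.5 (0/1) or 1 - `error` (1/1).
    """
    n = ref_reads + alt_reads
    alt_prob = {"0/0": error, "0/1": 0.5, "1/1": 1 - error}
    return {g: comb(n, alt_reads) * p**alt_reads * (1 - p) ** ref_reads
            for g, p in alt_prob.items()}


def genotype_posteriors(likelihoods: dict[str, float],
                        alt_allele_freq: float) -> dict[str, float]:
    """Combine likelihoods with a Hardy-Weinberg prior from a population allele frequency."""
    q = alt_allele_freq
    prior = {"0/0": (1 - q) ** 2, "0/1": 2 * q * (1 - q), "1/1": q ** 2}
    unnormalized = {g: likelihoods[g] * prior[g] for g in likelihoods}
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


# A 3x site with 2 reference reads and 1 alt read, at an assumed population
# alt-allele frequency of 0.3.
gl = genotype_likelihoods(ref_reads=2, alt_reads=1)
print(genotype_posteriors(gl, alt_allele_freq=0.3))
```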

But you should also tell us more about your study samples and aims.

Lit: https://scholar.google.com/scholar?as_ylo=2017&q=low+coverage+snp+calling&hl=en&as_sdt=0,5

This one in particular might be good:

https://www.nature.com/articles/s41588-020-00756-0

3.2 years ago by colindaven 6.8k


The OP is asking about somatic variants. Unlike germline variants, these are present in different subsets (subclones) of the cells being sequenced and are usually sporadic across the genome, so we cannot use imputation to infer them.

3.2 years ago by Cyriac Kandoth 6.1k


I was asking about somatic variant detection, but thank you for sharing the paper. It's worth reading.

3.2 years ago by wangziwei0010 ▴ 30