Extract the overlap of well-covered regions across multiple samples

5.8 years ago

DVA ▴ 630

I am looking at somatic mutations across multiple samples, but coverage is uneven between samples in many regions. Since we want to compare mutation counts between these samples, I need to account for this. For example, sample A has 5X coverage at position 1 while sample B has 20X coverage there. If my workflow filters out mutations with <10X coverage, I would miss a mutation at position 1 in sample A even if it is present, so the comparison would not be fair.

Now my question is: is there an easy/fast way to extract the regions that are well covered in all samples? These are all WGS data (BAM files of ~50-60 GB each), so I guess I could run bedtools on each of them and then intersect the results? Any other suggestions? Thank you.
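To make the "overlap" step concrete, here is a minimal Python sketch of the intersection itself. It assumes each sample's well-covered regions (for example, stretches at or above 10X) have already been computed by some depth tool and loaded as sorted, non-overlapping, half-open BED-style intervals per chromosome. The function names `intersect_two` and `shared_regions` are hypothetical, and this is only an illustration of the logic, not a replacement for bedtools on real WGS data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), BED-style coordinates


def intersect_two(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:  # the two current intervals overlap
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_regions(
    samples: List[Dict[str, List[Interval]]]
) -> Dict[str, List[Interval]]:
    """Regions that are well covered in every sample, per chromosome."""
    if not samples:
        return {}
    shared = dict(samples[0])
    for sample in samples[1:]:
        # A chromosome missing from any sample cannot be shared.
        shared = {
            chrom: intersect_two(ivs, sample[chrom])
            for chrom, ivs in shared.items()
            if chrom in sample
        }
    return {chrom: ivs for chrom, ivs in shared.items() if ivs}


# Made-up example intervals, not real data.
sample_a = {"chr1": [(0, 100), (200, 300)]}
sample_b = {"chr1": [(50, 250)], "chr2": [(0, 10)]}
common = shared_regions([sample_a, sample_b])
```

With these made-up inputs, `common` keeps only the parts of chr1 covered in both samples; chr2 drops out because sample A has no well-covered interval there.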

coverage WGS • 1.0k views

"and get rid of <10X mutations, then even if sample A has mutation in position 1, I would miss it;"

If you're afraid of missing such a mutation, why do you need to extract well-covered regions?

5.8 years ago by Pierre Lindenbaum 161k

Most likely, you don't need to worry about coverage beforehand. VCF files include depth information, often per sample and/or per allele, so you can filter by coverage after variant calling.
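As a rough illustration of filtering after calling, the sketch below keeps a record only when every sample reaches a minimum depth. It assumes a multi-sample, tab-delimited VCF whose FORMAT column carries the standard `DP` key; whether your caller emits it is something to check. The helper names `sample_depths` and `filter_by_min_depth` and the 10X threshold are hypothetical choices for the example.

```python
from typing import Iterable, Iterator, List, Optional


def sample_depths(vcf_line: str) -> List[Optional[int]]:
    """Per-sample DP values for one VCF data line (None where missing)."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")  # column 9 is FORMAT
    samples = fields[9:]                # sample columns follow FORMAT
    if "DP" not in format_keys:
        return [None] * len(samples)
    dp_index = format_keys.index("DP")
    depths: List[Optional[int]] = []
    for sample in samples:
        values = sample.split(":")
        # Trailing sample fields may be omitted, so guard the index.
        raw = values[dp_index] if dp_index < len(values) else "."
        depths.append(int(raw) if raw.isdigit() else None)
    return depths


def filter_by_min_depth(
    lines: Iterable[str], min_depth: int = 10
) -> Iterator[str]:
    """Yield header lines plus records where all samples have DP >= min_depth."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        depths = sample_depths(line)
        if depths and all(d is not None and d >= min_depth for d in depths):
            yield line


# Made-up records for illustration only.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\tB",
    "chr1\t100\t.\tA\tT\t50\tPASS\t.\tGT:DP\t0/1:5\t0/1:20",
    "chr1\t200\t.\tG\tC\t50\tPASS\t.\tGT:DP\t0/1:15\t0/0:20",
]
kept = list(filter_by_min_depth(example_vcf, min_depth=10))
```

Requiring the threshold in all samples, rather than only in the sample carrying the variant, is what keeps the per-sample mutation counts comparable: the position at 5X in sample A is excluded for both samples.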

5.8 years ago by h.mon 35k
