Question

average coverage of Illumina 2x250 PE vs Illumina 2x100 PE in variant analysis

0

8.6 years ago

JstRoRR ▴ 60

Hi,

We are interested in whole-genome SNP/variant analysis. The genome size of the species is 451 Mb. Initially we sequenced two samples using Illumina 2x100 PE, and the coverage we obtained was close to 35x. We have now sequenced 11 additional samples (same species but different populations) using Illumina 2x250 PE, and the average coverage has dropped to about 18x. My question is: should we maintain an average coverage of 35x for variant studies, or is 18x coverage with reads 2.5 times as long OK?

SNP sequencing Variant Calling • 2.8k views

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by JstRoRR ▴ 60

0

Subsample your initial population to 18x, and you are good to go.
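
As a rough sketch of the arithmetic behind that suggestion: the fraction of read pairs to keep is simply the target coverage divided by the current coverage. The function name below is hypothetical, and the 35x and 18x figures are the ones quoted in the question.

```python
def subsample_fraction(current_cov: float, target_cov: float) -> float:
    """Fraction of read pairs to keep to go from current_cov to target_cov."""
    if current_cov <= 0 or target_cov <= 0:
        raise ValueError("coverage values must be positive")
    # Subsampling can only remove reads, so cap the fraction at 1.0.
    return min(1.0, target_cov / current_cov)


# Coverage values taken from the question: 35x samples brought down to 18x.
fraction = subsample_fraction(35.0, 18.0)
print(f"keep about {fraction:.1%} of read pairs")
```

The resulting fraction would then be passed to whichever read-subsampling tool you use, keeping both mates of each pair together.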

ADD REPLY • link 8.6 years ago by apelin20 ▴ 480

0

Hi Apelin, Thanks for your reply.

Is 18x coverage OK for variant analysis keeping 2x250 PE in mind?

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by JstRoRR ▴ 60

Ram · Answer 1 · 2015-09-17

1

8.6 years ago

Brian Bushnell 20k

I would recommend you keep the full coverage in all cases. For variant detection, more is better, and 18 is fairly low. Throwing out data won't help you.

18x coverage is sufficient to detect variants in most cases. If the organism is diploid, uneven coverage will sometimes make it hard to determine the ploidy of a variation (for example, heterozygous versus homozygous), but that's what you have. Longer reads are also better for variant detection, so 18x of 250bp reads is better than, say, 18x of 100bp reads.
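
To make the comparison concrete, here is a minimal sketch of the usual raw-coverage formula (total sequenced bases divided by genome size), using the 451 Mb genome size from the question. The function names are hypothetical, and the pair counts are back-calculated from the stated coverages rather than taken from the actual runs.

```python
GENOME_SIZE_BP = 451_000_000  # genome size stated in the question (451 Mb)


def raw_coverage(n_pairs: float, read_len: int,
                 genome_size: int = GENOME_SIZE_BP) -> float:
    """Raw coverage = total sequenced bases / genome size (2 reads per pair)."""
    return n_pairs * 2 * read_len / genome_size


def pairs_for_coverage(coverage: float, read_len: int,
                       genome_size: int = GENOME_SIZE_BP) -> float:
    """Number of read pairs needed to reach a given raw coverage."""
    return coverage * genome_size / (2 * read_len)


# Back-calculated pair counts for the two designs in the question.
pairs_2x100 = pairs_for_coverage(35, 100)  # 35x with 2x100 PE
pairs_2x250 = pairs_for_coverage(18, 250)  # 18x with 2x250 PE
print(f"2x100 at 35x: {pairs_2x100:,.0f} pairs")
print(f"2x250 at 18x: {pairs_2x250:,.0f} pairs")
```

Under these assumptions the 2x250 samples have roughly a fifth as many read pairs as the 2x100 samples, even though each pair carries 2.5 times as many bases.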

ADD COMMENT • link updated 19 months ago by Ram 43k • written 8.6 years ago by Brian Bushnell 20k

0

I agree with what Brian said. However, if you find that your 35x samples somehow differ from the others, for example in the number of SNPs detected or in other population parameters, keep the difference in coverage in mind. Downsampling would be helpful in that case so that the populations can be compared. Also, 250bp reads will indeed align better and hence improve variant discovery, but so do paired-end reads, which you have in both cases.

ADD REPLY • link 8.6 years ago by apelin20 ▴ 480

0

Thanks Brian for the input. We also have the possibility to augment the initial sequencing. I believe we still have some DNA material left with the sequencing providers. If that's the case I hope there is no harm in adding more sequencing data to the current stack to increase the 18x coverage to something like 30x or so?

ADD REPLY • link 8.6 years ago by JstRoRR ▴ 60

2

By the way, with 2x250bp reads the two reads of a pair very often overlap. Did you merge overlapping reads? That should reduce your coverage a bit. What is the estimated fragment size of the sequenced library?

ADD REPLY • link 8.6 years ago by apelin20 ▴ 480

0

This is a good point... depending on the insert size, you probably have something less (and possibly much less) than 18x independent coverage.

And, as long as you can afford it, adding more sequence is always good for variant detection. Just measure your insert size distribution first to make sure that 2x250 is appropriate and not wasting sequence. For example, you can generate an insert size histogram like this:

bbmerge.sh in=reads.fq ihist=ihist.txt
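
Once you have an insert size estimate, a rough sketch of how overlap eats into independent coverage might look like the following. It assumes every pair has the same insert size, which real libraries do not, and the 350 bp and 600 bp values are purely illustrative, not measured. The function name and the pair count (back-calculated from 18x of 2x250 on a 451 Mb genome) are also assumptions.

```python
GENOME_SIZE_BP = 451_000_000  # genome size stated in the question (451 Mb)


def effective_coverage(n_pairs: float, read_len: int, insert_size: int,
                       genome_size: int = GENOME_SIZE_BP) -> float:
    """Coverage counting each fragment base once, even if both mates read it.

    A pair can cover at most 2 * read_len distinct bases, and no more than
    the fragment (insert) itself, so overlapping bases are not double counted.
    """
    distinct_bases_per_pair = min(2 * read_len, insert_size)
    return n_pairs * distinct_bases_per_pair / genome_size


n_pairs = 16_236_000  # hypothetical: pairs giving 18x raw coverage at 2x250

# Hypothetical insert sizes: one shorter than 2 x 250 bp, one longer.
for insert_size in (350, 600):
    cov = effective_coverage(n_pairs, 250, insert_size)
    print(f"insert {insert_size} bp: about {cov:.1f}x independent coverage")
```

With a 350 bp insert the mates overlap by 150 bp, so the 18x of raw coverage corresponds to about 12.6x of independent coverage in this simplified model; with inserts of 500 bp or more nothing is lost.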

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by Brian Bushnell 20k

0

Thanks Apelin for your insights.

ADD REPLY • link updated 19 months ago by Ram 43k • written 8.6 years ago by JstRoRR ▴ 60