Question: Any comparisons of read length for variant calling?
22 months ago by
Sukhdeep Singh9.6k wrote:

Are there any studies comparing read lengths of 100, 150 and 250 bp (paired-end) for variant calling (SNP detection)? I was wondering how much we gain (in true positives and false negatives) if we sequence longer reads with Illumina (not PacBio).

gatk snp variant next-gen assembly • 815 views
22 months ago by
United States
harold.smith.tarheel4.3k wrote:

We compared single vs paired-end and 50bp vs 150bp, all at the same genome coverage (20X). We found that paired-end reads offered the most improvement, largely by increasing mappability/coverage within repeated gene sequences (paralogs, pseudogenes, conserved domains). We observed a more modest increase in coverage with longer reads. Details are available here.
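To make "the same genome coverage" concrete, here is a minimal sketch of how many reads (or read pairs) each design needs to reach 20X, using the standard relation coverage = total sequenced bases / genome size. The 100 Mb genome size is a hypothetical value chosen for illustration and is not taken from the study; `reads_needed` is likewise just an illustrative helper.

```python
def reads_needed(genome_size_bp, coverage, read_length_bp, paired=False):
    """Reads (or read pairs, if paired=True) needed to reach a mean coverage.

    Based on: coverage = total sequenced bases / genome size.
    """
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    total_bases = genome_size_bp * coverage
    # Integer ceiling division, so we never fall short of the target coverage.
    return (total_bases + bases_per_fragment - 1) // bases_per_fragment


GENOME_SIZE = 100_000_000  # hypothetical 100 Mb genome (assumption)

for length in (50, 150):
    for paired in (False, True):
        layout = "paired-end" if paired else "single-end"
        n = reads_needed(GENOME_SIZE, 20, length, paired)
        print(f"{length}bp {layout}: {n:,} {'pairs' if paired else 'reads'}")
```

At fixed coverage, longer or paired reads mean fewer fragments, so any gain comes from where the reads can be placed rather than from more sequenced bases.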


Very informative. Thanks Harold

written 22 months ago by Sukhdeep Singh
22 months ago by
Walnut Creek, USA
Brian Bushnell16k wrote:

Longer reads and longer/variable insert sizes are both helpful for resolving repetitive areas, but it's also worth noting that longer individual reads improve indel-calling capability and accuracy. For example, I was able to call insertions of up to ~40bp from 2x100bp reads (using the raw mapping data, i.e., looking for insertion events fully contained in CIGAR strings). However, after extending and merging pairs to produce fused reads >400bp long, I was able to confidently detect insertion events over 200bp after mapping. The number of short insertions dwarfs the number of longer insertions; as I noted in an email:

This yields approximately 48000 insertions (~2700 longer than 36bp and ~400 longer than 100bp)

And SNPs again outnumber insertions by maybe 20-to-1. So this does not affect the majority of mutations, but then again, you can probably get the majority of mutations with 50bp single-ended reads. If you're interested in long indels (and particularly long insertions), it's worth considering longer reads and extending+merging pairs with longer and variable insert sizes. Although the above was based on 2x100 reads, 2x150 would have worked even better; longer reads are easier to extend and merge. (By "easier" I mean they can be made longer, with greater accuracy.)
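As an illustration of what "insertion events fully contained in CIGAR strings" means, here is a minimal sketch that pulls insertion lengths out of SAM-style CIGAR strings. It is not the pipeline used for the numbers above; the function names and the example CIGAR strings are hypothetical.

```python
import re

# A CIGAR string is a series of (run length, operation) pairs, e.g. "30M40I30M".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def insertion_lengths(cigar):
    """Return the length of every insertion (I) operation in one CIGAR string."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "I"]


def longest_insertion(cigars):
    """Return the longest insertion across many CIGAR strings (0 if there is none)."""
    return max((n for c in cigars for n in insertion_lengths(c)), default=0)


# Hypothetical alignments of 100bp reads, for illustration only.
examples = ["100M", "30M40I30M", "10S50M5I35M", "45M2D55M"]

for cigar in examples:
    print(cigar, insertion_lengths(cigar))
print("longest insertion:", longest_insertion(examples))
```

An insertion found this way has to fit inside a single read along with enough flanking matched bases to anchor the alignment, which is why the detectable insertion length grows with read length (or with merged-pair length).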


Thanks Brian for the information.

written 22 months ago by Sukhdeep Singh