Question: Any comparisons of read length for variant calling?
3.8 years ago, Sukhi Singh wrote:

Are there any studies comparing read lengths of 100, 150 and 250 bp (paired-end) for variant calling (SNP detection)? I was wondering how much we gain (in true positives and false negatives) if we sequence longer reads with Illumina (not PacBio).

Tags: gatk, snp, variant, next-gen, assembly
3.8 years ago, harold.smith.tarheel (United States) wrote:

We compared single vs paired-end and 50bp vs 150bp, all at the same genome coverage (20X). We found that paired-end reads offered the most improvement, largely by increasing mappability/coverage within repeated gene sequences (paralogs, pseudogenes, conserved domains). We observed a more modest increase in coverage with longer reads. Details are available here.
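As a rough illustration of what "the same genome coverage" implies for such a comparison, mean coverage is approximately (number of reads × read length) / genome size, so shorter reads need proportionally more reads to reach 20X. The sketch below only shows that arithmetic; the 100 Mb genome size and the function name `reads_needed` are assumptions made for the example, not details from the study.

```python
import math


def reads_needed(genome_size_bp, coverage, read_length_bp, paired=False):
    """Reads (or read pairs, if paired=True) needed for a target mean coverage.

    Uses the simple approximation coverage = sequenced bases / genome size.
    """
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(genome_size_bp * coverage / bases_per_fragment)


# Hypothetical genome size, chosen only for illustration.
GENOME_SIZE_BP = 100_000_000

for read_length in (50, 150):
    for paired in (False, True):
        layout = "paired-end" if paired else "single-end"
        n = reads_needed(GENOME_SIZE_BP, 20, read_length, paired)
        unit = "pairs" if paired else "reads"
        print(f"{read_length} bp {layout}: {n:,} {unit} for 20X")
```

Holding coverage fixed this way means any difference in variant calls comes from read length and layout rather than from sequencing depth.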


Very informative. Thanks Harold

written 3.8 years ago by Sukhi Singh
3.8 years ago, Brian Bushnell (Walnut Creek, USA) wrote:

Longer reads and longer/variable insert sizes are both helpful for resolving repetitive areas, but it's also worth noting that longer individual reads improve indel-calling capability and accuracy. For example, I was able to call insertions of up to ~40bp from 2x100bp reads (using the raw mapping data, i.e., looking for insertion events fully contained in CIGAR strings). However, after extending and merging pairs to produce fused reads >400bp long, I was able to confidently detect insertion events over 200bp after mapping. The number of short insertions dwarfs the number of longer insertions; as I noted in an email:

This yields approximately 48000 insertions (~2700 longer than 36bp and ~400 longer than 100bp)

And SNPs in turn outnumber insertions by maybe 20-to-1. So this does not affect the majority of mutations, but then again, you can probably get the majority of mutations with 50bp single-end reads. If you're interested in long indels (and particularly long insertions), it's worth considering longer reads and extending+merging pairs with longer and variable insert sizes. Although the above was based on 2x100 reads, 2x150 would have worked even better; longer reads are easier to extend and merge. (By "easier" I mean they can be made longer, with greater accuracy.)
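To make "insertion events fully contained in CIGAR strings" concrete, here is a minimal sketch that pulls insertion lengths out of a SAM-style CIGAR string. It is not the code used for the numbers above; the function names and the example CIGAR strings are made up for illustration, and only the standard CIGAR operation letters are assumed.

```python
import re

# One CIGAR operation: a length followed by a standard SAM operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def insertion_lengths(cigar):
    """Return the lengths of all insertion (I) operations in a CIGAR string."""
    return [int(length) for length, op in CIGAR_OP.findall(cigar) if op == "I"]


def longest_insertion(cigars):
    """Return the longest insertion seen across CIGAR strings (0 if none)."""
    return max((n for c in cigars for n in insertion_lengths(c)), default=0)


# Hypothetical alignments: a 100bp read and a ~500bp merged read.
short_read = "30M40I30M"      # 40bp insertion flanked by 30bp on each side
merged_read = "150M210I150M"  # 210bp insertion only fits in a longer read

print(insertion_lengths(short_read))
print(longest_insertion([short_read, merged_read, "100M"]))
```

Because the inserted bases are part of the read itself, an insertion can only be reported this way if it fits inside the read with enough flanking sequence left to align, which is why longer or merged reads raise the ceiling.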


Thanks Brian for the information.

written 3.8 years ago by Sukhi Singh