Question: What Approach Would You Recommend For Large Indel Detection With SOLiD Data
5
6.5 years ago by
r.follador60
Switzerland
r.follador60 wrote:

I've been spending quite some time on the following problem: I sequenced a bacterial genome using paired-end reads (SOLiD) and I have a fairly good reference sequence. My goal is to detect changes in the sequenced sample compared to the reference sequence.

The detection of SNPs and small indels (a couple of bp long) is quite straightforward using the standard tools (SAMtools, GATK). However, I'm stuck on the task of detecting larger indels (tens to hundreds of bp). I tried several tools and settled on Pindel (following a recommendation on this forum).

Because I didn't know whether to trust Pindel's output, I started to simulate some data: introducing indels of several sizes into the reference, mapping the original data to that modified reference, and checking whether Pindel was able to detect the changes. Pindel is very sensitive and could detect most of those indels; however, its sensitivity is also the main problem. I find it nearly impossible to distinguish true indels from false positives. There is no good statistic for the significance of an observation other than the raw number of supporting reads.
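The simulation described above can be sketched roughly as follows. This is only an illustration, not the script actually used: the function names `mutate_reference` and `evaluate`, the tuple layout `(pos, kind, length, support)` for a call, and the matching tolerances are all assumptions, and parsing Pindel's output into such tuples is left out. The idea is to record a truth set while editing the reference, then look at recall and precision as the minimum number of supporting reads is raised.

```python
import random

def mutate_reference(ref, n_events, min_len=10, max_len=300, seed=0):
    """Insert or delete random segments in `ref` (hypothetical helper).

    Returns (mutated, truth). Each truth entry is (pos, kind, length): pos is a
    0-based coordinate in the mutated sequence, and kind describes the
    unmodified sample relative to the mutated reference. Bases removed from the
    reference appear as an insertion ("INS") in the sample; bases added to the
    reference appear as a deletion ("DEL").
    """
    rng = random.Random(seed)
    window = len(ref) // n_events  # one event per window, so events never overlap
    if window <= max_len:
        raise ValueError("reference too short for this many events")
    pieces, truth = [], []
    cursor = 0   # next unread position in ref
    out_len = 0  # length of the mutated sequence built so far
    for i in range(n_events):
        pos = rng.randrange(i * window, (i + 1) * window - max_len)
        length = rng.randint(min_len, max_len)
        pieces.append(ref[cursor:pos])
        out_len += pos - cursor
        if rng.random() < 0.5:
            # Drop ref[pos:pos+length] from the reference.
            truth.append((out_len, "INS", length))
            cursor = pos + length
        else:
            # Add random filler bases to the reference.
            pieces.append("".join(rng.choice("ACGT") for _ in range(length)))
            truth.append((out_len, "DEL", length))
            out_len += length
            cursor = pos
    pieces.append(ref[cursor:])
    return "".join(pieces), truth

def evaluate(calls, truth, min_support, pos_tol=10, len_tol=0.2):
    """Return (recall, precision) for calls with at least `min_support` reads.

    `calls` holds (pos, kind, length, support) tuples; this layout is assumed.
    """
    kept = [c for c in calls if c[3] >= min_support]

    def matches(c, t):
        return (c[1] == t[1]
                and abs(c[0] - t[0]) <= pos_tol
                and abs(c[2] - t[2]) <= len_tol * t[2])

    found = sum(any(matches(c, t) for c in kept) for t in truth)
    correct = sum(any(matches(c, t) for t in truth) for c in kept)
    recall = found / len(truth) if truth else 0.0
    precision = correct / len(kept) if kept else 0.0
    return recall, precision
```

Running `evaluate` over a range of `min_support` values on simulated data gives at least an empirical idea of which support threshold separates true calls from false positives at a given coverage.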

My questions:

  • What would someone with more experience in this kind of work recommend? Any other software tools? Another approach? Or will I have to accept that paired-end short reads are not optimal for answering this kind of question?
  • For the next time: What sequencing approach would you recommend? 454 reads with de novo assembly and subsequent comparison of the contigs to the reference? PacBio? Does Illumina offer a better approach?

Thanks for any help!

indel variation structural • 3.9k views
5
6.5 years ago by
William4.5k
Europe
William4.5k wrote:

Read this 2011 review paper on structural variant calling:

http://www.ncbi.nlm.nih.gov/pubmed/21358748

Basically, there are three signals you can use for structural variant calling:

1) discordant pair signal

2) read-depth signal

3) split-read signal (with split-contig mapping from a de novo assembly as a special case)

The discordant-pair and read-depth signals can be obtained from paired sequencing data produced on any platform. To use the split-read signal you need long reads that can be split-mapped, so it is not really useful on SOLiD data or other short reads.
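To make the discordant-pair signal concrete, here is a minimal sketch, not how BreakDancer or any other caller actually implements it. It assumes you already have the mapped insert size of each pair; the function name `flag_discordant` and the cutoff `k` are made up for illustration. Pairs whose insert size lies far from the library median are flagged: pairs mapping farther apart than expected hint at a deletion in the sample, pairs mapping closer together hint at an insertion.

```python
from statistics import median

def flag_discordant(insert_sizes, k=5.0):
    """Flag pairs whose insert size is more than k robust SDs from the median.

    Returns (index, kind, shift) tuples, where shift is the deviation from the
    median and roughly approximates the indel size. Hypothetical helper.
    """
    med = median(insert_sizes)
    # Median absolute deviation, scaled to be comparable to a standard deviation.
    mad = median([abs(x - med) for x in insert_sizes])
    scale = 1.4826 * mad or 1.0  # fall back to 1.0 if all sizes are identical
    flagged = []
    for i, size in enumerate(insert_sizes):
        if abs(size - med) > k * scale:
            kind = "DEL" if size > med else "INS"
            flagged.append((i, kind, size - med))
    return flagged
```

This also shows the limitation: an indel only stands out when it is large compared to the spread of the insert-size distribution, so events of a few tens of bp tend to disappear in the noise of a library with variable insert sizes.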

A good discordant-pair SV caller is BreakDancer.

A good split-read SV caller is Pindel.

A good read-depth SV caller is CNVnator.

Two upcoming multi-signal SV callers are SVMiner and LUMPY.
