Question: Structural Variation Detection On A Short Reads Simulation
gravatar for Pascal
5.6 years ago by
Pascal1.4k wrote:


I am trying to do the following exercise (still playing with indels):

  1. simulate short reads (and indels) out of a human genome chromosome,
  2. align the short reads simulated in (1),
  3. call indels using a SV detection tool,
  4. compare results in (3) with the indels generated in (1).

For this workflow, I'm using wgsim in (1), bowtie2 in (2) and dindel in (3). I've written at the end of this question the complete commands list.

So far I haven't been able to detect a single indel and I am wondering what I am doing wrong. Is it because I haven't simulated enough reads (1M reads)? Or something else? I am wondering if I misunderstood some concept or it is simply a matter of adjusting a parameter.

I tried to make the simulator to extend the indels (with -X parameter) but it didn't help.

Thanks in advance for any hint!

samtools faidx human_g1k_v37.fasta 20 > human_g1k_v37_chr20.fasta
wgsim -X 0.95 human_g1k_v37_chr20.fasta out.read1.fq out.read2.fq > wgsim.out
bowtie-build human_g1k_v37_chr20.fasta homo_chr20
bowtie -t homo_chr20 -X 700 -1 out.read1.fq -2 out.read2.fq -S homo_chr20.sam
samtools view -bS homo_chr20.sam > homo_chr20.bam
./dindel_x86-64  --ref human_g1k_v37_chr20.fasta --outputFile 1 --bamFile homo_chr20.bam --analysis getCIGARindels

=> no variants detected here (in 2.variants file) so I don't go ahead :-(

indel bowtie structural • 1.9k views
ADD COMMENTlink written 5.6 years ago by Pascal1.4k

Pascal, as Wolf noted bowtie(1) cannot be used here. to verify this, check the CIGAR strings in your SAM output if you can find a single "I" or "D" there.

ADD REPLYlink written 5.6 years ago by Michael Dondrup42k
gravatar for Wolf
5.6 years ago by
Wolf120 wrote:

I may be missing something very basic here, but are you sure you are actually using bowtie2 and not bowtie1? the executables are bowtie2 and bowtie2-align and i think bowtie2 takes it's index with the -x switch.

ADD COMMENTlink written 5.6 years ago by Wolf120

Thanks Wolf ! You're definetly right: I'm actually using bowtie1 here :-O it's because I clicked the lastest version link and I got the lastest release from bowtie1 where bowtie2 is beta still... I think it won't help to fix my problem (I'm correctly using bowtie1) but you're right pointing it out!

ADD REPLYlink written 5.6 years ago by Pascal1.4k

but bowtie1 does not do gapped alignment, so there won't be any reads spanning your indels

ADD REPLYlink written 5.6 years ago by Wolf120

Very good point Wolf! No gapped alignments, no indels, actually your comment contains the correct answer to the question. Pascal, try to replace bowtie with bwa, mosaik, bfast.

ADD REPLYlink written 5.6 years ago by Michael Dondrup42k

No, I think I will first use bowtie2. This was my initial objective as I read that "Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring scheme. Bowtie 1 finds just ungapped alignments." Can you confirm this is what we are looking for or should I definitely use BWA? In any case I'm confused because I intended to use at first bowtie2 not 1 :-( Wolf, your're the greatest, thank you !!

ADD REPLYlink written 5.6 years ago by Pascal1.4k

Good morning. So good news, with a higher coverage and using bowtie2 (the real 2 :-) ), it seems that indels are being detected well now with dindel. At least I have thousands of candidates in my variants file and I am waiting for dindel to filter those. Thanks to both of you for your help.

For my records, I wrote a simple memo there (it may help other people):

ADD REPLYlink written 5.6 years ago by Pascal1.4k
gravatar for Stefano Berri
5.6 years ago by
Stefano Berri3.9k
Cambridge, UK
Stefano Berri3.9k wrote:

I don't know what is the minimal coverage required by dindel, but with 1M reads 70bp paired, you have approx 2.2X coverage on chromosome 20. It might be that it is not enough to find indels, as it is very unlikely that the same indel is samples by two independent reads.

you should have approx 6000 indels, but half of them would be heterozygous...

try more reads (like 20M)

ADD COMMENTlink written 5.6 years ago by Stefano Berri3.9k

Thanks Stefano, I will come back to you tomorrow after a new try.

ADD REPLYlink written 5.6 years ago by Pascal1.4k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 656 users visited in the last hour