Structural Variation Detection On A Short Reads Simulation
2
1
Entering edit mode
10.5 years ago
Pascal ★ 1.5k

Hi,

I am trying to do the following exercise (still playing with indels):

1. simulate short reads (and indels) out of a human genome chromosome,
2. align the short reads simulated in (1),
3. call indels using a SV detection tool,
4. compare results in (3) with the indels generated in (1).

For this workflow, I'm using wgsim in (1), bowtie2 in (2) and dindel in (3). I've written at the end of this question the complete commands list.

So far I haven't been able to detect a single indel and I am wondering what I am doing wrong. Is it because I haven't simulated enough reads (1M reads)? Or something else? I am wondering if I misunderstood some concept or it is simply a matter of adjusting a parameter.

I tried to make the simulator to extend the indels (with -X parameter) but it didn't help.

Thanks in advance for any hint!

samtools faidx human_g1k_v37.fasta 20 > human_g1k_v37_chr20.fasta
bowtie-build human_g1k_v37_chr20.fasta homo_chr20
samtools view -bS homo_chr20.sam > homo_chr20.bam
./dindel_x86-64  --ref human_g1k_v37_chr20.fasta --outputFile 1 --bamFile homo_chr20.bam --analysis getCIGARindels


=> no variants detected here (in 2.variants file) so I don't go ahead :-(

structural indel bowtie • 3.2k views
0
Entering edit mode

Pascal, as Wolf noted bowtie(1) cannot be used here. to verify this, check the CIGAR strings in your SAM output if you can find a single "I" or "D" there.

4
Entering edit mode
10.5 years ago
Wolf ▴ 130

I may be missing something very basic here, but are you sure you are actually using bowtie2 and not bowtie1? the executables are bowtie2 and bowtie2-align and i think bowtie2 takes it's index with the -x switch.

0
Entering edit mode

Thanks Wolf ! You're definetly right: I'm actually using bowtie1 here :-O it's because I clicked the lastest version link and I got the lastest release from bowtie1 where bowtie2 is beta still... I think it won't help to fix my problem (I'm correctly using bowtie1) but you're right pointing it out!

0
Entering edit mode

but bowtie1 does not do gapped alignment, so there won't be any reads spanning your indels

0
Entering edit mode

Very good point Wolf! No gapped alignments, no indels, actually your comment contains the correct answer to the question. Pascal, try to replace bowtie with bwa, mosaik, bfast.

0
Entering edit mode

No, I think I will first use bowtie2. This was my initial objective as I read that "Bowtie 2 supports gapped alignment with affine gap penalties. Number of gaps and gap lengths are not restricted, except by way of the configurable scoring scheme. Bowtie 1 finds just ungapped alignments." Can you confirm this is what we are looking for or should I definitely use BWA? In any case I'm confused because I intended to use at first bowtie2 not 1 :-( Wolf, your're the greatest, thank you !!

0
Entering edit mode

Good morning. So good news, with a higher coverage and using bowtie2 (the real 2 :-) ), it seems that indels are being detected well now with dindel. At least I have thousands of candidates in my variants file and I am waiting for dindel to filter those. Thanks to both of you for your help.

For my records, I wrote a simple memo there (it may help other people):

http://biotechies.blogspot.com/2011/11/genome-structural-variation-on.html

3
Entering edit mode
10.5 years ago

I don't know what is the minimal coverage required by dindel, but with 1M reads 70bp paired, you have approx 2.2X coverage on chromosome 20. It might be that it is not enough to find indels, as it is very unlikely that the same indel is samples by two independent reads.

you should have approx 6000 indels, but half of them would be heterozygous...

0
Entering edit mode

Thanks Stefano, I will come back to you tomorrow after a new try.