Question: Multimapping reads intervention in SNP calling
gravatar for sambioinfo2018
9 months ago by
sambioinfo20180 wrote:

Hi All, Hi,

I have used bowtie2 (default settings) to align the reads to the reference genome. Due to presence of homoeologous genes in plant genome, there is a possibility of getting lots of multimapping reads and this might or will hinder while calculating SNPs. So i tried to remove the secondary alignments and unmapped reads using the samtools flag 260 thus retaining all the primary alignment reads. Doing this i believe that for a given position on the genome all the primary reads for that position is retained and the secondary alignment reads are removed along with unmapped reads. I have also filtered reads based on mapping quality > 20 before calling snps.

On calculating SNPs using samtools and bcftools with and without removing the secondary alignment, i get more or less the same number of SNPs. It was quite shocking to me that the number of SNPs did not decrease on removing the secondary aligned reads. Or is it that during SNPs calling multimapping reads are not included and the ones identified are significant. The command i used for calling snps is as follows: samtools mpileup -u -g -f reference.fasta sequence.bam | bcftools call -v -m -O z -o vcffile.vcf.gz

I wanted to identify the significant SNPs and to do that I have to make sure that these multimapped reads are not bringing in many spurious SNPs.

Thanks in advance.

mpileup snp samtools alignment • 392 views
ADD COMMENTlink modified 7 months ago by Biostar ♦♦ 20 • written 9 months ago by sambioinfo20180

To my knowledge mpileup only uses primary alignments. That makes sense because it calls small nucleotide variants, rather than structural variants, where the secondary and chimeric alignments are used to infer the breakpoints. Therefore, you can skip all these filtering steps. mpileup allows to set a minimal MAPQ (option -q). Setting this > 0 (I typically use 20 because this is what is used by other (structural) variant callers such as pindel or lumpy) will discard the multimappers. Also note that samtools mpileup is deprecated and has moved to bcftools mpileup. When calling variants, you might also consider switching to BWA mem, which many variant callers assume as the alignment software. For SNPs and short indels it probably does not matter too much, but as soon as you want to call structural variants (larger indels, duplications, tandems...) you will need an aligner that supports split read alignment, which bowtie2 does not.

ADD REPLYlink modified 9 months ago • written 9 months ago by ATpoint23k

Thanks ATpoint for your detailed reply. I indeed also provided mapping quality and base quality using samtools mpileup. My objective is to call SNPs so in that case bowtie should be sufficient as you mentioned. One more clarification. I have the sequence data from 5 different lane for the same sample. I have merged the bam files for the same sample from different lanes after sorting, duplicate removal. Then called for SNP. How could I verify if the alignment for the five lanes are identical? I understand that it should be the same. But still is there a quick way to verify it or can i call VCF for the five different lanes and compare them each other. This is to make sure everything is fine with alignment.

ADD REPLYlink written 9 months ago by sambioinfo20180
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 900 users visited in the last hour