Hi all,
I used bowtie2 (default settings) to align the reads to the reference genome. Because plant genomes contain homoeologous genes, many reads are likely to be multimapped, and this could interfere with SNP calling. I therefore removed the secondary alignments and unmapped reads using the samtools flag 260 (unmapped, 4, plus secondary, 256), retaining only the primary alignments. My understanding is that, for any given position in the genome, all primary alignments covering that position are retained, while secondary alignments and unmapped reads are removed. I also filtered the reads on mapping quality > 20 before calling SNPs.
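For reference, the filtering step described above could be sketched roughly as follows. The file names sequence.bam and primary_q21.bam are placeholders, not the names from my actual run, and the samtools call is skipped if samtools or the input file is missing. Note that samtools view -q 20 keeps reads with MAPQ >= 20, so a strict "> 20" cutoff needs -q 21.

```shell
#!/bin/sh
# Break the exclusion mask 260 into the SAM flag bits it covers.
mask=260
unmapped=$(( mask & 4 ))           # 4    = read unmapped
secondary=$(( mask & 256 ))        # 256  = secondary alignment
supplementary=$(( mask & 2048 ))   # 2048 = supplementary alignment (not part of 260)
echo "unmapped=$unmapped secondary=$secondary supplementary=$supplementary"

# Placeholder file names (assumptions, not from the original run).
in_bam="sequence.bam"
out_bam="primary_q21.bam"

if command -v samtools >/dev/null 2>&1 && [ -f "$in_bam" ]; then
    # -F 260: drop unmapped reads and secondary alignments
    # -q 21 : keep MAPQ >= 21, i.e. strictly greater than 20
    samtools view -b -F 260 -q 21 "$in_bam" > "$out_bam"
fi
```

So flag 260 covers unmapped and secondary reads only; supplementary alignments (2048) are not touched by it.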
When I call SNPs with samtools and bcftools, with and without removing the secondary alignments, I get more or less the same number of SNPs. I was quite surprised that the number of SNPs did not decrease after removing the secondary alignments. Or is it that multimapped reads are not included during SNP calling anyway, so that the SNPs identified are already significant? The command I used for calling SNPs is as follows:
samtools mpileup -u -g -f reference.fasta sequence.bam | bcftools call -v -m -O z -o vcffile.vcf.gz
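One check that might help narrow this down is counting how many records in the BAM actually carry the secondary flag. Below is a minimal sketch of that check; count_secondary is a helper name made up for illustration, and sequence.bam is the same file as in the command above. If it prints 0, the secondary-alignment part of the filter had nothing to remove, which would be consistent with the SNP counts being the same.

```shell
#!/bin/sh
# Count SAM records whose FLAG (column 2) has the secondary bit (256) set.
# Reads SAM text on stdin; header lines starting with '@' are ignored.
# count_secondary is a hypothetical helper, not part of samtools.
count_secondary() {
    awk -F '\t' '!/^@/ && int($2 / 256) % 2 == 1 { n++ } END { print n + 0 }'
}

# Runs only if samtools and the BAM file are available.
if command -v samtools >/dev/null 2>&1 && [ -f sequence.bam ]; then
    samtools view sequence.bam | count_secondary
fi
```

The same count should also be available directly with samtools view -c -f 256 sequence.bam.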
I want to identify the significant SNPs, and to do that I need to make sure that these multimapped reads are not introducing many spurious SNPs.
Thanks in advance.