I am pretty new to bioinformatics and to this forum, and would very much appreciate any suggestions you could offer to help me with ways to filter my alignments (obtained using bwa mem) so I only have uniquely mapped reads, other than using q scores (which I have already done and doesn't solve my problem)? Here is a bit more explanation:
I am trying to obtain uniquely mapped reads from my bwa mem alignments to use in downstream analyses. I realize that "uniquely mapped reads" is a loaded term, and that most sources I have found on the topic suggest filtering by q score should do the trick. My problem is that I have tried filtering by progressively higher q scores and I still have a (very small) number of reads that appear to not be uniquely mapped. Because I am aligning my reads from each sample to 96 different loci (in bwa), I want to make sure my reads are only mapped to one locus.
For the one sample I am using to test out commands, I filtered my data so I only have properly paired reads and mapped reads (in the case of my unpaired reads) and also filtered by q score (I get nearly the same results at q of 10, 20, or 30, so it's not an issue of changing the q score level), merged all my bam files, and obtained 193,998 reads total for that sample. When I open the bam file in Geneious and count the number of reads for each locus they add up to 194, 050. I realize the difference between these two numbers is an incredibly small number of reads, and I am going to only use loci that have ~20 or 30 mapped reads for each sample (haven't yet decided on that threshold) so maybe this doesn't matter terribly much in the grand scheme, but I still would like to have these two numbers match so I am not incorrectly including loci that weren't actually sequenced for a particular sample.
I have also seen suggestions to filter by XT tags, but none of my files have this tag. I DO have the XS tag (most seem to be XS:i:0). My understanding of what XS:i:0 means is a bit shaky, but I think it means that the secondary alignment has a score of zero so these would be reads that only have one alignment? And maybe this could be a good tag to use for filtering?
So my question is this: Does anyone have any suggestions for other tags or filtering steps I can use for my data so that my reads only map to one location, other than filtering for q score (which I have already done and will do in addition to any other filtering steps)?
Here is the applicable portion of code that I am running on alignments obtained from bwa (sorted them by coordinate and converted them to bam already in a previous step). The first line is what I'm using on my paired-end file and the second line is what I am using on my unpaired read 1 file (using the same code for my unpaired read 2 file). Note that I used trimmomatic to remove low quality bases from my reads, which is why I have three different read files for each sample.
samtools view -q 10 -f 0x02 -b pe001_sorted.bam > pe001_sorted_properlypaired.bam samtools view -q 10 -F 0x04 -b unpaired1_sorted.bam > unpaired1_sorted_mapped.bam
Thank you very much for any help you can offer!