5.7 years ago by
If you knew the samples were mixed, you could have aligned to a combined mouse and human reference. That's the best solution: if a read matches mouse perfectly and human with 2 mismatches, you want it to align to mouse, not to be forced to align to human.
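A combined reference can be as simple as the two FASTA files concatenated, with the contig names made unique first. Here is a rough sketch of that step. The function name `prefix_fasta` and the file names are placeholders I made up, and you would still need to index the combined file with whatever aligner you use.

```shell
# Hypothetical helper: prefix every FASTA header with a species tag so that
# e.g. mouse "chr1" and human "chr1" stay distinct in the combined reference.
prefix_fasta() {
    tag="$1"     # species label to prepend, e.g. "mouse"
    fasta="$2"   # path to the FASTA file
    sed "s/^>/>${tag}_/" "$fasta"
}

# Usage (file names are placeholders, not from the original post):
#   { prefix_fasta human human.fa; prefix_fasta mouse mouse.fa; } > combined.fa
```

After aligning to the combined file, the contig prefix tells you which species each read landed on.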
I'd start by doing a simple count of each sequence represented in the unmapped reads. Something like
samtools view unmapped.bam | cut -f 10 | sort | uniq -c | sort -nr > sorted_reads.txt
will give you a list of each distinct read sequence and how often it turns up. Reads that differ from each other only by a sequencing error will be counted separately, which is not ideal, but this is at least a place to start. If 90% of the reads are the exact same sequence, you want to know that before you start BLASTing each one individually.
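To see quickly whether a handful of sequences dominate, you can total up the counts in sorted_reads.txt. This is only a sketch: `top_fraction` is a name I made up, and it assumes the file is in the `uniq -c` format (count, then sequence), sorted by descending count.

```shell
# Hypothetical helper: report what share of all unmapped reads the N most
# frequent sequences account for. Expects `uniq -c`-style lines sorted by
# descending count, as in sorted_reads.txt.
top_fraction() {
    file="$1"
    n="${2:-10}"   # how many top sequences to sum; defaults to 10
    awk -v n="$n" '
        { total += $1 }          # first field is the count from uniq -c
        NR <= n { top += $1 }    # counts of the first n (most frequent) lines
        END {
            if (total > 0)
                printf "%d of %d reads (%.1f%%) are in the top %d sequences\n", top, total, 100 * top / total, n
        }
    ' "$file"
}

# Usage: top_fraction sorted_reads.txt 10
```

If the top few sequences cover most of the reads, BLAST those by hand and you are probably done.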
You could also try de novo assembly on the unmapped reads. It won't do much if your reads are scattered across a mammalian genome, but if they are something else, like mouse mitochondrial genes, it could help.
But before you write a program to BLAST each one, just spot-check a few. 80-90% aligned seems alright to me; you might just be looking at noise.
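For the spot check, one way is to pull a few random unmapped reads into a FASTA file that you can paste into the BLAST web page. A minimal sketch, assuming GNU `shuf` is available; `sam_to_fasta` and `spot_check.fa` are names I made up.

```shell
# Hypothetical helper: turn SAM records on stdin into FASTA.
# In SAM, column 1 is the read name and column 10 is the sequence;
# header lines (starting with "@") are skipped.
sam_to_fasta() {
    awk -F '\t' '!/^@/ { print ">" $1; print $10 }'
}

# Usage (20 random reads; the output file name is a placeholder):
#   samtools view unmapped.bam | shuf -n 20 | sam_to_fasta > spot_check.fa
```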