I have done two viral sequencing runs on the same sample, with a total of 490,224 reads. Since these are viral samples extracted from cell culture, I know they are contaminated with host (chicken) reads. So I mapped the reads to the host and pulled all the reads with the 0x4 (unmapped) flag out of the SAM file.
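For reference, pulling out the reads with the 0x4 flag amounts to a bitwise test on the SAM FLAG column. The sketch below is only an illustration of that test on plain-text SAM lines, not the exact command used here; the function name `unmapped_read_names` and the example records are hypothetical. It also shows that a record with only the 0x8 bit set (mate unmapped) is not itself unmapped, and because it collects read names, both mates of a pair count once.

```python
from typing import Iterable, Set

UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped

def unmapped_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return the QNAMEs of SAM records whose FLAG has the 0x4 bit set (hypothetical helper)."""
    names = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & UNMAPPED:
            names.add(qname)
    return names

# Made-up records: r1 unmapped (4), r2 mapped (0),
# r3 mapped with an unmapped mate (73 = 0x1+0x8+0x40), r4 unmapped with an unmapped mate (77).
sam = [
    "@HD\tVN:1.6",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t73\tchr1\t200\t60\t4M\t=\t200\t0\tACGT\tIIII",
    "r4\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
print(sorted(unmapped_read_names(sam)))  # ['r1', 'r4']
```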
At first I had the current Gallus gallus (version 4) genome, 18S rRNA, and 28S rRNA FASTA files as separate references, and I mapped against each of them in series, extracting the unmapped reads after each step. That left me with 159,396 reads that did not map to any of those files.
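Idealized as set operations, the serial approach keeps a read only if it fails to map to the genome, then to 18S, then to 28S. The sketch below assumes each read's mapped/unmapped status against a reference is fixed and does not depend on the other reads in the run; the function name and read sets are hypothetical placeholders, not real data.

```python
from typing import List, Set

def serial_host_removal(all_reads: Set[str], hits_per_ref: List[Set[str]]) -> Set[str]:
    """Drop reads that map to each reference in turn (idealized serial filtering)."""
    remaining = set(all_reads)
    for hits in hits_per_ref:
        # Only reads still remaining are mapped in this pass; any that hit are removed.
        remaining -= hits
    return remaining

# Hypothetical reads and the subsets that map to each reference.
reads = {"r1", "r2", "r3", "r4", "r5"}
genome_hits = {"r1", "r2"}
rrna18s_hits = {"r2", "r3"}
rrna28s_hits = {"r4"}

left = serial_host_removal(reads, [genome_hits, rrna18s_hits, rrna28s_hits])
print(sorted(left))  # ['r5']
```

Under this idealization the order of the passes does not matter, and the result equals removing the union of all hits in a single pass, which is what mapping against one concatenated reference is meant to achieve.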
Then I thought, why do that three times? So I concatenated the files into "chicken_genome.fasta", indexed it, and re-ran the host removal, but this time I had 172,728 reads left over.
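One way to look at the difference is to compare the read names left over by each strategy rather than only the totals. The sketch below assumes both sets of leftover read names have already been collected; the function name and example sets are hypothetical. Note that 172,728 − 159,396 = 13,332 is a net figure: it equals the reads left only by the concatenated run minus the reads left only by the serial runs, so individual reads may differ in both directions.

```python
from typing import Set, Tuple

def compare_leftovers(serial: Set[str], concatenated: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split leftover read names into those unique to each host-removal strategy."""
    only_concat = concatenated - serial  # left over only after the concatenated run
    only_serial = serial - concatenated  # left over only after the serial runs
    return only_concat, only_serial

# Hypothetical leftover sets.
serial = {"r1", "r2", "r3"}
concat = {"r2", "r3", "r4", "r5"}

extra, missing = compare_leftovers(serial, concat)
print(sorted(extra), sorted(missing))  # ['r4', 'r5'] ['r1']

# The net change in counts equals len(extra) - len(missing).
assert len(concat) - len(serial) == len(extra) - len(missing)
```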
I do not suppose anyone could help me figure out why I got an extra 13,332 reads the second time around?