8 weeks ago
Agamemnon
I have 7.5 million artificial 25-mer reads, which I mapped against a reference genome using bowtie2 with the following options:
bowtie2 -k 2 --very-sensitive
7558491 reads; of these:
  7558491 (100.00%) were unpaired; of these:
    1399350 (18.51%) aligned 0 times
    5200484 (68.80%) aligned exactly 1 time
    958657 (12.68%) aligned >1 times
81.49% overall alignment rate
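As a quick sanity check on the summary, the counts can be added up directly. The numbers below are copied from the bowtie2 output; the helper `pct` is just an illustrative name. The total of aligned reads (exactly once plus more than once) is the figure that later gets compared with the MAPQ-filtered count.

```python
# Counts copied from the bowtie2 alignment summary.
total = 7558491
aligned_0 = 1399350       # aligned 0 times
aligned_1 = 5200484       # aligned exactly 1 time
aligned_multi = 958657    # aligned >1 times


def pct(n, d):
    """Percentage rounded to two decimals, as bowtie2 prints it."""
    return round(100 * n / d, 2)


# Every read is either unaligned, aligned once, or aligned more than once.
aligned_any = aligned_1 + aligned_multi
assert aligned_0 + aligned_any == total

print(aligned_any)               # 6159141
print(pct(aligned_any, total))   # 81.49 (the "overall alignment rate")
```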
These are the results. I want to remove all unmapped reads and all reads that aligned to multiple locations. I understand that I have to take mapping quality into account, but since these are artificial reads, should I still use:
samtools view -F 4 -q 2 test.bam | wc -l
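For reference, a rough Python sketch of what `samtools view -F 4 -q 2 ... | wc -l` computes: it drops records with FLAG bit 0x4 (unmapped) set, drops records with MAPQ below the threshold, and then counts the remaining lines. Note that `wc -l` counts alignment records, not distinct reads, so with `-k 2` a read reported twice can contribute two lines. This assumes uncompressed SAM text without header lines (as `samtools view` prints without `-h`); the function names and toy records are hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def passes_filter(sam_line, min_mapq=2):
    """True if a SAM record would survive `-F 4 -q min_mapq`."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: MAPQ
    return (flag & UNMAPPED) == 0 and mapq >= min_mapq


def count_passing(sam_lines, min_mapq=2):
    """Like piping the filtered output into `wc -l`: counts records, not reads."""
    return sum(
        1
        for line in sam_lines
        if line.strip() and not line.startswith("@") and passes_filter(line, min_mapq)
    )


# Hypothetical toy records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
records = [
    "r1\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*",    # mapped, high MAPQ
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped
    "r3\t16\tchr2\t500\t1\t25M\t*\t0\t0\t*\t*",    # mapped, low MAPQ
    "r3\t272\tchr3\t900\t1\t25M\t*\t0\t0\t*\t*",   # r3's secondary alignment
]
print(count_passing(records, min_mapq=2))  # 1
```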
samtools does not know or care that the reads are artificial. It will carry out whatever operation you ask it to do.
Hopefully you controlled for that already in your
That is obvious, but it does not address the actual question: what is the most appropriate way to remove reads that map to multiple locations?
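One way to approach this without relying on MAPQ at all is to count how many mapped records each read name has. With `-k 2`, bowtie2 reports up to two alignments per read, so, assuming every read counted as "aligned >1 times" appears with two records, keeping only reads with exactly one mapped record removes the multi-mappers. This is an alternative technique, not the approach suggested in the thread; it assumes SAM text input (e.g. from `samtools view test.bam`), and the helper names and toy records are hypothetical.

```python
from collections import Counter

UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def records_per_read(sam_lines):
    """Number of mapped records (primary + secondary) per QNAME."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blanks and header lines
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED:
            continue  # ignore unmapped records
        counts[fields[0]] += 1
    return counts


def single_hit_reads(sam_lines):
    """Reads with exactly one reported alignment (hypothetical helper)."""
    return {qname for qname, n in records_per_read(sam_lines).items() if n == 1}


# Hypothetical toy records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
records = [
    "r1\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*",    # one alignment
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",            # unmapped
    "r3\t16\tchr2\t500\t1\t25M\t*\t0\t0\t*\t*",    # r3, primary
    "r3\t272\tchr3\t900\t1\t25M\t*\t0\t0\t*\t*",   # r3, secondary
    "r4\t0\tchr1\t700\t40\t25M\t*\t0\t0\t*\t*",    # one alignment
]
print(sorted(single_hit_reads(records)))  # ['r1', 'r4']
```

The resulting read names could then be used to subset the BAM; grouping by QNAME sidesteps the question of which MAPQ value bowtie2 assigns in `-k` mode.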
Mapping qualities are not assigned the same way by different aligners. With bowtie2 you could use a MAPQ filter of >= 40 to keep reads that had only one convincing alignment, as noted in the linked blog post.
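A MAPQ cutoff is applied per record, but the question is about reads. The sketch below counts distinct read names that have a mapped, non-secondary record with MAPQ >= 40, skipping FLAG bits 0x4 (unmapped) and 0x100 (secondary), roughly what `samtools view -F 260 -q 40` followed by counting unique QNAMEs would give. The input is assumed to be SAM text; the function name and toy records are hypothetical.

```python
UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped
SECONDARY = 0x100  # SAM FLAG bit: secondary alignment


def reads_passing_mapq(sam_lines, min_mapq=40):
    """Distinct QNAMEs with a mapped, non-secondary record at MAPQ >= min_mapq."""
    kept = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blanks and header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag, mapq = fields[0], int(fields[1]), int(fields[4])
        if flag & (UNMAPPED | SECONDARY):
            continue  # drop unmapped and secondary records
        if mapq >= min_mapq:
            kept.add(qname)
    return kept


# Hypothetical toy records: QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL
records = [
    "r1\t0\tchr1\t100\t42\t25M\t*\t0\t0\t*\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "r3\t16\tchr2\t500\t1\t25M\t*\t0\t0\t*\t*",
    "r3\t272\tchr3\t900\t1\t25M\t*\t0\t0\t*\t*",
    "r4\t0\tchr1\t700\t40\t25M\t*\t0\t0\t*\t*",
]
print(sorted(reads_passing_mapq(records)))  # ['r1', 'r4']
```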
I ran
samtools view -F 4 -q 42 test.bam | wc -l
and got 6159141. Why is there a discrepancy, as originally only
When I did the calculation, 6159141 - 5200484 = 958657, so I am still retaining the