I'm trying to get only uniquely aligning paired-end reads out of bowtie2. Setting
-k 1 doesn't help here, as it just reports the first alignment found for each read, but I don't want reads that align more than once at all.
It looks like SAM's
NH:i:X tag is meant for this, where X is the number of reported alignments for the read. However, bowtie2 does not seem to set that tag (and I can't find an option to make bowtie2 do so).
My current "solution" is to iterate through the SAM/BAM file and discard all read IDs that appear more than twice (once for each mate of the pair). However, that's slow because I have to go through the file twice, and my BAM files are on the order of several hundred gigabytes.
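A possible way to avoid the second pass is to stream the records once and buffer only the records of the current read name. This is a minimal sketch, assuming all records with the same QNAME are adjacent in the input (true for a name-collated file; assumed, not verified here, for bowtie2's own unsorted output) and that bowtie2 was run in a mode that reports several alignments per read (e.g. -k 2 or higher), so that a uniquely aligned pair shows up as exactly two records. The function name `unique_pairs` is hypothetical.

```python
from typing import Iterable, Iterator, List, Optional


def unique_pairs(lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus the records of read names seen exactly twice.

    Sketch only: assumes records sharing a QNAME are adjacent in the stream,
    so a single pass with a one-group buffer is enough.
    """
    group: List[str] = []
    current: Optional[str] = None
    for line in lines:
        if line.startswith("@"):
            yield line  # pass SAM header lines through unchanged
            continue
        qname = line.split("\t", 1)[0]
        if qname != current:
            # A new read name starts: flush the previous group if it had one record per mate.
            if len(group) == 2:
                yield from group
            group = []
            current = qname
        group.append(line)
    if len(group) == 2:  # flush the last group in the file
        yield from group

# Hypothetical command-line use, reading SAM text on stdin:
#   import sys; sys.stdout.writelines(unique_pairs(sys.stdin))
```

It could sit between `samtools view -h in.bam` and `samtools view -b -o unique.bam -`, so the BAM is decoded only once; memory use stays bounded by the number of records per read name.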
Is there a better solution?
Hi, I've now run into a problem:
I get these metrics from bowtie2 after a run with
-X 500 -I 0 --no-discordant --no-unal --no-mixed
40949 reads; of these:
40949 (100.00%) were paired; of these:
16772 (40.96%) aligned concordantly 0 times
11759 (28.72%) aligned concordantly exactly 1 time
12418 (30.33%) aligned concordantly >1 times
I get 24177 paired alignments in the SAM file, which equals the sum of the unique and non-unique concordant alignments above (11759 + 12418 = 24177).
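To reproduce that count from the SAM file itself, one option is to count one record per pair: primary records flagged as mate 1 (0x40) and properly paired (0x2), skipping secondary (0x100) and supplementary (0x800) records. The bit values are from the SAM specification; treating 0x2 as "concordant" for bowtie2 output is an assumption of this sketch, and the helper name `count_concordant_pairs` is hypothetical.

```python
from typing import Iterable

PROPER_PAIR = 0x2      # each segment properly aligned according to the aligner
FIRST_IN_PAIR = 0x40   # record is mate 1 of the template
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def count_concordant_pairs(lines: Iterable[str]) -> int:
    """Count primary mate-1 records that carry the proper-pair bit."""
    n = 0
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        flag = int(line.split("\t")[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each pair once, by its primary alignment
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR:
            n += 1
    return n


# The summary arithmetic: uniquely + multiply concordant pairs.
assert 11759 + 12418 == 24177
```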
When I check the SAM file using less or grep, the XS:i tag is not present! The metrics say that about 30% of pairs aligned concordantly more than once, yet there is no XS:i tag. Does "more than once" mean that the other alignments are worse? And if so, why is there no XS:i tag, since these secondary alignments should still have scores?
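Instead of eyeballing the file with less, a small parser can count how many records actually carry an integer XS tag. This is a sketch assuming plain-text SAM records, where optional fields start after the 11 mandatory columns and have the form TAG:TYPE:VALUE; the helper names `int_tags` and `count_with_tag` are hypothetical.

```python
from typing import Dict, Iterable


def int_tags(line: str) -> Dict[str, int]:
    """Parse the integer-typed optional fields (TAG:i:VALUE) of one SAM record."""
    fields = line.rstrip("\n").split("\t")
    tags: Dict[str, int] = {}
    for field in fields[11:]:  # optional fields follow the 11 mandatory columns
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
    return tags


def count_with_tag(lines: Iterable[str], tag: str = "XS") -> int:
    """Count alignment records (non-header lines) that carry the given integer tag."""
    return sum(1 for line in lines if not line.startswith("@") and tag in int_tags(line))
```

Searching the text directly for the literal `XS:i:` (rather than "XS") is the grep equivalent of this check.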
Did you filter by mapping quality? If I remember correctly, multi-mapping reads get a mapping quality of 1.
Thanks for the reply!
I tried filtering with samtools view -q 2, but the numbers don't match.
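For reference, `samtools view -q INT` is documented to skip alignments with MAPQ smaller than INT, so `-q 2` keeps records with MAPQ of 2 or more. The sketch below mirrors that rule on SAM text; note that it, like samtools, judges each record on its own, so the two mates of a pair are kept or dropped independently. The function name `filter_min_mapq` is hypothetical.

```python
from typing import Iterable, Iterator


def filter_min_mapq(lines: Iterable[str], min_mapq: int) -> Iterator[str]:
    """Keep header lines and records whose MAPQ (5th column) is >= min_mapq.

    Records are filtered one at a time, so a pair may end up with only one mate.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep the header
            continue
        mapq = int(line.split("\t")[4])
        if mapq >= min_mapq:
            yield line
```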
I checked the manuals, and it seems that a mapping quality of 1 for multi-mapping reads only applies to bowtie1. The closest thing in bowtie2 is this: "A mapping quality of 10 or less indicates that there is at least a 1 in 10 chance that the read truly originated elsewhere." So if I filter out alignments with a mapping quality below 10, I should get a reasonably good indication of "uniqueness".
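The quoted threshold follows from the Phred scaling of mapping quality, Q = -10 log10 p, where p is the estimated probability that the alignment is not the read's true origin; Q = 10 therefore corresponds to p = 0.1. A minimal sketch of the conversion in both directions (function names are hypothetical):

```python
import math


def mapq_to_error_prob(mapq: float) -> float:
    """Estimated probability that the alignment is wrong, from a Phred-scaled MAPQ."""
    return 10 ** (-mapq / 10)


def error_prob_to_mapq(p: float) -> float:
    """Phred-scale an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)
```

Given the samtools rule of skipping MAPQ smaller than INT, `samtools view -q 10` keeps alignments with MAPQ of 10 or more; excluding everything at 10 or below, as the quoted sentence describes, would take `-q 11`.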