Best Way To Get Truly Unique Reads In Bowtie/Sam?
9.5 years ago

Hi,

I'm currently trying to get truly uniquely aligning paired-end reads with bowtie2. Setting -k 1 doesn't help here, as it just reports the first alignment found for each read, but I don't want reads that align more than once.

It looks like SAM's NH:i:X tag is meant for this, where X is the number of reported alignments for the read. However, bowtie2 does not seem to set that tag (and I can't find an option to make it do so).
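SAM optional fields have the form TAG:TYPE:VALUE, so the tag in question is written NH:i:X (type `i` = integer). A minimal sketch of reading it, for aligners that do set NH; the function names are hypothetical, and since bowtie2 does not emit NH, on bowtie2 output the check returns False for every record.

```python
def parse_optional_tags(sam_line: str) -> dict:
    """Map each optional TAG (e.g. 'NH', 'AS') to its (type, value) pair."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The 11 mandatory SAM columns come first; TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)  # maxsplit keeps colons inside values
        tags[tag] = (typ, value)
    return tags


def is_unique_by_nh(sam_line: str) -> bool:
    """True only if the record carries NH:i:1; False if NH is missing or > 1."""
    entry = parse_optional_tags(sam_line).get("NH")
    return entry is not None and entry[0] == "i" and int(entry[1]) == 1
```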

My current "solution" is to iterate through the SAM/BAM file and discard all read IDs that are listed more than twice (once for each mate of the pair). However, that's a bit slow, as I have to go through the file twice and my BAM files are on the order of several hundred gigabytes.
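The two-pass approach described above can be sketched roughly as follows, operating on SAM text lines (the function name is hypothetical). Note that counting names only identifies multi-mappers if bowtie2 was run in -k mode with N of at least 2: in default mode bowtie2 reports one alignment per mate, so every aligned pair contributes exactly two lines. The Counter also holds every read name in memory, which matters at this file size.

```python
from collections import Counter
from typing import List


def qname(record: str) -> str:
    """Read name is the first tab-separated SAM column."""
    return record.split("\t", 1)[0]


def keep_reads_seen_twice(sam_lines: List[str]) -> List[str]:
    """Keep header lines plus records whose QNAME occurs exactly twice."""
    headers = [line for line in sam_lines if line.startswith("@")]
    body = [line for line in sam_lines if not line.startswith("@")]
    # Pass 1: count how many records share each read name.
    counts = Counter(qname(line) for line in body)
    # Pass 2: keep only names with exactly two records (one per mate).
    return headers + [line for line in body if counts[qname(line)] == 2]
```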

Is there a better solution?

Thanks!

9.5 years ago
Fidel

By default, Bowtie2 always reports an alignment for multi-reads, which is in line with the authors' recommendation (see http://www.nature.com/nrg/journal/v13/n1/full/nrg3117.html). The command-line options only change how much effort bowtie2 puts into searching for the best match, or how many alignments it reports per read.

As stated elsewhere (see Bowtie2, -M Alignment/Reporting Mode), to get rid of multi-reads you have to look for the XS tag. This tag is only set if the read is a multi-read, and it contains the alignment score of the second-best alignment.
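Filtering on XS can be done in a single pass. A minimal sketch over SAM text lines (function names are hypothetical), assuming records sharing a read name are adjacent, as in bowtie2's unsorted output or a name-sorted file (samtools sort -n), so both mates of a pair are kept or dropped together:

```python
from typing import Iterable, Iterator, List


def has_xs_tag(record: str) -> bool:
    """True if any optional field (column 12 onward) is an XS:i: tag."""
    return any(f.startswith("XS:i:") for f in record.rstrip("\n").split("\t")[11:])


def drop_multireads(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield headers, then each read-name group in which no record has XS:i."""
    group: List[str] = []
    current = None
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines precede all alignment records
            continue
        name = line.split("\t", 1)[0]
        if name != current and group:
            # A new read name starts: decide on the finished group.
            if not any(has_xs_tag(r) for r in group):
                yield from group
            group = []
        current = name
        group.append(line)
    # Flush the last group.
    if group and not any(has_xs_tag(r) for r in group):
        yield from group
```

Dropping the whole group is a design choice for paired-end data: if only the mate carrying XS were removed, the remaining mate would be left orphaned.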

Hi, I now ran into a problem:

bowtie2 reports these metrics after a run with -X 500 -I 0 --no-discordant --no-unal --no-mixed:

    40949 reads; of these:
      40949 (100.00%) were paired; of these:
        16772 (40.96%) aligned concordantly 0 times
        11759 (28.72%) aligned concordantly exactly 1 time
        12418 (30.33%) aligned concordantly >1 times

I got 24177 paired alignments in the SAM file, which equals the sum of the unique and non-unique concordant alignments above (11759 + 12418).
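The bookkeeping can be checked by parsing the summary lines. A small sketch, assuming the summary text has exactly the wording shown above (the function name is hypothetical); it confirms that pairs aligning more than once still contribute one reported pair each:

```python
import re

SUMMARY = """40949 reads; of these:
  40949 (100.00%) were paired; of these:
    16772 (40.96%) aligned concordantly 0 times
    11759 (28.72%) aligned concordantly exactly 1 time
    12418 (30.33%) aligned concordantly >1 times"""


def concordant_counts(summary: str) -> dict:
    """Extract the concordant-pair counts from a bowtie2 summary."""
    patterns = {
        "zero": r"(\d+) \([\d.]+%\) aligned concordantly 0 times",
        "unique": r"(\d+) \([\d.]+%\) aligned concordantly exactly 1 time",
        "multi": r"(\d+) \([\d.]+%\) aligned concordantly >1 times",
    }
    return {key: int(re.search(p, summary).group(1)) for key, p in patterns.items()}


counts = concordant_counts(SUMMARY)
# Reported pairs = unique + multi concordant pairs.
print(counts["unique"] + counts["multi"])
```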

When I check the SAM file using less or grep, the XS tag is not present! The metrics say that about 30% of pairs aligned more than once, so why is there no XS tag? Does "more than once" mean that the other alignments are worse? If so, how come there's no XS tag, since these secondary alignments should have scores?

Did you filter by mapping quality? If I remember correctly, multi-reads get a mapping quality of 1.

I tried filtering with samtools view -q 2, but the numbers don't match.

I checked the manuals, and it seems that a mapping quality of 1 for multi-reads only applies to bowtie1. The closest thing in bowtie2 is this: "A mapping quality of 10 or less indicates that there is at least a 1 in 10 chance that the read truly originated elsewhere." So if I discard reads with a mapping quality of 10 or less, I should have a reasonably good indication of "uniqueness".
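MAPQ is Phred-scaled, so the probability that the mapping position is wrong is 10^(-MAPQ/10), and MAPQ 10 corresponds to 1 in 10. Discarding MAPQ of 10 or less means keeping MAPQ of 11 or more, which with samtools would be samtools view -q 11 (-q skips alignments with MAPQ smaller than the given value). A minimal sketch of the same per-record test (function names are hypothetical); this is a heuristic cutoff, not an exact uniqueness test:

```python
def mapq_to_error_prob(mapq: int) -> float:
    """Phred-scaled MAPQ -> probability that the reported position is wrong."""
    return 10 ** (-mapq / 10)


def passes_mapq(sam_line: str, min_mapq: int = 11) -> bool:
    """Keep records with MAPQ >= min_mapq (MAPQ is the 5th SAM column).

    Note: the SAM spec uses 255 for 'MAPQ unavailable', which this check
    would let through; bowtie2's own MAPQ values stay well below that.
    """
    return int(sam_line.split("\t")[4]) >= min_mapq
```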