Question: Best Way To Get Truly Unique Reads In Bowtie/Sam?
7.5 years ago, Philipp Bayer (Australia/Perth/UWA) wrote:

Hi,

I'm currently trying to get truly uniquely aligning paired-end reads out of bowtie2. Setting -k 1 doesn't help in this case, as it just reports the first alignment found for each read, whereas I want to discard reads that align more than once.

It looks like SAM's NH:i:X tag is meant for this, where X is the number of times the read aligns. However, bowtie2 does not seem to set that tag (and I can't find a setting to convince bowtie2 to do so).

My current "solution" is to iterate through the SAM/BAM file and discard all read IDs that are listed more than twice (once for each mate of the pair). However, that's a bit slow, as I have to go through the file twice and my BAM files are on the order of several hundred gigabytes.

Is there a better solution?

Thanks!

7.5 years ago, Fidel (Germany) wrote:

By default, Bowtie2 always maps multi-reads, which is in line with the recommendation from the authors (see http://www.nature.com/nrg/journal/v13/n1/full/nrg3117.html). The command-line options only change how much effort bowtie2 puts into searching for the best match, or how many alignments per read are reported.

As stated elsewhere (see Bowtie2, -M Alignment/Reporting Mode), to get rid of multi-reads you have to look for the XS flag (the XS:i field in the SAM record). This flag is only set if the read is a multi-read, and it contains the alignment score of the second-best alignment.
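To make that concrete, here is a minimal sketch of an XS-based filter in plain Python. It reads SAM text in a single pass and drops every read (or read pair) in which any record carries an XS:i field. It assumes that both mates of a pair sit on adjacent lines with the same read name, as in unsorted aligner output; the function names are made up for illustration and are not part of bowtie2 or samtools.

```python
from itertools import groupby


def qname(line):
    """Return the read name (first SAM column) of an alignment line."""
    return line.split("\t", 1)[0]


def has_xs(line):
    """True if the optional fields (column 12 onwards) include an XS:i tag."""
    fields = line.rstrip("\n").split("\t")
    return any(field.startswith("XS:i:") for field in fields[11:])


def drop_multireads(sam_lines):
    """Yield header lines plus all records of reads/pairs without any XS:i tag.

    Assumption: records sharing a read name are adjacent in the input.
    """
    for is_header, block in groupby(sam_lines, key=lambda l: l.startswith("@")):
        if is_header:
            yield from block
            continue
        for _, mates in groupby(block, key=qname):
            mates = list(mates)
            # Drop the whole group if either mate has a second-best alignment.
            if not any(has_xs(mate) for mate in mates):
                yield from mates
```

It could be fed from a pipe, e.g. by passing `sys.stdin` to `drop_multireads` and writing the result to `sys.stdout`. On a coordinate-sorted BAM the adjacency assumption no longer holds, so the file would have to be name-sorted first.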

modified 6.1 years ago by seidel • written 7.5 years ago by Fidel

Hi, I now ran into a problem:

I have these metrics in bowtie2 after a run with -X 500 -I 0 --no-discordant --no-unal --no-mixed

40949 reads; of these:
  40949 (100.00%) were paired; of these:
    16772 (40.96%) aligned concordantly 0 times
    11759 (28.72%) aligned concordantly exactly 1 time
    12418 (30.33%) aligned concordantly >1 times

I got 24177 paired alignments in the SAM-file, which equals the above number of unique and non-unique alignments.
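As a quick sanity check on those numbers, the arithmetic can be reproduced in a few lines of Python. The variable names are only for illustration; the counts are copied from the bowtie2 summary above.

```python
# Counts taken from the bowtie2 alignment summary.
total_pairs = 40949
aligned_0_times = 16772
aligned_once = 11759
aligned_multi = 12418

# The three categories should account for every pair.
assert aligned_0_times + aligned_once + aligned_multi == total_pairs

# With --no-unal, only aligned pairs are written to the SAM file.
pairs_in_sam = aligned_once + aligned_multi
print(pairs_in_sam)

# Fractions of all pairs, formatted like the bowtie2 summary.
for count in (aligned_0_times, aligned_once, aligned_multi):
    print(f"{count / total_pairs:.2%}")
```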

When I check the SAM-file using less or grep, the XS-flag is not present! The metrics say that I got about 30% aligning more than once, but no XS-flag? Does "more than once" mean that the other alignments are worse? How come there's no XS-flag then, as these secondary alignments should have scores?

modified 7.5 years ago • written 7.5 years ago by Philipp Bayer

Did you filter for read quality? If I remember right, multi-reads get a mapping quality of 1.

written 7.5 years ago by Fidel

Thanks for the reply!

I tried filtering with samtools view -q 2, however the numbers don't match.

I checked the manuals, and it seems that a mapping quality of 1 for reads that align more than once only happens in bowtie1. The closest thing in bowtie2 is this: "A mapping quality of 10 or less indicates that there is at least a 1 in 10 chance that the read truly originated elsewhere." So if I discard alignments with a mapping quality of 10 or less, I should have a reasonably good indication of "uniqueness".
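For illustration, a minimal Python sketch of such a mapping-quality filter on SAM text is below. The cutoff of 11 (keep MAPQ above 10) is just my reading of the manual's sentence, not a recommended value, and the function name is made up. It should behave like samtools view -q with the same threshold, which skips alignments whose MAPQ is below the given value.

```python
def filter_by_mapq(sam_lines, min_mapq=11):
    """Yield header lines and alignment lines with MAPQ >= min_mapq.

    min_mapq=11 is an assumed cutoff that discards MAPQ 10 or less.
    """
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        # MAPQ is the fifth tab-separated column of a SAM record.
        mapq = int(line.split("\t", 5)[4])
        if mapq >= min_mapq:
            yield line
```

Note that this works per record, so it can keep one mate of a pair and drop the other; whether that matters depends on what is done with the pairs afterwards.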

written 7.5 years ago by Philipp Bayer