Question: How to extract uniquely mapped reads in Bowtie2
gravatar for Ankit
22 months ago by
Ankit140 wrote:

Hi everyone, I am looking for a conclusive way to obtain uniquely mapped reads from Bowtie2. I checked the manual of Bowtie2, which shows higher mapping quality is equal to more unique. I applied the quality 10 as a threshold and extracted reads above 10 MAPQ by samtools. (samtools view -q 10).

Basically, the alignment stats are:

 67870428 (100.00%) were unpaired; of these:
25785662 (37.99%) aligned 0 times
27964580 (41.20%) aligned exactly 1 time
14120186 (20.80%) aligned >1 times

62.01% overall alignment rate = 42084766 reads.

As the general definition of uniquely mapped reads, the alignment should be exactly one time and I tried to remove 14120186 (20.80%) aligned >1 times these reads. However -q 10 does not remove all the reads (Kept in total 31850072)). So I increased -q to 20 . It kept 31564096, I also set -q 42 (Note max quality in the output sam I found was 42), which given me 26154743 reads, which is lesser than reads aligned exactly 1 time. I also checked online forums, and few people suggested to remove XS:i: tag, So I used this command as suggested here:

grep -E "@|NM:" bt2output.sam | grep -v "XS:" > uniq_bt2output.sam

It has given me 27964580 exactly what I wanted.

However, I am not sure if it is the correct way as I am removing the second alignment.

The issue is which value to set to extract unique reads or is there any specific tag inside sam which can allow me to fetch only uniquely mapped reads. Is there any specific option during alignment (like Bowtie -m 1, I didn't find equivalent in Bowtie2)?

I am not clear about the right or appropriate way.

I would appreciate any help. Thanks you Ankit

ADD COMMENTlink modified 14 months ago by jiaobingke20 • written 22 months ago by Ankit140


samtools view -F 2304 <input.bam>

relevant samflags:

not primary alignment (0x100)
supplementary alignment (0x800)
ADD REPLYlink written 22 months ago by cpad011214k

Not working. It outputs a SAM file. No change in the number of reads in output. Same as input.

ADD REPLYlink modified 22 months ago • written 22 months ago by Ankit140

It will be helpful if someone can suggest me this.

ADD REPLYlink written 22 months ago by Ankit140
gravatar for jiaobingke
14 months ago by
jiaobingke20 wrote:

The document of bowtie2 says that higher mapping quality means more unique alignment. I used bowtie2 to align my 2B-RAD sequence ( a method of simplified whole genome sequence. Read length is 27 bp) to genome sequence.

If a read can align to A position of genome without any mismatch, and also align to B position of genome with one mismatch. The bowtie2 will regard this read as multiple hits read. So a "XS" tag will be added to its aligment record in the sam file. But the mapping quality is higher than 10, say 32 in my file.

If a read can align to A, B, C positions of genome all without any mismatch, the bowtie2 will definitely regard this read as multiple hits read. And a "XS" tag will be added. But the mapping quality of these record are often lower than 4.

Above two situations will be both regarded as multiple hits. I want to only discard records in the second situation. When designating --no-unal option in bowtie2, only one of best aligment will be printed out. Then I filter out records with mapping quality large than 10( or 3). In this way, records in the first situation will be retained and records in the second situation will be filter out.

ADD COMMENTlink written 14 months ago by jiaobingke20
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1329 users visited in the last hour