How to extract uniquely mapped reads in Bowtie2
1
1
Entering edit mode
2.4 years ago
Ankit ▴ 180

Hi everyone, I am looking for a conclusive way to obtain uniquely mapped reads from Bowtie2. I checked the manual of Bowtie2, which shows higher mapping quality is equal to more unique. I applied the quality 10 as a threshold and extracted reads above 10 MAPQ by samtools. (samtools view -q 10).

Basically, the alignment stats are:

 67870428 (100.00%) were unpaired; of these:
25785662 (37.99%) aligned 0 times
27964580 (41.20%) aligned exactly 1 time
14120186 (20.80%) aligned >1 times


62.01% overall alignment rate = 42084766 reads.

As the general definition of uniquely mapped reads, the alignment should be exactly one time and I tried to remove 14120186 (20.80%) aligned >1 times these reads. However -q 10 does not remove all the reads (Kept in total 31850072)). So I increased -q to 20 . It kept 31564096, I also set -q 42 (Note max quality in the output sam I found was 42), which given me 26154743 reads, which is lesser than reads aligned exactly 1 time. I also checked online forums, and few people suggested to remove XS:i: tag, So I used this command as suggested here:

grep -E "@|NM:" bt2output.sam | grep -v "XS:" > uniq_bt2output.sam

It has given me 27964580 exactly what I wanted.

However, I am not sure if it is the correct way as I am removing the second alignment.

The issue is which value to set to extract unique reads or is there any specific tag inside sam which can allow me to fetch only uniquely mapped reads. Is there any specific option during alignment (like Bowtie -m 1, I didn't find equivalent in Bowtie2)?

I am not clear about the right or appropriate way.

I would appreciate any help. Thanks you Ankit

bowtie2 Bowtie2 alignment uniquely mapped reads • 2.5k views
1
Entering edit mode

Try:

samtools view -F 2304 <input.bam>


relevant samflags:

not primary alignment (0x100)
supplementary alignment (0x800)

0
Entering edit mode

Not working. It outputs a SAM file. No change in the number of reads in output. Same as input.

0
Entering edit mode

It will be helpful if someone can suggest me this.

0
Entering edit mode
21 months ago
jiaobingke ▴ 20

The document of bowtie2 says that higher mapping quality means more unique alignment. I used bowtie2 to align my 2B-RAD sequence ( a method of simplified whole genome sequence. Read length is 27 bp) to genome sequence.

If a read can align to A position of genome without any mismatch, and also align to B position of genome with one mismatch. The bowtie2 will regard this read as multiple hits read. So a "XS" tag will be added to its aligment record in the sam file. But the mapping quality is higher than 10, say 32 in my file.

If a read can align to A, B, C positions of genome all without any mismatch, the bowtie2 will definitely regard this read as multiple hits read. And a "XS" tag will be added. But the mapping quality of these record are often lower than 4.

Above two situations will be both regarded as multiple hits. I want to only discard records in the second situation. When designating --no-unal option in bowtie2, only one of best aligment will be printed out. Then I filter out records with mapping quality large than 10( or 3). In this way, records in the first situation will be retained and records in the second situation will be filter out.