Unaligned Reads From Bowtie2
4 months ago
tiziana • 0

Dear all, I am using bowtie2 to filter out paired reads that do not map to a host genome. I map against the host sequence with bowtie2 and then use samtools to extract the unmapped reads I need. I have one question about what I read in the bowtie2 output log:

If 13’843’083 pairs aligned 0 times concordantly or discordantly, how come some reads still aligned 1 (3’192’732) or >1 (11’569’420) times?

Thank you for any help in understanding this.

4 months ago
d-cameron ★ 2.8k

There are 27,686,166 reads in 13,843,083 read pairs. Bowtie2 is giving a summary of the individual read alignments.

For example, if read 1 and read 2 of a fragment both don't align anywhere, that's +2 to the "aligned 0 times" category. If read 1 aligned once and read 2 had multiple alignment candidates that'd be +1 to the "aligned exactly 1 time" and +1 to the "aligned >1 times" categories.
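The bookkeeping can be checked with a few lines of Python. This is only a sketch: the counts are the ones quoted in this thread, and the "aligned 0 times" figure is derived by subtraction, assuming every mate falls into exactly one of the three per-read categories (the log itself was not posted, so that number is not taken from it).

```python
# Counts quoted in the thread.
pairs_not_aligned_as_pair = 13_843_083  # pairs aligned 0 times concordantly or discordantly
mates_aligned_once = 3_192_732          # single reads that aligned exactly 1 time
mates_aligned_multi = 11_569_420        # single reads that aligned >1 times

# Each pair contributes two mates to the per-read summary.
mates_total = 2 * pairs_not_aligned_as_pair

# Assumption: the three per-mate categories partition mates_total,
# so the "aligned 0 times" count is whatever is left over.
mates_aligned_zero = mates_total - mates_aligned_once - mates_aligned_multi

print(mates_total)         # 27686166
print(mates_aligned_zero)  # 12924014
```

So the per-read lines are a second breakdown of the same 13.8M pairs, counted one mate at a time.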

Thank you! That makes sense. But from my understanding, those 27,686,166 reads (13,843,083 read pairs) are part of the pairs that aligned 0 times concordantly or discordantly. How come some reads still align?

It means that in 13.8M fragments (27.7M reads), the two reads of the pair didn't match up with each other in terms of alignment (i.e. not matching what is expected based on library generation protocol). Each read of such a pair can still align somewhere on its own.

The section then breaks down those pairs of reads into single reads instead of pairs. The 3 million are reads that aligned uniquely and the 11 million are reads that were multi-mappers.

Thank you for your answer. Sorry if this is a stupid question, as I am quite new to this: what does "what is expected based on library generation protocol" mean? I thought the 13’843’083 pairs (27’686’166 mates) defined as aligning 0 times concordantly or discordantly was a number generated by the bowtie2 analysis. If they aligned 0 times, how come some single reads still align?

You have pairs of reads... they should align next to each other if you made a perfect library.

If they don't, you can look at each pair of mates and sort the pairs into boxes: concordant, discordant, and other. The "other" category includes mates that align to different chromosomes and pairs where one of the reads doesn't align at all.
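To see those boxes in your own output, one option is to tally the `YT:Z` tag that bowtie2 writes on each SAM record (`CP` concordant pair, `DP` discordant pair, `UP` pair that aligned neither way, `UU` unpaired read) together with the "read unmapped" flag bit `0x4`. A rough sketch follows; `make_record` and `tally_sam` are hypothetical helper names, and the three toy pairs are made-up records, not data from this thread.

```python
from collections import Counter

def make_record(name, flag, yt):
    """Build a minimal toy SAM line (hypothetical helper; placeholder values)."""
    fields = [name, str(flag), "*", "0", "0", "*", "*", "0", "0", "ACGT", "IIII"]
    return "\t".join(fields + ["YT:Z:" + yt])

def tally_sam(lines):
    """Count pairs per YT:Z category, and mate status within the 'UP' pairs."""
    pairs = Counter()
    mates = Counter()
    for line in lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:              # skip secondary/supplementary records
            continue
        yt = next((t[5:] for t in fields[11:] if t.startswith("YT:Z:")), None)
        if flag & 0x40:               # count each pair once, on its first mate
            pairs[yt] += 1
        if yt == "UP":                # pair failed as a pair: look at each mate
            mates["unaligned" if flag & 0x4 else "aligned"] += 1
    return pairs, mates

toy = [
    make_record("p1", 99, "CP"), make_record("p1", 147, "CP"),  # concordant pair
    make_record("p2", 73, "UP"), make_record("p2", 133, "UP"),  # only mate 1 aligned
    make_record("p3", 77, "UP"), make_record("p3", 141, "UP"),  # neither mate aligned
]
pairs, mates = tally_sam(toy)
print(dict(pairs))  # {'CP': 1, 'UP': 2}
print(dict(mates))  # {'aligned': 1, 'unaligned': 3}
```

On a real file you would pass the lines of the SAM output instead of `toy`. The sketch does not try to separate reads that aligned once from reads that aligned more than once.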

If you're doing standard Illumina sequencing, then you expect R1 and R2 to align on opposite strands, pointing towards each other (i.e. the first sequenced base is at the outer end of the alignment), roughly 400 bp apart (this distance is determined by the library fragment size). Since one read is on the forward strand and the other on the reverse, this is known as "FR" orientation. The sequencer reads from one end of the DNA fragment, flips it over, then reads from the other end, like so:

R1>>>>>         <<<<<<R2


There are some less common library preparation protocols that result in different expected read orientations (RF or FF). Bowtie2 is counting the reads that don't align as expected.
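As an illustration of what "align as expected" means for an FR library, here is a simplified check in Python. It is a sketch, not bowtie2's actual logic: the `Mate` type and `looks_concordant_fr` are hypothetical names, coordinates are assumed to be 1-based and inclusive, and the 0 and 500 fragment-length bounds are meant to mirror what I understand the defaults of bowtie2's `-I`/`-X` options to be. Special cases such as one mate containing the other are ignored.

```python
from typing import NamedTuple

class Mate(NamedTuple):
    chrom: str
    start: int   # leftmost aligned base, 1-based
    end: int     # rightmost aligned base, inclusive
    strand: str  # "+" or "-"

def looks_concordant_fr(m1, m2, min_frag=0, max_frag=500):
    """Rough FR test: same chromosome, opposite strands, mates pointing
    towards each other, and fragment length within [min_frag, max_frag]."""
    if m1.chrom != m2.chrom:
        return False
    if {m1.strand, m2.strand} != {"+", "-"}:
        return False
    fwd, rev = (m1, m2) if m1.strand == "+" else (m2, m1)
    # The forward mate must sit to the left of the reverse mate.
    if fwd.start > rev.start or fwd.end > rev.end:
        return False
    fragment_length = rev.end - fwd.start + 1
    return min_frag <= fragment_length <= max_frag

r1 = Mate("chr1", 100, 199, "+")
r2 = Mate("chr1", 400, 499, "-")
print(looks_concordant_fr(r1, r2))                             # True (400 bp fragment)
print(looks_concordant_fr(r1, Mate("chr2", 400, 499, "-")))    # False (different chromosomes)
print(looks_concordant_fr(r1, Mate("chr1", 4900, 4999, "-")))  # False (fragment too long)
```

A pair that fails a test like this is not counted as concordant, even though each mate on its own may align perfectly well.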

Thank you again, benformatics and d-cameron. My library comes from a sort of "environmental" sample (e.g. mammal blood) from which I am trying to remove the host genome to find what else (bacteria, viruses, etc.) is in it. Does this make a difference in how the reads should align? Specifically, in the pairs that align 0 times concordantly or discordantly to the host genome (e.g. dog genome), what are the single reads (e.g. 3’192’732) still aligning to?
