Tophat: No "Unmapped Mate" (Flag 0x8) Information Available For Reads In unmapped.bam
11.1 years ago
Chris

Hi,

I work with paired-end Illumina data that was mapped using Tophat 2.0.6.

When both reads of a pair are unmapped (flag 0x4 set and read available in unmapped.bam), neither of the reads has the "mate unmapped" (0x8) flag set.

Here's an example:

$ samtools view unmapped.bam | grep HWI-ST587_0093:1:1202:11833:4852.ATCACGA

HWI-ST587_0093:1:1202:11833:4852.ATCACGA 69 * 0 255 * * 0 0 GCCCGAGGTTATCTAGAGTCACCAAAGCCGCCGGCGCCCGCCCCCCCGGCCGCGGCGGGGGGGGGGGGGGGAGGGGGGGGCTGTGGTGATGGAAGAAGGG @@@DDDDD<DHHFIDGBGCFHIFIAHFGGDHBHIHABE?,3>@',80&&)&)&&&&)&&))-7<&&)&5>&&&05&&&&&&&)+((++(+(+(((((((&

HWI-ST587_0093:1:1202:11833:4852.ATCACGA 133 * 0 255 * * 0 0 GGGGGGATGCGTGCATTTATCAGATCAAAACCAACCCGGTCAGCCCCTCTCCGGCCCCCGCCCGGGGGGGGGGGGCCGGCGGCGGCGTGGCGGGTCGGCA @B@FFB;@B?BBDDDDD@CBCCDCC?CCCCDDBDDDDDBBDD<CCDDBDDDC8B7BBB&&&5&0055@&595-&)&0&&&&0&&&&)&)&&&&))0&&&&

$ samtools view -f 0x8 unmapped.bam

$

This looks like a bug to me. Can anyone confirm this?

Cheers,

Chris

tophat2 rna-seq bam

I think flags 77 and 141 would be more appropriate for reads like this: they are 69 and 133 with the 0x8 (mate unmapped) bit added. I am not sure why they use flags 69 and 133 instead.
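To check the arithmetic, the expected flags can be built by OR-ing the individual bits together. A minimal Python sketch; the bit values are the standard SAM FLAG bits, while the constant and function names are only illustrative:

```python
# SAM FLAG bits discussed in this thread
PAIRED = 0x1         # the read is paired in sequencing
UNMAPPED = 0x4       # the query sequence itself is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # the read is the first read in a pair
READ2 = 0x80         # the read is the second read in a pair


def pair_both_unmapped_flags():
    """Return the (read 1, read 2) flags expected when both mates are unmapped."""
    read1 = PAIRED | UNMAPPED | MATE_UNMAPPED | READ1
    read2 = PAIRED | UNMAPPED | MATE_UNMAPPED | READ2
    return read1, read2


# Flags TopHat 2.0.6 wrote for the example pair in the question
tophat_read1, tophat_read2 = 69, 133
expected_read1, expected_read2 = pair_both_unmapped_flags()

print(expected_read1, expected_read2)     # 77 141
print(expected_read1 ^ tophat_read1)      # 8: the only bit that differs is 0x8
print(bool(tophat_read1 & MATE_UNMAPPED))  # False: 0x8 is not set in 69
```

XOR-ing the expected and observed flags isolates exactly the bits that differ, which here is just the missing 0x8.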


Yes, that's exactly what I'm wondering about. Anyway, I've sent them a bug report.

11.1 years ago

EDIT: never mind.

Shouldn't your read names be unique?

HWI-ST587_0093:1:1202:11833:4852.ATCACGA = HWI-ST587_0093:1:1202:11833:4852.ATCACGA

I would expect the names to carry some sort of pair information.

Example from the Wikipedia FASTQ format article:

@HWUSI-EAS100R:6:73:941:1973#0/1
/1    the member of a pair, /1 or /2 (paired-end or mate-pair reads only)
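In this naming convention the /1 or /2 suffix is the only thing that tells the two mates apart, so stripping it leaves both reads of a pair with the same name. A minimal Python sketch of that stripping step, assuming names follow the `name/1` / `name/2` pattern above; `split_mate_suffix` is a hypothetical helper, not part of any tool:

```python
import re

# Optional leading '@', a base name without '/' or whitespace, then '/1' or '/2'
MATE_SUFFIX = re.compile(r"^@?(?P<base>[^/\s]+)/(?P<mate>[12])$")


def split_mate_suffix(name):
    """Split 'base/1' or 'base/2' into (base, mate number); otherwise (name, None)."""
    match = MATE_SUFFIX.match(name)
    if match is None:
        return name, None
    return match.group("base"), int(match.group("mate"))


print(split_mate_suffix("@HWUSI-EAS100R:6:73:941:1973#0/1"))
# ('HWUSI-EAS100R:6:73:941:1973#0', 1)
print(split_mate_suffix("HWI-ST587_0093:1:1202:11833:4852.ATCACGA"))
# ('HWI-ST587_0093:1:1202:11833:4852.ATCACGA', None)
```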

Explanation of flags:

flag 69:

64:  the read is the first read in a pair (hex: 0x0040)
4:   the query sequence itself is unmapped (hex: 0x0004)
1:   the read is paired in sequencing (hex: 0x0001)

flag 133:

128:  the read is the second read in a pair (hex: 0x0080)
4:    the query sequence itself is unmapped (hex: 0x0004)
1:    the read is paired in sequencing (hex: 0x0001)
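The same breakdown can be done programmatically by testing each bit with a bitwise AND. A minimal Python sketch; the descriptions paraphrase the SAM specification and the names are illustrative (recent samtools releases also offer a `samtools flags` subcommand for this kind of lookup):

```python
# SAM FLAG bits and their meanings (paraphrasing the SAM format specification)
FLAG_BITS = [
    (0x1, "the read is paired in sequencing"),
    (0x2, "the read is mapped in a proper pair"),
    (0x4, "the query sequence itself is unmapped"),
    (0x8, "the mate is unmapped"),
    (0x10, "the read is on the reverse strand"),
    (0x20, "the mate is on the reverse strand"),
    (0x40, "the read is the first read in a pair"),
    (0x80, "the read is the second read in a pair"),
    (0x100, "secondary alignment"),
    (0x200, "the read fails quality checks"),
    (0x400, "the read is a PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the (bit, meaning) pairs for every bit set in a SAM FLAG."""
    return [(bit, meaning) for bit, meaning in FLAG_BITS if flag & bit]


for flag in (69, 133):
    print(f"flag {flag}:")
    for bit, meaning in decode_flag(flag):
        print(f"  {bit:>4}: {meaning} (hex: {bit:#06x})")
```

Neither flag contains 0x8, which is exactly why `samtools view -f 0x8 unmapped.bam` returns nothing.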

I think the /1 and /2 suffixes are trimmed from the read names during alignment, so in BAM files you only see the read names without them. The flags 64 (0x40, first read in a pair) and 128 (0x80, second read in a pair) tell you whether a record is read 1 or read 2.
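If a downstream tool needs the old suffixes back, they can be rebuilt from those two bits. A minimal Python sketch; `mate_number` and `name_with_suffix` are hypothetical helpers, and the `/1`/`/2` output format is an assumption based on the FASTQ convention quoted earlier in the thread:

```python
READ1 = 0x40  # the read is the first read in a pair
READ2 = 0x80  # the read is the second read in a pair


def mate_number(flag):
    """Return 1 or 2 from the FLAG's READ1/READ2 bits, or None if neither is set."""
    if flag & READ1:
        return 1
    if flag & READ2:
        return 2
    return None


def name_with_suffix(qname, flag):
    """Hypothetical helper: restore a '/1' or '/2' suffix from the FLAG."""
    mate = mate_number(flag)
    return qname if mate is None else f"{qname}/{mate}"


qname = "HWI-ST587_0093:1:1202:11833:4852.ATCACGA"
print(name_with_suffix(qname, 69))   # HWI-ST587_0093:1:1202:11833:4852.ATCACGA/1
print(name_with_suffix(qname, 133))  # HWI-ST587_0093:1:1202:11833:4852.ATCACGA/2
```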


Yep, that makes sense.

11.1 years ago
Chris

Hi Zev,

It's true, my original unmapped.bam had reads with /1 and /2 suffixes, which I removed for downstream processing. In this case I want to look at unmapped reads that have a mapped mate; if I encounter a read whose mate is unmapped, I just want to skip it. That's how I noticed the missing 0x8 flags.
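If 0x8 were set correctly, `samtools view -f 4 -F 8 unmapped.bam` (require 0x4, exclude 0x8) would select these reads directly; with the flags shown in the question it would drop nothing. A minimal Python workaround sketch, assuming the input is the text output of `samtools view unmapped.bam` and that unmapped.bam holds every unmapped read, so a mate whose record is also present here can be treated as unmapped. The function name and toy records are hypothetical:

```python
from collections import defaultdict

PAIRED = 0x1         # the read is paired in sequencing
UNMAPPED = 0x4       # the query sequence itself is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
READ1 = 0x40         # first read in a pair
READ2 = 0x80         # second read in a pair


def unmapped_with_mapped_mate(sam_lines):
    """Yield unmapped paired records whose mate appears to be mapped.

    Assumption: the input holds every unmapped read of the run, so when both
    mates of a read name occur here unmapped, the mate is treated as unmapped
    even if the 0x8 bit is missing (as in the TopHat 2.0.6 output above).
    """
    records = []
    unmapped_mates = defaultdict(set)  # read name -> mate bits seen unmapped
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        records.append((qname, flag, line))
        if flag & UNMAPPED:
            unmapped_mates[qname].add(flag & (READ1 | READ2))

    for qname, flag, line in records:
        if not (flag & PAIRED) or not (flag & UNMAPPED):
            continue  # only unmapped reads from pairs are of interest
        other_mate = READ2 if flag & READ1 else READ1
        mate_unmapped = bool(flag & MATE_UNMAPPED) or other_mate in unmapped_mates[qname]
        if not mate_unmapped:
            yield line


# Toy records (name, FLAG, placeholder fields); pair_b's mate is not in the file
example = [
    "pair_a\t69\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII",
    "pair_a\t133\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII",
    "pair_b\t133\t*\t0\t255\t*\t*\t0\t0\tACGT\tIIII",
]
for record in unmapped_with_mapped_mate(example):
    print(record.split("\t")[0])  # pair_b
```

In practice the lines could come from `sys.stdin` by piping `samtools view unmapped.bam` into the script. Records are buffered in memory because a read's mate may appear later in the file; a name-sorted input would allow a streaming version.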

However, please note that Tophat 2.0.7 does not add /1 and /2 suffixes for unmapped reads anymore.

11.1 years ago

Hi Chris,

I think those reads (where the mate is mapped and the read itself is not) should be part of the mapped.bam file, not the unmapped.bam file. Only read pairs where both reads are unmapped should go to unmapped.bam.
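This split can be decided from a single record's FLAG, which is why a missing 0x8 matters: a record with flag 69 reads as "unmapped read with a mapped mate". A small Python sketch, assuming the FLAG bits are set as described in the SAM specification; `pair_status` is a hypothetical helper:

```python
PAIRED = 0x1         # the read is paired in sequencing
UNMAPPED = 0x4       # the query sequence itself is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped


def pair_status(flag):
    """Describe a record's pair from its own FLAG (bits 0x1, 0x4 and 0x8)."""
    if not (flag & PAIRED):
        return "not paired"
    read_mapped = not (flag & UNMAPPED)
    mate_mapped = not (flag & MATE_UNMAPPED)
    if read_mapped and mate_mapped:
        return "both mates mapped"
    if read_mapped:
        return "read mapped, mate unmapped"
    if mate_mapped:
        return "read unmapped, mate mapped"
    return "both mates unmapped"


for flag in (77, 69, 73):
    print(flag, pair_status(flag))
```

Flag 77 classifies as "both mates unmapped", while TopHat's 69 for the same situation classifies as "read unmapped, mate mapped".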

Aligners normally keep both reads of a pair in the mapped BAM file even if only one of them is mapped. Downstream tools such as Pindel use such pairs (one mate mapped, the other not) to look for insertions and deletions with a split-read method.
