RNA-seq Multiple Map with Multiple primary alignment in Tophat2 alignment ?
1
0
Entering edit mode
8.8 years ago
heyao • 0

I am new to use spliced-aligner like tophat2, I used Tophat2 v2.0.13 to do alignment on paired-end 50bp data. The command I used is straightforward

tophat2 -o ./MCF-7.debug.tophat2 -G $GTF -p 16 --fusion-search --fusion-min-dist 200000 ${bowtie2_index} ${fastqfiles}

However, when I am trying to get the unique and primary alignment from the result, I found something weird.

1. I found that a multiple alignment with two primary alignment . It seems contradictory because there should be only one primary alignment in a multiple alignment as sam format specification says:

Multiple mapping: The correct placement of a read may be ambiguous, e.g. due to repeats. In this case, there may be multiple read alignments for the same read. One of these alignments is considered primary.

See below:

samtools view -f 64 MCF-7.debug.tophat2.bam | grep --color "HWI-EAS418:2:98:923:1852"
---------------------------------------------------------
HWI-EAS418:2:98:923:1852    339    chr13    60240952    3    29M    =    60240894    -79557    CCCTTCTGTCAACTGCACTTTCTGATTTT    @7;>BBB@ABBACBAB>ABABBBBCABBB    AS:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:50    NM:i:0    XF:Z:1 chr13-chr17 60240952 29M60320430F21M CCCTTCTGTCAACTGCACTTTCTGATTTTCAGGCAGTGTCTTCCATAAAT @7;>BBB@ABBACBAB>ABABBBBCABBBCC@BCBBBABCBBBCCCCCCB    XP:Z:chr13 60240894 50M    NH:i:2    CC:Z:=    CP:i:60240952    HI:i:0
HWI-EAS418:2:98:923:1852    83    chr13    60240952    3    29M    =    60240894    -50    CCCTTCTGTCAACTGCACTTTCTGATTTT    @7;>BBB@ABBACBAB>ABABBBBCABBB    AS:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:50    NM:i:0    XF:Z:1 chr13-chr17 60240952 29M58007535F21m CCCTTCTGTCAACTGCACTTTCTGATTTTCAGGCAGTGTCTTCCATAAAT @7;>BBB@ABBACBAB>ABABBBBCABBBCC@BCBBBABCBBBCCCCCCB    XP:Z:chr13 60240894 50M    NH:i:2    HI:i:1
HWI-EAS418:2:98:923:1852    83    chr17    58007515    3    21M    chr13    60240894    -50    ATTTATGGAAGACACTGCCTG    BCCCCCCBBBCBABBBCB@CC    AS:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:50    NM:i:0    XF:Z:2 chr13-chr17 60240952 29M58007535F21m CCCTTCTGTCAACTGCACTTTCTGATTTTCAGGCAGTGTCTTCCATAAAT @7;>BBB@ABBACBAB>ABABBBBCABBBCC@BCBBBABCBBBCCCCCCB    XP:Z:chr13 60240894 50    NH:i:2    HI:i:1
HWI-EAS418:2:98:923:1852    339    chr17    60320430    3    21M    chr13    60240894    -79557    CAGGCAGTGTCTTCCATAAAT    CC@BCBBBABCBBBCCCCCCB    AS:i:0    XM:i:0    XO:i:0    XG:i:0    MD:Z:50    NM:i:0    XF:Z:2 chr13-chr17 60240952 29M60320430F21M CCCTTCTGTCAACTGCACTTTCTGATTTTCAGGCAGTGTCTTCCATAAAT @7;>BBB@ABBACBAB>ABABBBBCABBBCC@BCBBBABCBBBCCCCCCB    XP:Z:chr13 60240894 50    NH:i:2    CC:Z:=    CP:i:60240952    HI:i:0
----------------------------------------------------------------

So how to evaluate these alignments?

2. The NH tag Meaning?

I found NH:i:2 are set in these alignments showed above, it seems that it is the number of PRIMARY alignment . But when I check the second in the pair, the NH tag is set to NH:i:2 when there are only ONE PRIMARY alignment, see as below:

samtools view -F 64 MCF-7.debug.tophat2.bam | grep --color "HWI-EAS418:2:98:923:1852"

------------------------------------

HWI-EAS418:2:98:923:1852    419    chr13    60240894    3    50M    =    60240952    79557    GCTCCTTGGCGACTGGAGTCCTTGTTGAGTTGCAATTGATATTGTAGTGT    BCCBBBBBBBB?>B?B?B@BBAAB>BB?B@BBB>@?BB@B?BBB=8@9B>    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    XP:Z:chr13-chr17 60240952 29M60320429F21M    NH:i:2    CC:Z:=    CP:i:60240894    HI:i:0
HWI-EAS418:2:98:923:1852    163    chr13    60240894    3    50M    =    60240952    50    GCTCCTTGGCGACTGGAGTCCTTGTTGAGTTGCAATTGATATTGTAGTGT    BCCBBBBBBBB?>B?B?B@BBAAB>BB?B@BBB>@?BB@B?BBB=8@9B>    AS:i:0    XN:i:0    XM:i:0    XO:i:0    XG:i:0    NM:i:0    MD:Z:50    YT:Z:UU    XP:Z:chr13-chr17 60240952 29M58007534F21m    NH:i:2    HI:i:1

----------------------------------------------------------

Thanks in advance

RNA-Seq tophat2 alignment • 2.7k views
ADD COMMENT
0
Entering edit mode
8.8 years ago
  1. That appears to be a bug in tophat2, the alignments to chr17 don't make sense given the more reasonable ones to chr13. You might rerun this dataset with version 2.1.0 and see if this bug has been fixed.
  2. The NH tag should be the number of alignments reported, the multiple primary alignments you're seeing is a bug. BTW, tophat2 only reports a handful of MAPQ values and you can use those to determine if a read/pair has multiple alignments or not (a MAPQ of 50 is used when there's only one alignment, 3 when there are 2 alignments, and so on).
ADD COMMENT
0
Entering edit mode

This is not a bug, it appears in other aligner such as gsnap. see that:

HWI-EAS418:2:98:923:1852    83    chr17    60320430    96    29H21M    chr13    60240894    -108    CAGGCAGTGTCTTCCATAAAT    CC@BCBBBABCBBBCCCCCCB    XH:Z:CCCTTCTGTCAACTGCACTTTCTGATTTT    MD:Z:21    NH:i:2    HI:i:1    NM:i:0    SM:i:96    XQ:i:40    X2:i:0    XO:Z:CT    XS:A:-    XT:Z:GT-AG,2.00,2.00,-chr17@60320430..-chr13@60240952
HWI-EAS418:2:98:923:1852    83    chr13    60240952    96    29M21H    =    60240894    -108    CCCTTCTGTCAACTGCACTTTCTGATTTT    @7;>BBB@ABBACBAB>ABABBBBCABBB    XH:Z:CAGGCAGTGTCTTCCATAAAT    MD:Z:29    NH:i:2    HI:i:1    NM:i:0    SM:i:96    XQ:i:40    X2:i:0    XO:Z:CT    XS:A:-    XT:Z:GT-AG,2.00,2.00,-chr17@60320430..-chr13@60240952
HWI-EAS418:2:98:923:1852    323    chr17    58007515    0    21M29H    chr13    60240894    108    ATTTATGGAAGACACTGCCTG    BCCCCCCBBBCBABBBCB@CC    XH:Z:AAAATCAGAAAGTGCAGTTGACAGAAGGG    MD:Z:21    NH:i:2    HI:i:2    NM:i:0    SM:i:0    XQ:i:0    X2:i:0    XO:Z:CT    XS:A:+    XT:Z:GT-AG,2.00,2.00,+chr17@58007515..-chr13@60240952
HWI-EAS418:2:98:923:1852    339    chr13    60240952    0    29M21H    =    60240894    -108    CCCTTCTGTCAACTGCACTTTCTGATTTT    @7;>BBB@ABBACBAB>ABABBBBCABBB    XH:Z:CAGGCAGTGTCTTCCATAAAT    MD:Z:29    NH:i:2    HI:i:2    NM:i:0    SM:i:0    XQ:i:0    X2:i:0    XO:Z:CT    XS:A:-    XT:Z:GT-AG,2.00,2.00,+chr17@58007515..-chr13@60240952
ADD REPLY
0
Entering edit mode

Ah, you appear to have duplicate read names associated with different sequences (or you have paired end reads and are running things incorrectly).

ADD REPLY

Login before adding your answer.

Traffic: 1653 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6