Weird results in TopHat2 paired-end alignment
3.5 years ago
Aspire ▴ 300

I aligned paired-end reads with TopHat2. In the resulting BAM file, some reads are flagged as "read mapped in proper pair" (their FLAG values include bit 2), yet the two mates map to different chromosomes (!).

I called TopHat2 with --mate-inner-dist -139 and --mate-std-dev 50. Unless I misunderstand the definitions of these terms, could the negative mate-inner-dist have messed something up?
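For reference, here is a minimal sketch of how a negative inner distance arises, assuming the usual definition from the TopHat manual (inner distance = fragment length minus the two read lengths). The fragment and read lengths in the second call are hypothetical numbers picked only to reproduce -139; they are not taken from this dataset.

```python
def inner_distance(fragment_length, read_length):
    """Expected inner distance between mates: fragment minus both reads.

    A negative value means the two mates overlap on the fragment.
    """
    return fragment_length - 2 * read_length


# Example from the TopHat manual: 300 bp fragments, 50 bp reads.
print(inner_distance(300, 50))

# Hypothetical values (NOT from this dataset) that would give -139:
# 150 bp reads on a 161 bp fragment overlap by 139 bp.
print(inner_distance(161, 150))
```

A negative value by itself only says that the mates are expected to overlap, which is common for short fragments.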

As I understand it, a read "mapped in proper pair" is the same as a "concordant alignment". The definition of the latter is:

A pair that aligns with the expected relative mate orientation and with the expected range of distances between mates is said to align "concordantly".

These are two records from the mapped file:

A01056:33:HF3NFDSXY:1:2516:13657:30718  435     1       91387362        0       117M    21      8218147 0       CCTGTGGTAACTTTTCTGACACCTCCTGCTTAAAACCCAAAAGGTCAGAAGGATCGTGAGGCCCCGCTTTCACGGTCTGTATTCGTACTGAAAATCAAGATCAAGCGAGCTTTTGCC   :FF:F:FFFF:FFFFFFFFFFFFFF:FF,FFF,FFFFFF:FFF:FFFFF:FF:FF:FFFFFFF:FFFFFFFFF:FFFFFFFFF,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFF   AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:117        YT:Z:UU NH:i:20 CC:Z:=  CP:i:91387362   XS:A:-  HI:i:2


A01056:33:HF3NFDSXY:1:2516:13657:30718  371     21      8218147 0       112M    1       91387362        0       GGGCAAAAGCTCGCTTGATCTTGATTTTCAGTACGAATACAGACCGTGAAAGCGGGGCCTCACGATCCTTCTGACCTTTTGGGTTTTAAGCAGGAGGTGTCAGAAAAGTTAC        :F:FFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF,FFFFFFF:FFFF::FFFFFFFFFF:FFFFFFFFFFFFFFFFFFF        AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:112        YT:Z:UU NH:i:20 CC:Z:GL000220.1 CP:i:161594     XS:A:+  HI:i:2
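To make the two FLAG values above easier to read, here is a small sketch that decodes them using the bit definitions from the SAM specification. The helper names (`decode_flag`, `mate_on_other_reference`) are made up for this illustration. Note that besides the proper-pair bit, both records also carry the secondary-alignment bit (0x100), which fits their NH:i:20 and MAPQ 0; whether that explains the inconsistent proper-pair bit is only a guess.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def mate_on_other_reference(rname, rnext):
    """True if RNEXT names a different reference than RNAME.

    In SAM, RNEXT "=" means "same as RNAME" and "*" means unavailable.
    """
    return rnext not in ("=", "*") and rnext != rname


# (FLAG, RNAME, RNEXT) of the two records shown above.
for flag, rname, rnext in [(435, "1", "21"), (371, "21", "1")]:
    names = decode_flag(flag)
    suspicious = "proper_pair" in names and mate_on_other_reference(rname, rnext)
    print(flag, names, "proper pair across chromosomes:", suspicious)
```

The same check could be applied to every record of `samtools view` output to see how many alignments are affected and whether they are all secondary.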

And this is the command used to generate the alignment:

tophat --mate-inner-dist -139 --mate-std-dev 50 -o align/Sample10 -G /.../Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes.gtf -N 10 --read-gap-length 5 --read-edit-dist 15 --segment-length 20 --read-realign-edit-dist 3 --no-coverage-search --library-type fr-firststrand -p 32 /.../Homo_sapiens/Ensembl/GRCh38/Sequence/Bowtie2Index/genome processed/Sample10_R1_clean_pe.fastq.gz processed/Sample10_R2_clean_pe.fastq.gz,processed/Sample10_R1_clean_se.fastq.gz,processed/Sample10_R2_clean_se.fastq.gz

(The _pe files contain paired reads. The _se files were also sequenced paired-end, but only one read of each pair survived the pre-processing clean-up step.)

RNA-Seq TopHat2 paired-end • 1.1k views

Thanks, but nevertheless I'd still be glad if anyone can help me with this issue.


In the majority of cases in this file, however, the problem is not the SAM flag but the YT:Z:UU tag.

In this run, TopHat received both paired-end (PE) and single-read (SR) input. About 98% of the reads were PE. Despite that, in a subsample of the file, about 98% of the alignments carry the YT:Z:UU tag.
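A tally like the one described above can be sketched as follows. It assumes tab-separated SAM text (for example the output of `samtools view`) and the Bowtie 2 meaning of the YT:Z values (UU = aligned as unpaired, CP = concordant pair, DP = discordant pair, UP = pair that aligned neither concordantly nor discordantly). The function name `count_yt` and the demo records are invented for illustration and are not real data.

```python
from collections import Counter


def count_yt(sam_lines):
    """Count YT:Z tag values over SAM alignment lines (headers skipped)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        value = "missing"
        # Optional tags start at the 12th column of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("YT:Z:"):
                value = tag[len("YT:Z:"):]
                break
        counts[value] += 1
    return counts


# Made-up records for demonstration only (11 mandatory columns + tags).
demo = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\t1\t100\t50\t10M\t=\t200\t110\tACGTACGTAC\tFFFFFFFFFF\tYT:Z:CP\tNH:i:1",
    "r2\t435\t1\t300\t0\t10M\t21\t500\t0\tACGTACGTAC\tFFFFFFFFFF\tAS:i:0\tYT:Z:UU",
    "r3\t0\t2\t400\t50\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF\tYT:Z:UU",
]
print(count_yt(demo))
```

Crossing this tally with the paired bit (0x1) of the FLAG would show whether YT:Z:UU is also attached to reads that were supplied as pairs.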

This thread, "Is There An Explanation For This Tophat "Yt" Descriptor Discrepancy In My Sam Output?", was on a similar topic.
