Is There An Explanation For This TopHat "YT" Descriptor Discrepancy In My SAM Output?
9.2 years ago
Dan D 7.3k

I was given some TCGA BAM files and asked to perform a realignment with some specific requirements. While perusing the results of an alignment in IGV, I noticed something strange. As far as I can tell, everything in the read data pop-up dialogs tells me that I'm looking at paired-end reads that mapped as pairs, except for the YT tag, which is always UU.

The two read names in a mapped pair are identical, and the mates come from separate FASTQ files. I'm seeing this with every read I check, and I've spot-checked reads from random places on five different chromosomes.
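For anyone who wants to go beyond spot checks in IGV, here is a minimal sketch that tallies the YT tag against the paired and proper-pair FLAG bits across SAM records. As background, the Bowtie 2 manual defines YT:Z:UU as "the read was not part of a pair", with CP, DP and UP used for reads that were aligned as part of a pair. The function name and the two example records are made up for illustration; they are not taken from my data.

```python
from collections import Counter

PAIRED = 0x1        # SAM FLAG bit: template has multiple segments (read is paired)
PROPER_PAIR = 0x2   # SAM FLAG bit: each segment properly aligned according to the aligner


def tally_yt(sam_lines):
    """Count (is_paired, is_proper_pair, YT value) combinations in SAM text records."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        yt = "missing"
        for tag in fields[11:]:           # optional fields start at column 12
            if tag.startswith("YT:Z:"):
                yt = tag[len("YT:Z:"):]
                break
        counts[(bool(flag & PAIRED), bool(flag & PROPER_PAIR), yt)] += 1
    return counts


# Hypothetical records: FLAG 99 and 147 both have the paired and proper-pair bits set.
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "read1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\tYT:Z:UU",
    "read1\t147\tchr1\t200\t50\t4M\t=\t100\t-104\tACGT\tIIII\tYT:Z:UU",
]
yt_counts = tally_yt(example)
```

Any iterable of SAM lines works as input, for example an open file handle on the text output of samtools view. A large count under (True, True, "UU") would show that the discrepancy is systematic rather than limited to the reads I happened to click on.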

Here's the tophat v2.0.9 command that I ran:

/usr/local/bin/tophat --output-dir /data/deedee/rnaseq/efb596b4 --max-multihits 2 -p 4 --b2-very-sensitive --library-type fr-unstranded /data/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome efb596b4_R1.fastq efb596b4_R2.fastq


Does anyone have any ideas about what's going on here? More background follows in case it's useful:

In the initial BAM file, the read names were a mess. They had /1 and /2 attached to the end of the read names, sometimes twice. I wrote a script to remove these /1 and /2 suffixes from the ends of the read names. I then used bedtools bamtofastq to convert the cleaned, queryname-sorted BAM files to a pair of FASTQ files. From there I ran the tophat command above.
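The clean-up step can be sketched roughly as follows. This is not the script I actually ran, just a minimal illustration of the idea, assuming the suffixes are exactly /1 or /2 and may be repeated at the end of the name. The function name and the read names are hypothetical.

```python
import re

# One or more trailing "/1" or "/2" suffixes, anchored at the end of the name.
_MATE_SUFFIX = re.compile(r"(?:/[12])+$")


def strip_mate_suffix(read_name):
    """Remove trailing /1 or /2 mate suffixes, including repeated ones."""
    return _MATE_SUFFIX.sub("", read_name)


# Hypothetical read names showing the single, doubled and already-clean cases.
cleaned = [strip_mate_suffix(n) for n in ("READ_A/1", "READ_A/2/2", "READ_B")]
```

Anchoring the pattern at the end of the name leaves any /1 or /2 that appears earlier in the name untouched, so both mates end up with exactly the same name, which is what matters for pairing downstream.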

igv sam tophat bowtie • 3.2k views

I wonder if this is an artifact of how the reads are aligned. Since the mates of each pair are aligned separately, at least in part, tophat may simply never reset this auxiliary tag.


Very interesting suggestion. I'm going to pursue this further and see what I can find out. Thanks!


Please report back if that turns out to be the case (or not). I'd like to know as well!


I checked some output generated by a colleague and I'm seeing the same thing in those data as well. I bet your suggestion is correct. I went ahead and posted on the Tuxedo Tools message board to see if they can confirm.