Question: Is There An Explanation For This TopHat "YT" Descriptor Discrepancy In My SAM Output?
Dan D wrote (3.4 years ago):

I was given some TCGA BAM files and asked to perform a realignment with some specific requirements. While perusing the results of an alignment in IGV I noticed something strange. As far as I can tell, everything in the read data pop-up dialogs tells me that I'm looking at paired-end reads that mapped as pairs, except for the YT tag which is always UU.


The read names in a mapped pair are 100% identical and pulled from separate FASTQ files. I'm seeing this with every read I check, and I've spot checked reads from random places on five different chromosomes.
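
One way to check this across the whole file rather than by spot-checking in IGV is to tally the YT tag against the pairing bits of the SAM FLAG field. The sketch below is only an illustration: it assumes tab-separated SAM text (for example from `samtools view`), and the function name and demo records are made up. For reference, the Bowtie 2 manual lists the YT:Z values as UU (read was not part of a pair), CP (concordant pair), DP (discordant pair) and UP (pair failed to align either concordantly or discordantly).

```python
from collections import Counter


def tally_yt_by_pairing(sam_lines):
    """Count reads by (paired flag, proper-pair flag, YT value).

    `sam_lines` is any iterable of SAM text lines. Header lines
    (starting with '@') and blank lines are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        paired = bool(flag & 0x1)   # read is paired in sequencing
        proper = bool(flag & 0x2)   # mapped in a proper pair
        yt = "missing"
        # Optional fields start at column 12 (index 11).
        for tag in fields[11:]:
            if tag.startswith("YT:Z:"):
                yt = tag[len("YT:Z:"):]
                break
        counts[(paired, proper, yt)] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only, not taken from the real data.
    demo = [
        "@HD\tVN:1.0",
        "r1\t99\tchr1\t100\t50\t4M\t=\t200\t104\tACGT\tIIII\tYT:Z:UU",
        "r1\t147\tchr1\t200\t50\t4M\t=\t100\t-104\tACGT\tIIII\tYT:Z:UU",
    ]
    for key, n in sorted(tally_yt_by_pairing(demo).items()):
        print(key, n)
```

If every read with the paired and proper-pair bits set still reports YT:Z:UU, the discrepancy is systematic rather than specific to the reads inspected by hand.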

Here's the tophat v2.0.9 command that I ran:

/usr/local/bin/tophat --output-dir /data/deedee/rnaseq/efb596b4 --max-multihits 2 -p 4 --b2-very-sensitive --library-type fr-unstranded /data/iGenomes/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/genome efb596b4_R1.fastq efb596b4_R2.fastq

Does anyone have any ideas about what's going on here? More background follows in case it's useful:

In the initial BAM file, the read names were a mess. They had /1 and /2 attached to the end of the read names, sometimes twice. I wrote a script to remove these /1 and /2 values from the ends of the read names. I used bedtools bamtofastq to convert these query-sorted, cleaned BAM files to a pair of FASTQ files. From there I ran the tophat command above.
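
The read-name cleanup described above could look something like the following. This is a minimal sketch, not the script that was actually used: the function name and the example read names are hypothetical, and it assumes the unwanted suffixes are literal `/1` or `/2` repeated one or more times at the very end of the name.

```python
import re

# One or more trailing "/1" or "/2" groups, anchored at the end of the name.
_MATE_SUFFIX = re.compile(r"(?:/[12])+$")


def strip_mate_suffix(read_name):
    """Remove trailing /1 or /2 mate markers, even when they are repeated."""
    return _MATE_SUFFIX.sub("", read_name)


if __name__ == "__main__":
    # Hypothetical read names for illustration.
    for name in ["READ_0001/1", "READ_0001/2/2", "READ_0002"]:
        print(name, "->", strip_mate_suffix(name))
```

Anchoring the pattern at the end of the string leaves any `/1` or `/2` that appears in the middle of a read name untouched.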

Tags: tophat, sam, bowtie, igv

I wonder if this is an artifact of how the reads are aligned. Since the mates of a pair are aligned separately, at least in part, perhaps tophat just doesn't reset this auxiliary tag afterwards.

written 3.4 years ago by Devon Ryan

Very interesting suggestion. I'm going to pursue this further and see what I can find out. Thanks!

written 3.4 years ago by Dan D

Please report back if that turns out to be the case (or not). I'd like to know as well!

written 3.4 years ago by Devon Ryan

I checked some output generated by a colleague and I'm seeing the same thing in those data as well, so I bet your suggestion is correct. I went ahead and posted on the Tuxedo Tools message board to see if they can confirm.

written 3.4 years ago by Dan D