I have a question that seems simple, but I can't find an answer to it anywhere: when TopHat aligns reads to a transcriptome and then to the genome, is there a way to make it report the reads that align to the transcriptome in transcript coordinates (either in addition to genomic coordinates or instead of them)?
I see no option in the TopHat manual that forces it to report transcriptome alignments in transcript coordinates, or even just to name the transcript a read aligned to in a tag or something similar. The option below implies that TopHat does align to the transcripts, i.e. it does exactly what I want; it then takes the additional step of converting back to genomic coordinates and doesn't tell me which transcript the read came from.
-T/--transcriptome-only Only align the reads to the transcriptome and report only those mappings as genomic mappings.
This would make several subsequent processing steps easier with my data. I have short reads (about 30 bp or shorter) being reported as genomic alignments, and I have to go back and work out which transcript each one comes from. That is fine when there is only one annotation in a region, but it becomes much more difficult when annotations overlap.
For example, the transcripts ENST00000264933 and ENST00000316418 from Ensembl (GRCh37.p13) partially overlap, and I have a read which TopHat correctly splices according to ENST00000264933.4's pattern based on 2 nucleotides at the end of the read. However, because the read is reported as a genomic alignment and its first 18+ nucleotides come from a genomic region that is annotated as part of both transcripts, it is difficult for me to build a parser that decides which transcript the read should be assigned to. TopHat's splicing pattern tells me that it has already made the decision; it just didn't report it in a way that I can easily use.
Is there a way to have TopHat report the coordinates (or even just the ID) of the transcript it aligned the read to, either in place of or in addition to the genomic coordinates? (Note that I cannot use just the transcriptome in place of a genome, because some of the reads in the sample are from unspliced mRNA and need the genome sequence in order to be aligned properly, unless I am misunderstanding something.)
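To make the problem concrete, here is a minimal sketch of the kind of parser I would otherwise have to write myself. It is only a workaround idea, not something TopHat provides: it splits a read's CIGAR string into genomic blocks and keeps the transcripts whose exon boundaries agree with the read's splice junctions, then converts the read start to a transcript offset. The transcript names, exon coordinates, and function names are all made up for illustration; real exon lists would have to be loaded from the GTF.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

# Hypothetical annotation: id -> (strand, sorted 0-based half-open exons).
# TX_A and TX_B share the start of their first exon but splice differently.
TRANSCRIPTS = {
    "TX_A": ("+", [(1000, 1100), (2000, 2100)]),
    "TX_B": ("+", [(1000, 1150), (3000, 3100)]),
}

def aligned_blocks(pos, cigar):
    """Split an alignment into 0-based half-open genomic blocks at each N.

    pos is the 1-based leftmost position from the SAM POS field.
    """
    blocks = []
    start = cur = pos - 1
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "MD=X":        # these operations consume the reference
            cur += length
        elif op == "N":         # intron: close this block, skip the gap
            blocks.append((start, cur))
            cur += length
            start = cur
        # I, S, H and P do not consume the reference
    blocks.append((start, cur))
    return blocks

def is_compatible(blocks, exons):
    """True if the blocks lie in consecutive exons and every junction
    coincides with an annotated exon end / next exon start."""
    first = next((i for i, (s, e) in enumerate(exons)
                  if s <= blocks[0][0] < e), None)
    if first is None or first + len(blocks) > len(exons):
        return False
    for j, (bs, be) in enumerate(blocks):
        es, ee = exons[first + j]
        if bs < es or be > ee:
            return False
        if j > 0 and bs != es:                 # must resume at exon start
            return False
        if j < len(blocks) - 1 and be != ee:   # must splice at exon end
            return False
    return True

def transcript_start(blocks, exons, strand):
    """0-based transcript offset of the read's first transcript-sense base."""
    total = sum(e - s for s, e in exons)
    g = blocks[0][0] if strand == "+" else blocks[-1][1]
    before = sum(min(e, g) - s for s, e in exons if s < g)
    return before if strand == "+" else total - before

def assign_read(pos, cigar, transcripts):
    """Return (transcript_id, transcript_offset) for each compatible transcript."""
    blocks = aligned_blocks(pos, cigar)
    return [(tid, transcript_start(blocks, exons, strand))
            for tid, (strand, exons) in transcripts.items()
            if is_compatible(blocks, exons)]

# A 20 nt read: 18 nt in the shared region, 2 nt across TX_A's junction.
print(assign_read(1083, "18M900N2M", TRANSCRIPTS))  # [('TX_A', 82)]
```

An unspliced read that falls entirely inside the shared region (for example `18M` at the same position) still matches both made-up transcripts, so this approach only helps for reads that actually cross a junction.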
Thanks very much.