Question: Does Tophat Use The Library-Type Information For Mapping, Or Just For The Xs Flag?
7.8 years ago, gaelgarcia05 wrote:

When I specify a library type to TopHat (e.g., fr-firststrand, fr-secondstrand, or fr-unstranded), TopHat sets the XS:A tag to + or -, which is useful for downstream analyses such as annotation.

However, does this information actually influence the "mappability" of reads, or is this unaffected?

My thinking is that the information would be considered when mapping reads to the transcriptome defined by the GTF file, if one is supplied with -G.

In that case, read pairs should be concordant with the transcript strand. That is, if --library-type fr-firststrand was specified, and transcript A spans coordinates X to Y on the + strand, mate 1 of a pair should map as the reverse complement of the 3' end of transcript A, and mate 2 should map to the 5' end, on the same strand as the transcript sequence.
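To make the expectation above concrete, here is a minimal Python sketch of the strand each mate would be expected to align to for a given library type. It only encodes the convention described in this post; the function name `expected_mate_strands` is hypothetical and this is not how TopHat itself is implemented.

```python
def flip(strand):
    """Return the opposite genomic strand."""
    return "-" if strand == "+" else "+"


def expected_mate_strands(library_type, transcript_strand):
    """Return (mate1_strand, mate2_strand) expected for a transcript.

    Hypothetical helper for illustration only. Strands are the genomic
    strands the mates align to; None means the protocol carries no
    strand information.
    """
    if transcript_strand not in ("+", "-"):
        raise ValueError("transcript_strand must be '+' or '-'")
    if library_type == "fr-firststrand":
        # Mate 1 is antisense to the transcript, mate 2 is sense.
        return flip(transcript_strand), transcript_strand
    if library_type == "fr-secondstrand":
        # Mate 1 is sense to the transcript, mate 2 is antisense.
        return transcript_strand, flip(transcript_strand)
    if library_type == "fr-unstranded":
        # Either orientation is possible, so nothing can be predicted.
        return None, None
    raise ValueError("unknown library type: " + library_type)


if __name__ == "__main__":
    for lib in ("fr-firststrand", "fr-secondstrand", "fr-unstranded"):
        print(lib, expected_mate_strands(lib, "+"))
```

For a + strand transcript and fr-firststrand, mate 1 is expected on the - strand and mate 2 on the + strand, which matches the example described above.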

However, if no GTF is supplied with -G, or in the subsequent stage of mapping reads that didn't map to the transcriptome, now to the whole genome, then TopHat should make no use of library-type information, right?


TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol.

From TH Manual:

Since the splice junction finding algorithm of TopHat makes use of library-type information (if provided), one of the two TopHat runs would result in many more splice junctions than the other one. You can then use the library type that gives more junctions. If this is not the case TopHat might not work well with your sequencing protocol. Please let us know more details about your protocol so we can add support for new library types.

So this indicates that the strandedness argument does influence the mapping algorithm. But HOW does TopHat use library-type information in its splice junction finding algorithm, if it has to be unbiased about which strand the actual transcripts are on?

rnaseq reads tophat mapping rna-seq • 4.6k views
7.8 years ago, Ashutosh Pandey wrote:

Specifying the correct library type will ensure that the paired reads are mapped correctly, and should increase the mappability (if by that you meant the ability to align the reads).


Thanks, ashutosmits! Yes, that is what I meant. However, I can't see how this would influence the ability to align reads. Does it have to do with the GTF file supplied when using -G? Otherwise, I don't see how TopHat would make assumptions about what should map to the + or - strand...

(reply written 7.8 years ago by gaelgarcia05)

7.5 years ago, Kanne wrote:

I understand your confusion, and this thought just occurred to me. I haven't spent very long thinking about it, so maybe I'm forgetting something, but here's a suggestion anyway:

TopHat is a spliced read aligner. If you do not supply -G, it will still align reads across splice junctions; it will just have to figure out the splice junctions de novo. TopHat uses the canonical donor/acceptor sequences when it defines splice sites. Hence, if you specify that your libraries are strand-specific, it would make sense for TopHat to look for the canonical donor/acceptor sequences only in the read that represents the RNA transcript, and for their reverse complement in the other read, and to ignore any canonical splice sequences on the biologically irrelevant strand. If you specify that your library is not strand-specific, it will need to look for the donor/acceptor sites and their reverse complements in both reads, since it can't be sure which strand the transcript originated from.

If my thought is correct, then with a strand-specific library that is specified as such, you would end up with fewer opportunities for false-positive junctions, and presumably a faster run time too.
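To illustrate the idea (and only the idea; I have not checked this against TopHat's source), here is a small Python sketch. It assumes that only the GT-AG motif counts as canonical, that intron coordinates are 0-based with an exclusive end, and that the library type has already been translated into a set of allowed transcript strands. The names `junction_strand` and `allowed_strands` are made up for this example.

```python
# On the + genomic strand an intron of a + strand transcript reads GT...AG;
# an intron of a - strand transcript reads CT...AC (the reverse complement).
CANONICAL = {("GT", "AG"): "+", ("CT", "AC"): "-"}


def junction_strand(genome, intron_start, intron_end, allowed_strands=("+", "-")):
    """Return the transcript strand implied by the intron's end motifs.

    genome          -- reference sequence as a string (+ strand)
    intron_start    -- 0-based index of the first intron base
    intron_end      -- 0-based index one past the last intron base
    allowed_strands -- strands compatible with the read's library type;
                       both for unstranded data, one for stranded data
    Returns '+', '-', or None if the motif is not canonical or implies
    a strand that the library type rules out.
    """
    left = genome[intron_start:intron_start + 2].upper()
    right = genome[intron_end - 2:intron_end].upper()
    strand = CANONICAL.get((left, right))
    return strand if strand in allowed_strands else None


if __name__ == "__main__":
    genome = "AAACTCCCCACTTT"  # toy sequence with a CT...AC intron at [3, 11)
    # Unstranded: both motifs are searched, so this junction is accepted.
    print(junction_strand(genome, 3, 11))
    # Stranded read known to come from a + strand transcript: rejected.
    print(junction_strand(genome, 3, 11, allowed_strands=("+",)))
```

With unstranded data the candidate junction is kept as a - strand junction; with a read known to come from a + strand transcript, the same candidate is discarded, which is where the reduction in false positives would come from.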
