Paired-end stranded RNA-seq...still confused about TopHat
8.6 years ago
cadeans ▴ 10

Hey Everyone,

I'm trying to figure out the appropriate TopHat settings for strand-specific, paired-end RNA-seq data. I've read several posts about this but am still uncertain about the right settings. I'm hoping someone can confirm that my understanding of sense/antisense and forward/reverse reads is correct. As I understand it:

1) mRNA is an exact match to the DNA coding sequence (aside from U and no introns) and matches the sense strand
2) in library prep (TruSeq stranded in my case) the first strand of the cDNA library, which is antisense to the original gene, is used for sequencing, while the second strand is dUTP-marked and gets degraded
3) for paired end sequencing, after bridge PCR and sequencing the sense strand becomes read 1 (forward read) and the antisense strand becomes read 2 (reverse read).

So, when running TopHat, read 1 (/1) is used for the forward read and read 2 (/2) for the reverse read, with the library type set to fr-firststrand?

My ultimate goal is to use the BAM files from TopHat to get raw counts in HTSeq, where I understand the appropriate library setting is "reverse". Btw, I'm doing all this in Galaxy, as I'm not very proficient at coding. Thanks in advance for any reassurance.
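
For readers working on the command line rather than in Galaxy, the two settings asked about here could be sketched roughly as below. This is only an illustration: the index prefix, file names, and output directory are hypothetical placeholders, and the helper functions are invented for this example. The script only assembles and prints the commands; it does not run them.

```python
import shlex

def tophat_command(index_prefix, reads_1, reads_2, out_dir="tophat_out"):
    """Build a TopHat call for a dUTP-stranded, paired-end library."""
    return [
        "tophat",
        "--library-type", "fr-firststrand",  # stranded dUTP protocols such as TruSeq stranded
        "-o", out_dir,                        # output directory (hypothetical name)
        index_prefix,                         # Bowtie index prefix (hypothetical)
        reads_1,                              # read 1 file (/1)
        reads_2,                              # read 2 file (/2)
    ]

def htseq_count_command(bam_path, gtf_path):
    """Build an htseq-count call matching an fr-firststrand library."""
    return [
        "htseq-count",
        "-f", "bam",      # input is BAM rather than SAM
        "-r", "pos",      # TopHat's accepted_hits.bam is sorted by position
        "-s", "reverse",  # the "reverse" strandedness setting mentioned above
        bam_path,
        gtf_path,
    ]

# All paths below are placeholders, not files from this thread.
tophat_cmd = tophat_command("genome_index", "sample_1.fastq", "sample_2.fastq")
htseq_cmd = htseq_count_command("tophat_out/accepted_hits.bam", "genes.gtf")

print(shlex.join(tophat_cmd))
print(shlex.join(htseq_cmd))
```

The Galaxy tool forms expose the same two choices (library type for TopHat, strandedness for htseq-count), so the sketch is mainly useful for seeing which options pair up.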

RNA-Seq alignment • 6.2k views
8.6 years ago

Using the words forward and reverse in this context introduces more confusion, since THIS forward (sense direction) is not THAT forward (genomic forward). You should not use the words forward and reverse in this context; just keep the sense/antisense terminology.

Due to the vagaries of the library prep, the reads in file 2 will correspond to the sense direction of the transcript (they will indicate the 5' ends of the fragments that came from the transcript).
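
A toy model may make this concrete. The sketch below assumes a dUTP-stranded library (such as TruSeq stranded) and an invented 16-base fragment; the function names are made up for illustration. It ignores adapters, sequencing errors, and everything else about a real run.

```python
# Toy model of one paired-end fragment from a dUTP-stranded library.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def simulate_dutp_pair(fragment_sense: str, read_len: int):
    """Return (read1, read2) for a fragment written 5'->3' in sense orientation."""
    # Read 1 is antisense: it is the reverse complement of the
    # fragment's 3' end (in transcript coordinates).
    read1 = reverse_complement(fragment_sense)[:read_len]
    # Read 2 starts at the fragment's 5' end and is in sense orientation.
    read2 = fragment_sense[:read_len]
    return read1, read2

fragment = "ATGGCCAAGTTTCCGA"  # hypothetical mRNA-derived fragment, sense strand
read1, read2 = simulate_dutp_pair(fragment, 5)

print("read 1:", read1)  # antisense to the transcript
print("read 2:", read2)  # same sequence as the 5' end of the fragment
```

In this model read 2 is identical to the start of the sense fragment, while read 1 only matches the fragment after being reverse-complemented.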


Hi Istvan,

Thanks for the input. Regarding my use of forward and reverse, I am a little confused by their meaning. For my general understanding of the process, I tried to follow the sense and antisense strands through library prep and sequencing. I end up with the first sequenced read as matching the sense strand, but I think I might be misunderstanding something about the bridge PCR and sequencing, particularly because you say the sense transcripts should be present in the read 2 files. Either I'm missing something in the sequencing process or I don't understand how read 1 and read 2 are defined. Could you shed some light on this for me, please? Thanks in advance.

My reasoning:

first strand of cDNA is antisense (which is what remains after dUTP degradation) --> adapters are added to this strand and during bridge PCR the complement is created (sense strand) --> after further clustering all sense strands are washed away, leaving only antisense strands --> the sequencing process uses these antisense strands as a template to produce reads that are sense (and presumably these are the first reads in the fasta file (read 1; /1))


I think this paper has a good explanation (though I can't check since I can't access it from here)

http://www.nature.com/nmeth/journal/v7/n9/abs/nmeth.1491.html


Hi cadeans,

Did you figure out why the reads in file 2 correspond to the sense direction of the transcript? I have looked through the article but still don't get it ...

Here is my reasoning:

5' ----------- 3' mRNA fragment

   --> read 1
3' ----------- 5' cDNA (first strand)
5' ----------- 3' cDNA (second strand that is reverse and complementary to the first strand)
           <-- read 2
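
One way to check this kind of reasoning against aligned data is to infer the transcript strand from each read's SAM flag. The sketch below assumes an fr-firststrand (dUTP) library, where read 2 aligns to the transcript's strand and read 1 to the opposite one. The flag bits are the standard SAM ones; the function name is invented for this example.

```python
# Standard SAM flag bits
FLAG_REVERSE = 0x10  # read aligned to the reverse genomic strand
FLAG_READ1 = 0x40    # first read in the pair
FLAG_READ2 = 0x80    # second read in the pair

def transcript_strand(flag: int) -> str:
    """Infer the transcript's genomic strand from one read of an fr-firststrand pair."""
    is_reverse = bool(flag & FLAG_REVERSE)
    if flag & FLAG_READ1:
        # Read 1 is antisense, so the transcript is on the opposite strand.
        return "+" if is_reverse else "-"
    if flag & FLAG_READ2:
        # Read 2 is sense, so the transcript is on the same strand.
        return "-" if is_reverse else "+"
    raise ValueError("flag does not mark the read as read 1 or read 2")

# Common properly-paired flag values
for flag in (83, 163, 99, 147):
    print(flag, transcript_strand(flag))
```

Flags 83 and 163 form one pair (read 1 reverse, read 2 forward) and both point to a plus-strand transcript; 99 and 147 form the mirror-image pair for a minus-strand transcript.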
8.6 years ago
Adrian Pelin ★ 2.6k

I think all of this is correct, except for a minor point on (1): there is this thing called RNA editing, where the mature mRNA differs from the DNA exons. It happens in some weirdo eukaryotic parasites, but also in plants.
