Read pair orientation : Illumina TruSeq Stranded mRNA library
5.0 years ago
erwan.scaon ▴ 910

Given strand-specific 2x150 paired-end reads generated with the Illumina TruSeq Stranded mRNA kit, which value should I choose for the featureCounts "strandSpecific" option (1 = stranded vs 2 = reversely stranded)?

Below is a compilation of helpful posts on the subject. They made me pick strandSpecific = 2.
I am under the impression that everyone uses slightly different words / terms to define the same thing, so I am still a bit confused and would be glad to get some feedback.

### Posts recap ###

Post 1 (2015/03/20) :

some protocols, like Illumina Truseq Stranded mRNA library prep kit, first sequence the blue end (which is the right-most read, which is from the first strand, which is complementary to mRNA), then sequence the red end (which is the left-most read, which is from the second strand, which is the same as mRNA). Therefore, you need to use “fr-firststrand” in the Tuxedo suite, and use “-s reverse” option if you use htseq-count.

Which I understand as:

  --R2-->
5'--------second_strand--------3'
3'---------first_strand--------5'
                        <--R1--


Post 2 (~2015/09) :

OP : in library prep (TruSeq stranded in my case)

Answer : Due to the vagaries of the library prep the reads in file 2 will correspond to the sense direction of the transcript (will indicate the 5' ends of the fragments that came from the transcript).

Post 3 (~2015/10) :

OP : I've got Illumina sequencing reads that were generated by the TruSeq Stranded mRNA Sample Prep Kit.

Answer : For TruSeq kits (well all dUTP kits), read #2 dictates the strand.

Post 4 (~2016/12) :

If you're using Illumina's TruSeq stranded protocol, the first read of each pair should contain the sequence of the template strand, i.e., it is in the opposite direction of the coding sequence. Thus, you should set strandSpecific=2 in featureCounts.

Post 5 (~2017/02) :

OP : Featurecounts of Rsubread package has two options for stranded libraries: 1 (stranded) and 2 (reversely stranded)

Answer : fr-firststrand = reversely stranded (the R2 read is in the same direction as the RNA fragment)

### Interpretation ###

Post 1 defines Illumina TruSeq Stranded mRNA as fr-firststrand & post 5 states that fr-firststrand = reversely stranded.
=> featureCounts strandSpecific = 2.

Post 2 : R2 is in the sense direction of the transcript.
Post 3 : R2 dictates the strand.
Post 4 : R1 is in the opposite direction of the coding sequence, so R2 is in the same direction as the coding sequence.
=> R2 corresponds to the second strand synthesized, which corresponds to the original RNA fragment. Sounds like featureCounts strandSpecific = 2.
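
To keep the different vocabularies straight, the equivalences collected above can be written down as a small lookup table. This is only a sketch of my reading of the posts: the dictionary keys and the function name `settings_for` are made up for illustration, and the "forward" row (fr-secondstrand, `-s yes`, strandSpecific = 1) is not taken from the posts but is what I understand the opposite setting to be in each tool.

```python
# Equivalent strand settings across tools.
# The "reverse" row comes from posts 1, 4 and 5 above; the "forward" row is
# an assumption about the opposite setting and is not stated in those posts.
STRAND_SETTINGS = {
    # R2 is in the same direction as the transcript (dUTP kits such as TruSeq Stranded)
    "reverse": {
        "tuxedo_library_type": "fr-firststrand",
        "htseq_count_s": "reverse",
        "featurecounts_strandSpecific": 2,
    },
    # R1 is in the same direction as the transcript
    "forward": {
        "tuxedo_library_type": "fr-secondstrand",
        "htseq_count_s": "yes",
        "featurecounts_strandSpecific": 1,
    },
}


def settings_for(read2_matches_transcript: bool) -> dict:
    """Return the per-tool settings given which read matches the transcript strand."""
    key = "reverse" if read2_matches_transcript else "forward"
    return STRAND_SETTINGS[key]


if __name__ == "__main__":
    # TruSeq Stranded mRNA: R2 is sense, so this prints the "reverse" row.
    print(settings_for(read2_matches_transcript=True))
```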

Edit (2017/11/26) : Additional info found in "Mapping RNA-seq Reads with STAR" (Curr Protoc Bioinformatics - 2016) :

For the protocols in which the 1st read is on the opposite strand to the RNA molecule (such as Illumina stranded Tru-Seq), .str1. corresponds to the (−) strand and .str2. corresponds to the (+) strand.

featureCounts TruSeq strandSpecific • 14k views
5.0 years ago
igor 13k

You are right. Every tool and kit has a different definition of strandedness and it's difficult to remember them. Luckily, there are only two options. Check this comparison table to help make some sense out of it: https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md

Since I deal with a lot of kits and that information may not be easily accessible, I process RNA-seq data using both strand options every time. Based on the results, it should be obvious which is the correct strand (usually over 90% of the reads are on one strand).
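
A rough sketch of that comparison, assuming featureCounts was run twice on the same BAM (once with strandSpecific = 1, once with strandSpecific = 2) and that each run wrote the usual tab-separated `.summary` file containing an `Assigned` row. The function names and the 0.9 cutoff are illustrative choices, and the share of assigned reads between the two runs is used here as a stand-in for "reads on one strand".

```python
def assigned_reads(summary_text: str) -> int:
    """Pull the 'Assigned' count out of the text of a featureCounts .summary file."""
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "Assigned" and len(fields) > 1:
            return int(fields[1])
    raise ValueError("no 'Assigned' row found in summary")


def infer_strand_specific(assigned_s1: int, assigned_s2: int,
                          min_fraction: float = 0.9) -> int:
    """Guess the strandSpecific value from the two runs.

    Returns 1 or 2 when that run holds at least `min_fraction` of the reads
    assigned across both runs, and 0 (unstranded) when neither dominates.
    """
    total = assigned_s1 + assigned_s2
    if total == 0:
        raise ValueError("no reads were assigned in either run")
    fraction_s2 = assigned_s2 / total
    if fraction_s2 >= min_fraction:
        return 2
    if 1 - fraction_s2 >= min_fraction:
        return 1
    return 0


if __name__ == "__main__":
    # Made-up summaries for illustration only, not real data.
    run_s1 = "Status\tsample.bam\nAssigned\t50000\nUnassigned_NoFeatures\t900000\n"
    run_s2 = "Status\tsample.bam\nAssigned\t950000\nUnassigned_NoFeatures\t40000\n"
    print(infer_strand_specific(assigned_reads(run_s1), assigned_reads(run_s2)))
```

In practice the summary text would be read from the two `.summary` files on disk; a result of 0 suggests the library is unstranded or that something else is off and worth checking by hand.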