Forum: Orientation of PE reads, a review of --fr, --ff and --rf meanings
2.5 years ago
Juke34 ★ 5.4k

Last update of the drawing 2019/08 (Cf. GUESSmyLT):

Original post: I tried to review the meanings of the different RNA-seq library types because they are often confusing and hard to understand, despite their importance in downstream analyses.

I would be glad to get comments, corrections, criticisms and complementary information, to make this resource as exhaustive as possible. Given the numerous questions about RNA-seq library types I found on the internet, I'm sure this small contribution could be useful not only for me but for a broad audience.

Here are nice resources amongst others I found by googling:

https://bioinformatics.uconn.edu/reference-based-rna-seq-data-analysis/#

https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md

https://chipster.csc.fi/manual/library-type-summary.html

https://galaxyproject.org/tutorials/rb_rnaseq/

https://dbrg77.wordpress.com/2015/03/20/library-type-option-in-the-tuxedo-suite/

http://onetipperday.sterding.com/2012/07/how-to-tell-which-library-type-to-use.html

https://rnaseq.uoregon.edu

I summarised the result in these figures:

And here is the little information I found about which technologies produce them:

--fr orientation is produced using the Illumina paired-end protocol.

--fr-firststrand dUTP, NSR, NNSR

--fr-secondstrand Ligation, Standard SOLiD

--rf orientation is produced using the Illumina mate-pair protocol.

--ff orientation is produced using the SOLiD mate-pair protocol. This is also the case for Roche 454 paired-end libraries (these are called paired-end, but are based on the same principles as mate-pair libraries).

Extra information not necessarily obvious to everybody: f means forward and r means reverse. Consequently --fr means forward-reverse and --rf means reverse-forward.
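To make the f/r naming concrete, here is a minimal sketch that labels a read pair from the strands of its two mates once they are ordered along the reference. The function name `pair_orientation` and its arguments are hypothetical, chosen only for illustration; it assumes each mate is described by its leftmost reference coordinate and its strand (`+` or `-`).

```python
def pair_orientation(mate1_start, mate1_strand, mate2_start, mate2_strand):
    """Classify a read pair as 'fr', 'rf' or 'ff' (hypothetical helper).

    Starts are leftmost reference coordinates; strands are '+' or '-'.
    """
    # Order the two mates by their leftmost position on the reference.
    (_, left), (_, right) = sorted(
        [(mate1_start, mate1_strand), (mate2_start, mate2_strand)]
    )
    if left == right:
        # Both mates on the same strand ('rr' is treated as 'ff' here).
        return "ff"
    if left == "+" and right == "-":
        # Mates point towards each other: forward-reverse.
        return "fr"
    # Mates point away from each other: reverse-forward.
    return "rf"


print(pair_orientation(100, "+", 300, "-"))
print(pair_orientation(100, "-", 300, "+"))
print(pair_orientation(100, "+", 300, "+"))
```

Note that this only describes the relative orientation of the mates; it says nothing about which strand of the transcript mate 1 comes from, which is what the -firststrand/-secondstrand suffixes add.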

Trinity doesn't use the same frame of reference (it uses the DNA), so RF corresponds to fr-firststrand and FR to fr-secondstrand.
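The correspondences listed above can be kept in a small lookup table. This is only a sketch: the dictionary `LIBRARY_TYPES` and the helpers `to_trinity` and `from_trinity` are hypothetical names, and the table contains nothing beyond what is stated in this post.

```python
# Library types as named in this post, with the Trinity equivalent
# and the protocols reported to produce them.
LIBRARY_TYPES = {
    "fr-firststrand": {"trinity": "RF", "protocols": ["dUTP", "NSR", "NNSR"]},
    "fr-secondstrand": {"trinity": "FR", "protocols": ["Ligation", "Standard SOLiD"]},
}


def to_trinity(library_type):
    """Return the Trinity-style code (RF/FR) for a library type."""
    return LIBRARY_TYPES[library_type]["trinity"]


def from_trinity(code):
    """Return the library type matching a Trinity-style code."""
    for library_type, info in LIBRARY_TYPES.items():
        if info["trinity"] == code:
            return library_type
    raise KeyError(f"unknown Trinity code: {code}")


print(to_trinity("fr-firststrand"))
print(from_trinity("FR"))
```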

RNA-Seq Forum • 3.7k views

That too is a good start; it just does not explain what one sees in, say, IGV.


I am wondering why these schemes (Igor's) end with a PCR reaction. If you do a PCR before loading the sequences onto a massively parallel sequencer, you lose the strand information, so I find these schemes somewhat confusing.

In addition, I believe Igor's schemes are not right. The dUTP method ends with a fragment that is orange on the left and blue on the right, when it actually reproduces the same original sequence.


I am happy that you took on this, but in my opinion, the explanation is still a bit too complicated. Too many things going on at once.

I would explain this from the point of view of the transcript alone. Your first image is the DNA, that complicates the concept in my opinion. The second problem is that you are also discussing it as a paired-end protocol, this also adds to the complexity.

1. I would recommend starting with a single-end, transcript-oriented explanation.
2. Then, when that is done, do the paired-end.
3. Then discuss how the resulting reads map when we align.

The DNA complicates the concept, but at the same time it avoids having to add an extra explanation of how the resulting reads map when we align, because this (maybe not obvious) information is already present.


This is a nice summary. Here are a few suggestions:

• This is not really a "question", it could be more appropriate to flag this post as "forum".
• You should export your PowerPoint as an image instead of using screenshots, because we can see the spell-checker's red wavy underlines under the text, which is annoying (at least to me ;) ).
• I don't know how relevant the "AAAAAA" at the 3' end of the fragments is since, usually, there is a fragmentation step before the first-strand synthesis.

The AAAAAA is relevant because of the fragmentation, right? If there was no fragmentation you'd never hit into that region (based on short reads).

But you bring up a great point - the explanations above appear to suggest that all reads will map to the start of the transcript. But because of fragmentation, this is not how it works.

It is hard to explain this properly actually - and I am happy to see this effort to clarify the terminology.


Yes, what I actually meant is that you will not (usually) reverse-transcribe the full mRNA from the 5' end to the 3' end. So sometimes you will have the poly-A in your fragment, but not always. This is much clearer in the revision now.

2.5 years ago
Juke34 ★ 5.4k

Here is a first revision of the initial drawing, integrating some of your remarks:

I'm sceptical about the FF library... I'm not sure I'm right. Are both reads not supposed to be on the same strand? I can't find clear information.

And with the R/RF library, I'm not sure the reads are drawn at the correct end...

20 months ago
Sammy ▴ 10

Here is my interpretation of the library types and the relative orientation of the reads. I hope that the image is self-explanatory. The FF library is tricky. IMPORTANT NOTE: I know that rr is not technically correct and is basically the same thing as ff. However, it helped to make my point clearer.

Not sure if I am right. Trying to find answers. What do you think?

We drew ff really differently there, I know. I found it strange for both reads to be on the same strand, but it does not really make sense otherwise.

Picture Update: Correction: *the quality might decline towards the 3' end


That's right: from what I understand, the reads of an ff library are on the same strand. I forgot to post the latest update of my drawing. You can find it here: https://github.com/NBISweden/GUESSmyLT
I found the answer in this paper (look at Figure 2): Berglund EC, Kiialainen A, Syvänen AC. Next-generation sequencing technologies and applications for human genetic history and forensics. Investig Genet. 2011.


I posted an update of the picture. I added the sequencing arrows.

I have one question. I noticed that the rf library types are explained differently in our drawings. Your rf-firststrand (I used "reverse", from HISAT, which is the same thing) has the read that corresponds to the second strand (original RNA direction) as mate 1. I drew it the other way around. I'm looking for ways to check which one is right.
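One possible way to check this on real data is to align a sample to an annotated genome and count how often mate 1 lies on the same strand as the gene it overlaps. The sketch below assumes the usual convention for fr libraries: in fr-firststrand (e.g. dUTP) mate 1 aligns to the strand opposite to the gene, and in fr-secondstrand it aligns to the gene's strand. The function `infer_library_type`, its input format and the 0.8 threshold are all hypothetical choices for illustration, not taken from any tool.

```python
from collections import Counter


def infer_library_type(pairs, min_fraction=0.8):
    """Guess strandedness from (gene_strand, mate1_strand) tuples.

    Each strand is '+' or '-'. min_fraction is an arbitrary cut-off
    for calling a library stranded (assumption, tune as needed).
    """
    counts = Counter(
        "same" if gene == mate1 else "opposite" for gene, mate1 in pairs
    )
    total = sum(counts.values())
    if total == 0:
        return "undetermined"
    if counts["same"] / total >= min_fraction:
        # Mate 1 matches the gene strand: second-strand style library.
        return "fr-secondstrand"
    if counts["opposite"] / total >= min_fraction:
        # Mate 1 is antisense to the gene: first-strand style library.
        return "fr-firststrand"
    return "unstranded or undetermined"


# Toy input: 9 pairs with mate 1 antisense to the gene, 1 pair sense.
toy = [("+", "-")] * 9 + [("+", "+")]
print(infer_library_type(toy))
```

With real alignments, the tuples would have to be built from mate 1 records that overlap exactly one annotated gene, so that the gene strand is unambiguous.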