Forum: Orientation of PE reads: a review of --fr, --ff and --rf meanings
Juke-34 (3.3k, Sweden) wrote 15 months ago:

Last update of the drawing, 2019/08 (cf. GUESSmyLT): [link not preserved]

Original post: I tried to review the meanings of the different RNA-seq library types because it's often confusing and hard to understand/know what they are despite their importance in downstream analyses.

I would be glad to get comments, corrections, criticism and complementary information, to make this resource as exhaustive as possible. Given the numerous questions about RNA-seq library types I found on the internet, I'm sure this small contribution could be useful not only for me but for a broad audience.

Here are nice resources amongst others I found by googling:

https://bioinformatics.uconn.edu/reference-based-rna-seq-data-analysis/#

https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md

https://chipster.csc.fi/manual/library-type-summary.html

https://galaxyproject.org/tutorials/rb_rnaseq/

https://dbrg77.wordpress.com/2015/03/20/library-type-option-in-the-tuxedo-suite/

http://onetipperday.sterding.com/2012/07/how-to-tell-which-library-type-to-use.html

https://sailfish.readthedocs.io/en/master/library_type.html

https://rnaseq.uoregon.edu

http://seqanswers.com/forums/showthread.php?t=6317

I summarised the result in these figures:

[Four figures; images not preserved]

And here is the little information I found about which technologies produce them:

--fr orientation is produced by the Illumina paired-end protocol.

--fr-firststrand: dUTP, NSR, NNSR

--fr-secondstrand: Ligation, Standard SOLiD

--rf orientation is produced by the Illumina mate-pair protocol.

--ff orientation is produced by the SOLiD mate-pair protocol. It is also the case for Roche 454 paired-end libraries (these are called paired-end, but are based on the same principles as mate-pair libraries).

Extra information, not necessarily obvious to everybody: f means forward and r means reverse. Consequently --fr means forward-reverse and --rf means reverse-forward.
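As a minimal sketch of this naming convention, the two letters can be expanded mechanically. The function name `decode_orientation` is hypothetical and not part of any aligner; it only spells out the letters and says nothing about which mate is which.

```python
# Meaning of each letter, as stated in the post.
ORIENTATION_WORDS = {"f": "forward", "r": "reverse"}

def decode_orientation(code):
    """Expand a two-letter code such as '--fr' into ('forward', 'reverse')."""
    letters = code.lstrip("-").lower()  # accept 'fr', '--fr' or 'FR'
    if len(letters) != 2 or any(ch not in ORIENTATION_WORDS for ch in letters):
        raise ValueError(f"not a two-letter f/r code: {code!r}")
    return tuple(ORIENTATION_WORDS[ch] for ch in letters)

for code in ("--fr", "--rf", "--ff"):
    print(code, "->", " ".join(decode_orientation(code)))
```

Longer labels such as --fr-firststrand are deliberately rejected here, because they add strand information on top of the plain orientation.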

Trinity doesn't use the same frame of reference (it uses the DNA), so RF corresponds to fr-firststrand and FR to fr-secondstrand.
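To keep the different vocabularies side by side, here is a small lookup sketch. The Trinity column restates the sentence above; the HISAT2 column reflects my reading of the HISAT2 manual (--rna-strandness) and should be treated as an assumption to check against the version you run. The dictionary and function names are hypothetical.

```python
# TopHat-style label -> equivalent paired-end settings in other tools.
# Trinity values come from the post; HISAT2 values are an assumption
# based on its manual and should be verified for your version.
EQUIVALENTS = {
    "fr-firststrand": {"trinity_ss_lib_type": "RF", "hisat2_rna_strandness": "RF"},
    "fr-secondstrand": {"trinity_ss_lib_type": "FR", "hisat2_rna_strandness": "FR"},
}

def translate(tophat_label, tool_key):
    """Return the setting for `tool_key` matching a TopHat-style label."""
    try:
        return EQUIVALENTS[tophat_label][tool_key]
    except KeyError:
        raise ValueError(f"no known equivalent for {tophat_label!r} / {tool_key!r}")

print(translate("fr-firststrand", "trinity_ss_lib_type"))
```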

rna-seq forum • 1.6k views
modified 5 months ago by Sammy (10) • written 15 months ago by Juke-34 (3.3k)

@igor has a nice summary here: https://github.com/igordot/genomics/blob/master/notes/rna-seq-strand.md

written 15 months ago by genomax (76k)

That too is a good start; it just does not explain what one sees in, say, IGV.

written 15 months ago by Istvan Albert ♦♦ (82k)

I wonder why these schemes (Igor's) end with a PCR reaction. If you do a PCR before loading the library into a massively parallel sequencer, you lose the strand information, so I believe these schemes are somewhat confusing.

In addition, I believe Igor's schemes are not right. In the dUTP method the scheme ends with a fragment that is orange on the left and blue on the right, when the method actually reproduces the same original sequence.

modified 15 months ago • written 15 months ago by Antonio R. Franco (4.3k)

I am happy that you took this on, but in my opinion the explanation is still a bit too complicated. Too many things are going on at once.

I would explain this from the point of view of the transcript alone. Your first image is the DNA, which complicates the concept in my opinion. The second problem is that you are also discussing it as a paired-end protocol, which adds to the complexity.

  1. I would recommend starting with a single-end, transcript-oriented explanation.
  2. Then, when that is done, do the paired-end.
  3. Then discuss how the resulting reads map when we align.
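The three steps can be sketched from the transcript's point of view. The rule encoded here is the commonly described one, stated as an assumption rather than taken from the drawings: in an fr-secondstrand library, a single-end read or mate 1 aligns to the same strand as the transcript; in an fr-firststrand library it aligns to the opposite strand; mate 2 is always the other way around. The function name `transcript_strand` is hypothetical.

```python
FLIP = {"+": "-", "-": "+"}

# Assumption: does a single-end read / mate 1 align to the transcript's strand?
MATE1_SAME_AS_TRANSCRIPT = {
    "fr-secondstrand": True,
    "fr-firststrand": False,
}

def transcript_strand(library_type, read_strand, mate=1):
    """Infer the transcript strand from one aligned read.

    library_type: 'fr-firststrand', 'fr-secondstrand' or 'fr-unstranded'
    read_strand:  '+' or '-' as reported by the aligner
    mate:         1 (also used for single-end reads) or 2
    Returns '+', '-' or None when the library carries no strand information.
    """
    if library_type == "fr-unstranded":
        return None
    same = MATE1_SAME_AS_TRANSCRIPT[library_type]
    if mate == 2:
        same = not same  # mate 2 is sequenced from the opposite end
    return read_strand if same else FLIP[read_strand]

# Step 1, single-end; step 2, paired-end; step 3 is what an alignment viewer shows.
print(transcript_strand("fr-firststrand", "+"))          # single-end read
print(transcript_strand("fr-firststrand", "+", mate=2))  # second mate of a pair
```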
modified 15 months ago • written 15 months ago by Istvan Albert ♦♦ (82k)

The DNA complicates the concept, but at the same time it avoids having to add an extra explanation of how the resulting reads map when we align, because this information (maybe not obvious) is already present.

written 15 months ago by Juke-34 (3.3k)

This is a nice summary. Here are a few suggestions:

  • This is not really a "question"; it could be more appropriate to flag this post as "forum".
  • You should export your PowerPoint as an image instead of using screenshots, because we can see the spell-checker's red wavy underlines under the text, which is annoying (at least to me ;) ).
  • I don't know how relevant the "AAAAAA" at the 3' end of the fragments is since, usually, there is a fragmentation step before the first-strand synthesis.
written 15 months ago by Carlo Yague (4.8k)

The AAAAAA is relevant because of the fragmentation, right? If there was no fragmentation you'd never hit into that region (based on short reads).

But you bring up a great point - the explanations above appear to suggest that all reads will map to the start of the transcript. But because of fragmentation, this is not how it works.
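A toy simulation makes the fragmentation point concrete. It assumes fragments start uniformly at random along the transcript, which is a simplification and not a model of any real protocol; all names and numbers are illustrative.

```python
import random

def simulate_fragment_starts(transcript_length, fragment_length, n_fragments, seed=0):
    """Draw fragment start offsets (0-based) uniformly along a transcript."""
    if fragment_length > transcript_length:
        raise ValueError("fragment longer than transcript")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    last_start = transcript_length - fragment_length
    return [rng.randint(0, last_start) for _ in range(n_fragments)]

def fraction_touching_tail(starts, transcript_length, fragment_length, tail_length):
    """Fraction of fragments overlapping the last `tail_length` bases."""
    tail_start = transcript_length - tail_length
    hits = sum(1 for s in starts if s + fragment_length > tail_start)
    return hits / len(starts)

starts = simulate_fragment_starts(2000, 200, 1000)
print("smallest start:", min(starts), "largest start:", max(starts))
print("fraction overlapping a 20-base tail:",
      fraction_touching_tail(starts, 2000, 200, 20))
```

Under this assumption the starts are spread over the whole transcript, and only some fragments reach the 3' tail.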

It is hard to explain this properly actually - and I am happy to see this effort to clarify the terminology.

modified 15 months ago • written 15 months ago by Istvan Albert ♦♦ (82k)

Yes, what I actually meant is that you will not (usually) reverse-transcribe the full mRNA from the 5' end to the 3' end. So sometimes you will have the poly-A in your fragment, but not always. This is much clearer in the revision now.

written 15 months ago by Carlo Yague (4.8k)
Juke-34 (3.3k, Sweden) wrote 15 months ago:

Here is a first revision of the initial drawing, integrating some of your remarks:

[Figure: revised drawing; image not preserved]

I'm sceptical about the FF library... I'm not sure I'm right. Aren't both reads supposed to be on the same strand? I can't find clear information.

And with the R/RF library, I'm not sure the reads are drawn at the correct end...

modified 15 months ago • written 15 months ago by Juke-34 (3.3k)
Sammy (10) wrote 5 months ago:

Here is my interpretation of the library types and relative orientation of the reads. I hope that the image is self-explanatory. The FF library is tricky. IMPORTANT NOTE: I know that rr is not technically correct and is basically the same thing as ff. However, it helped to make my point clearer.

[Figure: Library types and the relative orientation of the reads (focused on TopHat and HISAT2)]

Not sure if I am right. Trying to find answers. What do you think?

We drew ff very differently there, I know. I found it strange for both reads to be on the same strand, but it does not really make sense otherwise.

Picture update. Correction: the quality might decline towards the 3' end.

[Figure: Library Types and Relative Position of the Reads (blue star)]

modified 5 months ago • written 5 months ago by Sammy (10)

That's right: from what I understand, the reads of an ff library are on the same strand. I forgot to post the latest version of my drawing here. You can find it at: https://github.com/NBISweden/GUESSmyLT
I found the answer in this paper (see Figure 2): Berglund EC, Kiialainen A, Syvänen AC. Next-generation sequencing technologies and applications for human genetic history and forensics. Investig Genet. 2011.
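The same-strand property of ff can be expressed as a small classifier over two aligned mates. This is a sketch under simplifying assumptions: each mate is a hypothetical `(leftmost_position, strand)` tuple, and overlapping or dovetailing mates are not handled. A pair with both mates on '-' is reported as ff too, which matches the remark that rr is the same thing seen from the other strand.

```python
def pair_orientation(mate_a, mate_b):
    """Classify a pair as 'fr', 'rf' or 'ff'.

    Each mate is (leftmost_position, strand) with strand '+' or '-'.
    'fr': the upstream mate is on '+', the downstream on '-' (pointing inward)
    'rf': the upstream mate is on '-', the downstream on '+' (pointing outward)
    'ff': both mates are on the same strand
    """
    left, right = sorted([mate_a, mate_b], key=lambda m: m[0])
    if left[1] == right[1]:
        return "ff"
    return "fr" if left[1] == "+" else "rf"

print(pair_orientation((100, "+"), (300, "-")))
print(pair_orientation((100, "-"), (300, "+")))
print(pair_orientation((100, "+"), (300, "+")))
```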

modified 5 months ago • written 5 months ago by Juke-34 (3.3k)

I posted an update of the picture. I added the sequencing arrows.

I have one question. I noticed that the explanations of the rf library types differ between our drawings. Your rf-firststrand (I used "reverse", from HISAT, which is the same thing) has the read that corresponds to the second strand (original RNA direction) as mate 1. I drew it the other way around. I'm looking for ways to check which one is right.
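One generic way to check is empirical: take reads that align inside genes of known strand and count which convention they agree with. The sketch below assumes the observations are already available as hypothetical `(mate, read_strand, gene_strand)` tuples (extracting them from a BAM file and an annotation is not shown), and the 0.8 threshold is an arbitrary illustrative choice.

```python
from collections import Counter

def guess_library_type(observations, threshold=0.8):
    """Guess strandedness from (mate, read_strand, gene_strand) tuples.

    Assumed convention: mate 1 on the gene's strand (or mate 2 on the
    opposite strand) supports fr-secondstrand; the reverse supports
    fr-firststrand. A mixture below `threshold` is called unstranded.
    """
    counts = Counter()
    for mate, read_strand, gene_strand in observations:
        same = read_strand == gene_strand
        if (mate == 1) == same:
            counts["fr-secondstrand"] += 1
        else:
            counts["fr-firststrand"] += 1
    total = sum(counts.values())
    if total == 0:
        return "undetermined"
    label, n = counts.most_common(1)[0]
    return label if n / total >= threshold else "fr-unstranded"

example = [(1, "-", "+"), (2, "+", "+"), (1, "+", "-"), (2, "-", "-")]
print(guess_library_type(example))
```

With real data, a clear majority for one label would indicate which drawing matches the library, while a split near one half would suggest an unstranded library.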

modified 5 months ago • written 5 months ago by Sammy (10)