Is first strand the sense or the antisense?
3.4 years ago
Jay-Run ▴ 30

Hello!

I'm confused about stranded and non-stranded library prep protocols. I have an RNA-seq dataset whose library preps are stranded (first strand). I ran Infer Experiment to check whether "first strand" means sense or antisense. As I understand it, it should be antisense. Over 90% of tags are antisense, so I conclude that the first strand is in fact the antisense. I just want to make sure I'm correct before going further with my analysis.

Thanks for your kind guidance.

rna-seq • 2.8k views

The exact definition of strandedness varies between tools. See: "Read pair orientation: Illumina TruSeq Stranded mRNA library".


https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1876-7 https://artbio.github.io/springday/strandness/

I totally understand the difference between stranded and non-stranded protocols. It's just that the assignment of the first or second strand to sense or antisense is a bit confusing for me. Based on what I gathered from the links above, the first strand must be antisense. I just want some kind of confirmation. :)


...the first strand must be antisense.

Not necessarily. There's a Biostar post that discusses strand conventions that you should check out: Question: Forward And Reverse Strand Conventions.


Thank you. I will check it out.

3.4 years ago

The summary is that it depends on the sequencing instrument and on the protocol. The most common protocols (Illumina TruSeq) do end up producing reads that match the antisense of how the feature is annotated in, say, the genome build.

For example, as you noted, all single-end reads that originate from a transcript annotated on the forward strand will align to the reverse strand (that is, they are the reverse complement of the transcript).

But this property is solely determined by the sequencing protocol and is not an inherent property of RNA-Seq data in general.
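To make the orientation point concrete, here is a minimal Python sketch. The sequence is a made-up fragment, not real data, and the assumption is an antisense (first-strand) protocol like the one described above: the read is the reverse complement of the transcript, so it aligns to the strand opposite the annotation.

```python
# Hypothetical illustration: what a read looks like under an antisense
# (first-strand) protocol. The sequence below is invented.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Fragment of a transcript from a gene annotated on the forward strand (5'->3').
transcript = "ATGGCCAAGT"

# Under an antisense protocol the read is the reverse complement of the
# transcript, so it aligns to the reverse strand of the reference.
read = reverse_complement(transcript)
print(read)
```

Under a sense protocol the read would instead be identical to the transcript fragment, which is why the same data can look "forward" or "reverse" depending only on the kit.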


Thank you, Albert.

I know what you mean, and I do understand how stranded and non-stranded libraries are generated. I'm still a bit confused about how to set the parameters for my analysis. In the MultiQC report for Infer Experiment, it's clear that when approximately equal numbers of reads align to the sense and antisense strands, the library is non-stranded. But what if 90% of the reads align to the antisense strand, and the protocol says that the first strand was used for library prep? Am I making any sense?

Given this information I still don't know for sure how to specify the strand information (Forward (FR) or Reverse (RF)) for the mapping step.

In this experiment, the KAPA Stranded mRNA-Seq Kit (KR0960 – v5.17, KAPA Biosystems) was used for library construction.


I think you need to digest one thing at a time.

It looks like the kit you mentioned uses the dUTP method, which I think would make R2 the original RNA strand in your case.

infer_experiment.py from RSeQC (which is what MultiQC uses) depends on the orientation of the reads. I would check out the documentation here, specifically where the authors describe how "there are two different ways to strand reads".
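As a rough sketch of how the infer_experiment.py report can be read programmatically: the script prints two "Fraction of reads explained by" lines, one for the sense orientation ("1++,1--,2+-,2-+" for paired-end, "++,--" for single-end) and one for the antisense orientation ("1+-,1-+,2++,2--" or "+-,-+"). The example numbers and the 0.8 cutoff below are assumptions chosen for illustration, not values from this thread or from RSeQC.

```python
import re

# Illustrative report text; the fractions are invented for this example.
EXAMPLE_REPORT = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0250
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0438
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9312
'''


def parse_infer_experiment(text):
    """Return (sense_fraction, antisense_fraction) from the report text."""
    fractions = {}
    for pattern, value in re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', text):
        fractions[pattern] = float(value)
    # Paired-end keys first, single-end keys as a fallback.
    sense = fractions.get("1++,1--,2+-,2-+", fractions.get("++,--"))
    antisense = fractions.get("1+-,1-+,2++,2--", fractions.get("+-,-+"))
    if sense is None or antisense is None:
        raise ValueError("could not find both strand fractions in the report")
    return sense, antisense


def classify(sense, antisense, threshold=0.8):
    """Label the library; threshold is an arbitrary cutoff (assumption)."""
    if sense >= threshold:
        return "forward"
    if antisense >= threshold:
        return "reverse"
    # Neither orientation dominates: unstranded, or too ambiguous to call.
    return "unstranded"


sense, antisense = parse_infer_experiment(EXAMPLE_REPORT)
print(classify(sense, antisense))
```

With roughly 90% of reads in the antisense category, as in the original question, this sketch would label the library "reverse".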


The scan information, SC019_core_rEXT009.180913.HiSeq4000.FCB.lane4.gcap_17_11.R1, implies R1, and the "library strand" column says first strand. In the stranded library prep, the first strand, which is synthesized from the original mRNA, is kept and the second strand is digested.

Thank you. I will read the documentation.


I'm not sure if this will be helpful with what you're trying to do, but you can first map your reads using STAR with --readFilesIn R1.fastq.gz R2.fastq.gz. Then you can play around with the -s option in featureCounts from the Subread package to determine whether R1 or R2 is forward, based on the gene counts.
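A small Python sketch of that comparison, assuming you have already run featureCounts twice (once with -s 1, once with -s 2) and read the number of assigned reads from each run's summary. The counts and the 5-fold ratio below are hypothetical values picked for illustration.

```python
def pick_featurecounts_strand(assigned, min_ratio=5.0):
    """Choose a featureCounts -s value from assigned-read counts.

    assigned maps the -s value used (1 = stranded, 2 = reversely stranded)
    to the number of reads featureCounts assigned in that run.
    min_ratio is an arbitrary cutoff (assumption), not a Subread default.
    """
    forward, reverse = assigned[1], assigned[2]
    if reverse >= min_ratio * max(forward, 1):
        return 2  # reads are antisense to the annotation
    if forward >= min_ratio * max(reverse, 1):
        return 1  # reads are sense to the annotation
    return 0      # no clear winner: treat as unstranded


# Hypothetical counts from two runs on the same BAM file.
example_counts = {1: 150_000, 2: 9_200_000}
print(pick_featurecounts_strand(example_counts))
```

The setting that assigns far more reads to genes is the one that matches the library; if both runs assign similar numbers, the library is probably unstranded.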


Thank you for providing me with a solution.

Unfortunately, I cannot use the server to convert the RNA-seq reads to counts, because the job is too heavy and I'm afraid it will crash the server. That's why I decided to use Galaxy to get my count matrix and then do the rest of the analysis in R. So, if I can do something like this in Galaxy (I'm going to use HISAT2 instead of STAR), I'll try it. I'm now at the trimming stage. But from what I read, I can tell that R1 is reverse.
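If R1 is indeed reverse, the corresponding settings in a few common tools can be summarized as a lookup table. This is only a sketch: the dictionary and function names are made up for this example, while the flag values are the documented options of HISAT2, featureCounts and htseq-count. In Galaxy's HISAT2 form, the "reverse" row corresponds to the Reverse (RF) choice.

```python
# Lookup of strandedness settings per tool. The dictionary layout and the
# helper name are hypothetical; the flag values come from each tool's manual.
STRAND_SETTINGS = {
    "reverse": {  # e.g. dUTP-based kits: read 1 is antisense to the transcript
        "hisat2_paired": "--rna-strandness RF",
        "hisat2_single": "--rna-strandness R",
        "featureCounts": "-s 2",
        "htseq-count": "--stranded=reverse",
    },
    "forward": {  # read 1 is sense to the transcript
        "hisat2_paired": "--rna-strandness FR",
        "hisat2_single": "--rna-strandness F",
        "featureCounts": "-s 1",
        "htseq-count": "--stranded=yes",
    },
    "unstranded": {
        "hisat2_paired": "",  # omit --rna-strandness entirely
        "hisat2_single": "",
        "featureCounts": "-s 0",
        "htseq-count": "--stranded=no",
    },
}


def settings_for(library_type):
    """Return the per-tool settings for 'reverse', 'forward' or 'unstranded'."""
    try:
        return STRAND_SETTINGS[library_type]
    except KeyError:
        raise ValueError(f"unknown library type: {library_type!r}")


for tool, flag in settings_for("reverse").items():
    print(f"{tool}: {flag or '(no strand option)'}")
```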
