Question: Is first strand the sense or the antisense?
9 weeks ago by
Jay-Run10
Jay-Run10 wrote:

Hello!

I'm confused about stranded versus non-stranded library prep protocols. I have an RNA-seq dataset whose library preps are stranded (first strand). I ran Infer Experiment to check whether "first strand" means sense or antisense. As far as I know it should be antisense, and over 90% of the tags are antisense, so I conclude that the first strand is the antisense strand. I just want to confirm that I'm correct before going further with my analysis.

Thanks for your kind guidance.

rna-seq • 278 views
modified 9 weeks ago by Istvan Albert ♦♦ 86k • written 9 weeks ago by Jay-Run10

The exact definition of strandedness varies between tools. See: Read pair orientation: Illumina TruSeq Stranded mRNA library

written 9 weeks ago by igor12k

https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-015-1876-7
https://artbio.github.io/springday/strandness/

I understand the difference between stranded and non-stranded protocols. It's just that assigning the first or second strand to sense or antisense is a bit confusing for me. Based on what I gathered from the links above, the first strand must be antisense. I just want some kind of confirmation. :)

modified 9 weeks ago • written 9 weeks ago by Jay-Run10

...the first strand must be antisense.

Not necessarily. There's a Biostar post that discusses strand conventions that you should check out: Question: Forward And Reverse Strand Conventions.

written 9 weeks ago by newbio17320

Thank you. I will check it out.

written 9 weeks ago by Jay-Run10
9 weeks ago by
Istvan Albert ♦♦ 86k
University Park, USA
Istvan Albert ♦♦ 86k wrote:

The summary is that it depends on the sequencing instrument and on the protocol. The most common protocols (Illumina TruSeq) do end up producing reads that match the antisense of how the feature is annotated in, say, the genome build.

For example, as you noted, all single-end reads that originate from a transcript annotated on the forward strand will align to the reverse strand (that is, reverse complemented).

But this property is solely determined by the sequencing protocol and is not an inherent property of RNA-Seq data in general.
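
To make the forward/reverse point concrete, here is a minimal toy sketch in Python. It is not a real aligner: the genome string, the gene coordinates, and the helper names `reverse_complement` and `mapped_strand` are made up for illustration. It assumes an antisense-type protocol (such as dUTP) in which the read carries the sequence of the first-strand cDNA, that is, the reverse complement of the RNA.

```python
# Toy illustration only: exact substring matching stands in for alignment.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mapped_strand(read: str, forward_genome: str) -> str:
    """Report '+' if the read matches the forward strand as-is,
    '-' if only its reverse complement matches, else 'unmapped'."""
    if read in forward_genome:
        return "+"
    if reverse_complement(read) in forward_genome:
        return "-"
    return "unmapped"


# Hypothetical forward-strand genome with a gene annotated on '+'.
genome = "TTGACCATGGCTAGCAAGTCCGGATAACGT"
transcript = genome[6:26]  # the RNA, written with DNA letters

# Antisense-type protocol: the read is the start of the first-strand cDNA.
read_antisense = reverse_complement(transcript)[:12]
# Sense-type protocol: the read has the same sequence as the RNA.
read_sense = transcript[:12]

print(mapped_strand(read_antisense, genome))  # -
print(mapped_strand(read_sense, genome))      # +
```

The same transcript gives reads on opposite strands depending only on which cDNA strand the protocol sequences.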

modified 9 weeks ago • written 9 weeks ago by Istvan Albert ♦♦ 86k

Thank you, Albert.

I know what you mean. The thing is, I understand how stranded and non-stranded libraries are generated, but I'm still a bit confused when I have to define the parameters for my analysis. In the MultiQC report of Infer Experiment, it's clear that when approximately equal numbers of reads align to the sense and antisense strands, we have a non-stranded library. Now what if 90% of the reads align to the antisense strand, and the protocol says that the first strand was used for library prep? Am I making any sense?

Given this information I still don't know for sure how to specify the strand information (Forward (FR) or Reverse (RF)) for the mapping step.

In this experiment, the KAPA Stranded mRNA-Seq Kit (KR0960 – v5.17, KAPA Biosystems) was used for library construction.

written 9 weeks ago by Jay-Run10

I think you need to digest one thing at a time.

It looks like the kit you mentioned uses the dUTP method, which I think would make R2 match the original RNA strand in your case.

infer_experiment.py from RSeQC (which is what MultiQC reports on) depends on the orientation of the reads. I would check out the documentation here, specifically where the authors describe how "there are two different ways to strand reads".
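
As a rough sketch of how those two fractions could be turned into a decision, something like the following may help. The 0.8 cutoff, the function names, and the sample report are my own assumptions, not RSeQC defaults. The settings table lists the commonly documented equivalents for HISAT2, featureCounts and htseq-count; please double-check them against each tool's manual before relying on them.

```python
import re


def parse_infer_experiment(text: str) -> dict:
    """Extract the sense/antisense fractions from infer_experiment.py output."""
    fractions = {"sense": 0.0, "antisense": 0.0}
    for pattern, value in re.findall(r'explained by "([^"]+)":\s*([0-9.]+)', text):
        first = pattern.split(",")[0]
        if first in ("1++", "++"):    # read 1 on the same strand as the gene
            fractions["sense"] = float(value)
        elif first in ("1+-", "+-"):  # read 1 on the opposite strand
            fractions["antisense"] = float(value)
    return fractions


def classify(fractions: dict, threshold: float = 0.8) -> str:
    """Classify the library; the threshold is an arbitrary assumption."""
    sense, antisense = fractions["sense"], fractions["antisense"]
    if sense + antisense < 0.5:
        return "ambiguous"  # too few reads explained to decide
    if antisense >= threshold:
        return "reverse"
    if sense >= threshold:
        return "forward"
    if abs(sense - antisense) <= 0.2:
        return "unstranded"
    return "ambiguous"


# Commonly documented settings per library type (verify in each manual).
SETTINGS = {
    "reverse": {"hisat2": "--rna-strandness RF (R if single-end)",
                "featureCounts": "-s 2", "htseq-count": "--stranded=reverse"},
    "forward": {"hisat2": "--rna-strandness FR (F if single-end)",
                "featureCounts": "-s 1", "htseq-count": "--stranded=yes"},
    "unstranded": {"hisat2": "(omit --rna-strandness)",
                   "featureCounts": "-s 0", "htseq-count": "--stranded=no"},
}

# Hypothetical report resembling a dUTP-type paired-end library.
report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0300
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9200
'''

library = classify(parse_infer_experiment(report))
print(library, SETTINGS.get(library))
```

With roughly 90% of reads in the second category, as in your data, this sketch would point to the reverse (RF) setting.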

written 9 weeks ago by newbio17320

The scan information, SC019_core_rEXT009.180913.HiSeq4000.FCB.lane4.gcap_17_11.R1, implies R1, and the library strand column says first strand. In the stranded library prep, the first strand, which is synthesized from the original mRNA, is kept and the second strand is digested.

Thank you. I will read the documentation.

written 9 weeks ago by Jay-Run10

I'm not sure if this will be helpful with what you're trying to do, but you can first map your reads using STAR with --readFilesIn R1.fastq.gz R2.fastq.gz. Then you can play around with the -s option in featureCounts from the Subread package to determine whether R1 or R2 is forward, based on the gene counts.
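
For the comparison step, a small Python sketch along these lines could work once you have run featureCounts twice, with -s 1 and with -s 2. It assumes the tab-separated .summary file that featureCounts writes, with an "Assigned" row. The helper names, the 3x ratio, and the numbers in the example are hypothetical.

```python
def assigned_reads(summary_text: str) -> int:
    """Return the 'Assigned' count from a featureCounts .summary file's text."""
    for line in summary_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "Assigned" and len(fields) > 1:
            return int(fields[1])
    raise ValueError("no 'Assigned' row found")


def pick_strand_flag(assigned_s1: int, assigned_s2: int, min_ratio: float = 3.0) -> int:
    """Suggest a value for -s: 1 (forward), 2 (reverse), or 0 (unstranded).

    min_ratio is an arbitrary assumption for how lopsided the counts must be.
    """
    if assigned_s2 >= min_ratio * max(assigned_s1, 1):
        return 2
    if assigned_s1 >= min_ratio * max(assigned_s2, 1):
        return 1
    return 0


# Hypothetical summaries from two runs on the same BAM file.
summary_s1 = "Status\tsample.bam\nAssigned\t48000\nUnassigned_NoFeatures\t952000\n"
summary_s2 = "Status\tsample.bam\nAssigned\t910000\nUnassigned_NoFeatures\t90000\n"

flag = pick_strand_flag(assigned_reads(summary_s1), assigned_reads(summary_s2))
print(f"suggested featureCounts setting: -s {flag}")
```

If the two runs assign similar numbers of reads, the library is probably unstranded and -s 0 is the safer choice.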

modified 8 weeks ago • written 8 weeks ago by newbio17320

Thank you for providing me with a solution.

Unfortunately, I cannot use the server to convert the RNA-seq reads to counts, because the job is too heavy and I'm afraid it will crash the server. That's why I decided to use Galaxy to get my count matrix and then do the rest of the analysis in R. So, if I can do something like this in Galaxy (I'm going to use HISAT2 instead of STAR), I'll try it. I'm now at the trimming stage. But from what I read, I can tell that R1 is reverse.

modified 8 weeks ago • written 8 weeks ago by Jay-Run10