Single-stranded RNA-seq: strandedness of samples
8 weeks ago
fr ▴ 160

I have received some RNA-seq data and I'm starting to process it. The libraries were made with a 3' mRNA library preparation, and thus this is a single-end sequencing experiment. They were then sequenced on an Illumina Nextseq5000. I'm now trying to understand what I need to do and have a couple of basic questions:

1. As this is 3' sequencing, does this mean that the strandedness of my FASTQ files is always reverse?
2. Despite this, the PCR step still includes both forward and reverse primers. Why is this so, considering this is a single-stranded assay?
3. For each sample I have the following files, but I am confused as to exactly what these names mean:
• 90_L001_R1_001.fastq.gz
• 90_L002_R1_001.fastq.gz
• 90_L003_R1_001.fastq.gz
• 90_L004_R1_001.fastq.gz
4. If this had been a paired-end sequencing experiment, I would have gotten two FASTQ files per sample. How do you associate them with forward, reverse, or unstranded sequencing?

rnaseq • 316 views

These seem to be only the forward reads of a read pair (R1). You also need the files containing the mates (R2). Also, does the 90 signify your sample? If so, you may have technical replicates here, as the L(...) may signify that this sample has been sequenced repeatedly on different lanes.

Edit: single-end reads of course don't have mates.

@ponganta, I have re-added my initial point 3. Yes, I believe those are 4 different lanes, as this seems to be the case for flowcells in Nextseq 5000. 90 is indeed the sample, but I'm still trying to puzzle out exactly how it maps to my own sample metadata. Should we have R2 if this is single-end sequencing?

8 weeks ago
ATpoint 54k

I think there are some misconceptions here.

a) 3' sequencing simply means that only the 3' end of the transcripts is enriched, but this does not determine whether you do single- or paired-end sequencing.

b) PCR always includes a forward and a reverse primer, and 3' sequencing is not a single-stranded assay. Stranded simply means that by addition of suitable reagents (e.g. the dUTP method) you "label" the strand that the transcript comes from. A stranded library tells you whether a transcript comes from the top or bottom strand of the genome (e.g. when two genes overlap, one on the top and one on the bottom strand, a stranded library tells you which of the two genes is expressed). Still, the final DNA sequencing library will always be double-stranded. Anything else would be incompatible with how sequencing and PCR work. As for question 1, that depends on the kit that was used. I am also not sure what exactly you are asking, as you seem to be mixing up single-end with single-stranded, or with strandedness of libraries in general. As for question 2, as said, the premise does not hold; see above.
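To make the stranded versus unstranded distinction concrete, here is a minimal Python sketch of how a counting tool could decide whether a single-end read is compatible with a gene. The function name `read_matches_gene` and the labels `forward`, `reverse` and `unstranded` are illustrative assumptions, not the interface of any particular tool.

```python
def read_matches_gene(read_strand: str, gene_strand: str, library: str) -> bool:
    """Return True if a single-end read aligned to `read_strand` can be
    assigned to a gene annotated on `gene_strand`.

    library (hypothetical labels):
      "forward"    -> the read has the same orientation as the transcript
      "reverse"    -> the read is the reverse complement of the transcript
      "unstranded" -> strand information was not preserved in the library
    """
    valid = {"+", "-"}
    if read_strand not in valid or gene_strand not in valid:
        raise ValueError("strands must be '+' or '-'")
    if library == "unstranded":
        return True  # cannot tell the strands apart, so both are compatible
    if library == "forward":
        return read_strand == gene_strand
    if library == "reverse":
        return read_strand != gene_strand
    raise ValueError(f"unknown library type: {library!r}")


# Two overlapping genes on opposite strands and one read aligned to '-'.
genes = {"geneA": "+", "geneB": "-"}
for lib in ("unstranded", "forward", "reverse"):
    hits = [g for g, s in genes.items() if read_matches_gene("-", s, lib)]
    print(lib, hits)
```

With an unstranded library the read is ambiguous between the two overlapping genes; with a stranded library only one of them remains compatible.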

As for 3), yes, this looks like several lanes of the same sample. Just cat the files together, that's all you have to do. By the way, there is no instrument called Nextseq5000, but that does not matter for this question anyway.
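Plain concatenation works because several gzip streams written back to back still form a valid gzip file. As a sketch, assuming the files follow the `<sample>_L<lane>_R<read>_001.fastq.gz` pattern shown in the question, the lanes could be grouped and merged like this in Python. The function name `merge_lanes` and the output naming `<sample>_R<read>.fastq.gz` are hypothetical choices.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming pattern: <sample>_L<lane>_R<read>_001.fastq.gz
LANE_FILE = re.compile(
    r"^(?P<sample>.+)_L(?P<lane>\d{3})_R(?P<read>[12])_001\.fastq\.gz$"
)


def merge_lanes(in_dir, out_dir):
    """Concatenate all lane files of each sample/read into one file.

    Returns a dict mapping (sample, read) to the merged file path.
    """
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect lane files per (sample, read); non-matching names are ignored.
    groups = defaultdict(list)
    for path in in_dir.iterdir():
        m = LANE_FILE.match(path.name)
        if m:
            groups[(m["sample"], m["read"])].append((m["lane"], path))

    merged = {}
    for (sample, read), lanes in groups.items():
        target = out_dir / f"{sample}_R{read}.fastq.gz"
        with open(target, "wb") as out:
            # Append the compressed bytes lane by lane, in lane order.
            for _, path in sorted(lanes):
                with open(path, "rb") as src:
                    shutil.copyfileobj(src, out)
        merged[(sample, read)] = target
    return merged
```

For the files listed in the question, `merge_lanes("fastq", "merged")` would write a single `merged/90_R1.fastq.gz`, assuming the four lane files sit in a directory called `fastq`.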

As for 4), the sequencing mode (single- or paired-end) has nothing to do with strandedness; paired-end just means that both ends of the DNA fragment have been sequenced.

In the end, what you probably want to do is find out which kit was used and ask the facility about strandedness. Alternatively, you can just use salmon with the -l A option, which will try to make a best guess at the strandedness of your library and then quantify the data accordingly.
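A sketch of what such a salmon call could look like for single-end reads, wrapped in Python. The index path, file names and the helper names `salmon_quant_cmd` and `run_salmon` are placeholders; `-l A` asks salmon to infer the library type and `-r` passes unmated (single-end) reads.

```python
import subprocess


def salmon_quant_cmd(index, reads, out_dir, lib_type="A", threads=4):
    """Build a `salmon quant` command line for single-end reads.

    lib_type="A" lets salmon detect the library type automatically;
    an explicit library type string can be passed once it is known.
    """
    return [
        "salmon", "quant",
        "-i", str(index),      # salmon index built beforehand
        "-l", lib_type,        # library type, "A" = automatic detection
        "-r", str(reads),      # unmated (single-end) reads
        "-p", str(threads),    # number of threads
        "-o", str(out_dir),    # output directory
    ]


def run_salmon(cmd):
    """Run the command; requires salmon to be installed and on PATH."""
    subprocess.run(cmd, check=True)


# Placeholder paths, shown only to illustrate the resulting command line.
cmd = salmon_quant_cmd("salmon_index", "merged/90_R1.fastq.gz", "quant/90")
print(" ".join(cmd))
```

Calling `run_salmon(cmd)` would then start the quantification, provided salmon is installed and the index exists.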

ATpoint, thank you very very much, I really appreciate it. You are correct, the instrument is a NextSeq 500, not 5000.

I can see now that the kit that was used was the QuantSeq 3'mRNA-Seq Library Prep FWD Kit, which, as far as I can tell, gives forward-stranded libraries. I understand now that this has nothing to do with single- or paired-end sequencing, thanks for helping me understand this. After studying a bit more I see why I had it all mixed up.