Question: Paired-end reads - what is what?
4.1 years ago by
darxsys190 wrote:

This is probably a dummy question, but I haven't found an explanation and I want to be sure. When paired-end sequencing is done, there are usually 2 fastq files generated, one with "left" mates and the other with "right" mates.

Assuming that the mates in the first file are read from the forward strand, are the mates in the second file then read from the reverse strand?


If so, what exactly is written in the second file: the reverse complement of the read sequence (that is, the corresponding sequence on the forward strand), or the plain reverse-strand sequence? And in which direction?


sequencing rna-seq paired-end • 21k views
4.1 years ago by
thackl2.6k wrote:

Illumina paired-end sequencing is based on the idea that you start with DNA fragments (longer than your actual read length) and sequence both ends of each fragment. On the Illumina chip, the fragments are amplified by bridge amplification prior to the actual sequencing. This approach results in two reads per fragment, with the first read in forward orientation and the second read in reverse-complement orientation. Depending on the initial fragment size and the read length, these two reads may or may not overlap.

For example, with 100 bp reads:



Therefore, the first fastq file will contain all "r1" reads, the second file all "r2" reads.
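As a rough illustration of the layout described above, here is a minimal Python sketch of a single fragment and its two reads. It is a toy model, not what the sequencer software does: the function names (`reverse_complement`, `paired_reads`, `reads_overlap`), the 17 bp fragment, and the 5 bp read length are all made up for the example.

```python
# Toy model of one paired-end fragment. All names and sequences here are
# hypothetical and chosen only for illustration.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def paired_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Model r1 as the first read_len bases of the fragment and r2 as the
    reverse complement of its last read_len bases."""
    r1 = fragment[:read_len]
    r2 = reverse_complement(fragment[-read_len:])
    return r1, r2


def reads_overlap(fragment_len: int, read_len: int) -> bool:
    """Two reads of read_len bases overlap when the fragment is shorter
    than twice the read length."""
    return fragment_len < 2 * read_len


fragment = "ACGTTGCAAGGCCTGAA"  # made-up 17 bp fragment
r1, r2 = paired_reads(fragment, 5)
print(r1)  # ACGTT (start of the fragment, as written)
print(r2)  # TTCAG (reverse complement of the fragment's end, CTGAA)
print(reads_overlap(150, 100))  # True: a 150 bp fragment with 100 bp reads
print(reads_overlap(300, 100))  # False: a 300 bp fragment with 100 bp reads
```

With 100 bp reads, a 150 bp fragment gives reads that share the middle 50 bp, while a 300 bp fragment leaves a 100 bp unsequenced gap between them.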


So in the second file, if it says: TTCAG, that actually corresponds to CTGAA in the forward strand?

written 4.1 years ago by darxsys190

Yes. To have r1 and r2 in the same orientation, you need to reverse-complement one of the two reads. Keep in mind, though, that you usually do not know which of the two reads corresponds to the forward strand of your initial template. Then again, "forward strand" is just a definition anyway, usually based on transcription direction.
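To make the reverse-complement step concrete, here is a small Python sketch that walks two FASTQ files in lockstep and flips each r2 into the orientation of its r1. It assumes plain four-line FASTQ records and that mates appear in the same order in both files; the helper names and the in-memory example records are hypothetical.

```python
import io

# Hypothetical helpers for illustration; a real pipeline would more likely
# use an established FASTQ parser.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fastq_sequences(handle):
    """Yield the sequence line of each record.

    Assumes unwrapped FASTQ: header, sequence, '+', qualities.
    """
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def same_orientation_pairs(r1_handle, r2_handle):
    """Yield (r1, reverse-complemented r2) for mates in matching order."""
    for s1, s2 in zip(fastq_sequences(r1_handle), fastq_sequences(r2_handle)):
        yield s1, reverse_complement(s2)


# In-memory stand-ins for the two FASTQ files (made-up records).
r1_file = io.StringIO("@frag1/1\nACGTT\n+\nIIIII\n")
r2_file = io.StringIO("@frag1/2\nTTCAG\n+\nIIIII\n")

pairs = list(same_orientation_pairs(r1_file, r2_file))
print(pairs)  # [('ACGTT', 'CTGAA')]
```

Note that this only puts both mates on the same strand relative to each other; it does not tell you which strand of the original template that is.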

For double-stranded genomic DNA, there are transcribed regions on both strands. In standard protocols, fragments are generated at random, and so is the strand they originate from. I actually do not know whether the Illumina machine uses a rule to decide which read to output as r1 and which as r2.

For RNAs, there obviously is a forward strand. But only with strand-specific RNA-seq do you actually know how your reads are oriented with respect to your initial template.

written 4.1 years ago by thackl2.6k