Question: Paired End FASTQ data; separate files for reverse forward strand reads?
3.0 years ago by
thenameisizaak0 wrote:

I am designing some bioinformatics software, but have little working experience with FASTQ data.

The data I wish to compute over is paired-end data. As I understand it, this consists of "mate sequences", namely left and right mates, which correspond to sequencing the same region of the genome in the reverse and forward orientations.

My question is about how this data is returned to the user after sequencing. Is the researcher given separate files, each containing only forward or only reverse orientation sequences? Or is the data mixed together?

This basically comes down to how I process data in the software. If separate orientations are given separate files, then I can allow the user to specify the orientation at the command line; otherwise, I will have to read every sequence id to determine the orientation.

Kind regards, Izaak

paired-end fastq • 2.5k views
modified 3.0 years ago by Medhat8.4k • written 3.0 years ago by thenameisizaak0

Keep in mind that the sequence is always reported in 5'-->3' orientation, whether it comes from the forward or the reverse read. For Illumina data, a convention indicates whether a read is the forward or the reverse read (first and second may be a more appropriate way to think about it). That information is encoded in the FASTQ header.
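As a rough illustration of how the read number can be pulled out of a header, here is a minimal Python sketch. It assumes one of two common Illumina conventions: the newer style, where the read number is the first field after the space (e.g. `... 1:N:0:ATCACG`), and the older style, where the name ends in `/1` or `/2`. The function name `read_number` is made up for this example, and headers from other sources may follow neither convention.

```python
import re

# Newer Illumina style: "@<name> <read>:<filtered>:<control>:<index>"
_NEW_STYLE = re.compile(r"^@\S+\s+([12]):[YN]:\d+")
# Older style: "@<name>/1" or "@<name>/2"
_OLD_STYLE = re.compile(r"^@\S+/([12])$")


def read_number(header):
    """Return 1 or 2 for a FASTQ header line, or None if it cannot be told."""
    header = header.strip()
    for pattern in (_NEW_STYLE, _OLD_STYLE):
        match = pattern.match(header)
        if match:
            return int(match.group(1))
    return None
```

Returning `None` instead of guessing lets the caller decide what to do with headers that carry no read number, for example falling back to a command line option.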

modified 3.0 years ago • written 3.0 years ago by genomax69k

Before you start reading too much into "reverse" and "forward", note that the pairs are just sequencing different ends of the same original fragment. Which of the two will end up being "forward" after alignment is essentially random and can't be determined from read IDs.
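To make the distinction concrete: after alignment, the SAM FLAG field records both which mate a read is (first or second in pair) and which strand it aligned to, as two independent bits. The sketch below decodes just those bits; the bit values come from the SAM format, while the function name `describe_flag` is only illustrative.

```python
# Bit values from the SAM format FLAG field.
FLAG_REVERSE = 0x10         # read aligned to the reverse strand
FLAG_FIRST_IN_PAIR = 0x40   # read is the first mate
FLAG_SECOND_IN_PAIR = 0x80  # read is the second mate


def describe_flag(flag):
    """Return (mate, strand) for an aligned read's integer FLAG value."""
    if flag & FLAG_FIRST_IN_PAIR:
        mate = 1
    elif flag & FLAG_SECOND_IN_PAIR:
        mate = 2
    else:
        mate = None
    strand = "-" if flag & FLAG_REVERSE else "+"
    return mate, strand
```

A first mate can come back as either `(1, "+")` or `(1, "-")`, which is why the strand cannot be inferred from the read ID alone.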

modified 3.0 years ago • written 3.0 years ago by Devon Ryan91k

Yeah, I've read that there is no real concept of which is forward or reverse; it was just easier to express ;) Also, out of interest, I've seen that one mate is often sequenced first, then the other. Are they also sometimes sequenced in parallel, using multiplex-capable primers? Or is multiplexing mainly used to differentiate samples?

written 3.0 years ago by thenameisizaak0

In Illumina technology only one read happens at a time. Order is generally [Read 1 --> Index 1 (if present) --> Index 2 (if present) --> Read 2]. Multiplex is only used to differentiate samples.

written 3.0 years ago by genomax69k
3.0 years ago by
Medhat8.4k
Texas
Medhat8.4k wrote:

They can come in separate files or in the same file.

If they are in two files, one file will contain, for example, the forward reads and the other the reverse reads.

If they are in the same file, the two reads of a pair are differentiated by their headers.

general info here
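For the single-file case, a common layout is an interleaved file in which each first read is immediately followed by its mate. A minimal Python sketch for walking such a file pair by pair might look like this. It assumes plain 4-line FASTQ records (no wrapped sequence lines) and strict alternation of mates; the function names are made up for the example.

```python
from itertools import islice


def fastq_records(handle):
    """Yield (header, sequence, quality) tuples, assuming 4-line records."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if (len(lines) < 4 or not lines[0].startswith("@")
                or not lines[2].startswith("+")):
            raise ValueError("malformed FASTQ record")
        yield lines[0], lines[1], lines[3]


def interleaved_pairs(handle):
    """Yield (first, second) record pairs from an interleaved FASTQ stream."""
    records = fastq_records(handle)
    for first in records:
        # The mate is assumed to be the very next record in the stream.
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records in interleaved file")
        yield first, second
```

With two separate files, the same `fastq_records` generator could be applied to each file and the two streams combined with `zip`.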

modified 3.0 years ago • written 3.0 years ago by Medhat8.4k

Right, so it is not safe to let the user simply declare a file as reverse or forward; I have to check each sequence. Can the Seq_id therefore be used to determine the orientation?

modified 3.0 years ago • written 3.0 years ago by thenameisizaak0

I think it should be a parameter, for example 1 for the forward file and 2 for the reverse file. You can then check the number and/or names of the reads in each file and raise an error if they differ. For example:

SomeSoftware -option_do_something 1 forward.fastq 2 reverse.fastq

A real example, for alignment:

bowtie -S -t hg18_combined.fa.bowtie -1 Pair1.fastq -2 Pair2.fastq bowpeout.sam
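The name check suggested above could be sketched roughly as follows in Python. This is only one possible approach: it assumes uncompressed files with 4-line records, and that mates share a name once a trailing `/1` or `/2` and anything after the first space are stripped. The helper names (`base_id`, `check_pairing`, `fastq_headers`) are hypothetical.

```python
from itertools import zip_longest


def base_id(header):
    """Drop the leading '@', any text after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def check_pairing(headers_1, headers_2):
    """Return the number of pairs; raise ValueError if the inputs do not match up."""
    count = 0
    for h1, h2 in zip_longest(headers_1, headers_2):
        if h1 is None or h2 is None:
            raise ValueError("files contain different numbers of reads")
        if base_id(h1) != base_id(h2):
            raise ValueError(
                f"read names differ at pair {count + 1}: {h1!r} vs {h2!r}"
            )
        count += 1
    return count


def fastq_headers(path):
    """Yield the header line of each 4-line record in an uncompressed FASTQ file."""
    with open(path) as handle:
        for index, line in enumerate(handle):
            if index % 4 == 0:
                yield line.rstrip("\n")
```

Usage would then be something like `check_pairing(fastq_headers("forward.fastq"), fastq_headers("reverse.fastq"))`, with the two file names taken from the command line.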

modified 3.0 years ago • written 3.0 years ago by Medhat8.4k