Question: Paired read positions in FASTQ files
riccardo wrote:

Hello, I have a question about paired-end sequencing. When you have the FASTQ files for read 1 and read 2 from a paired-end run, is it correct to assume that if position 1 of the R1 file holds read X, then the same position in the R2 file holds the mate of X? If this is not true, you would need to check the names of millions of sequences, which would be very time consuming: if even one read is missing or out of order in R1 or R2, reads would be paired incorrectly. Could this happen? Do you know whether aligners check the read names when they align paired reads, or whether they just rely on the position of the reads? Thank you.

Best
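The name check described above is a single linear pass, so it is cheaper than it sounds. A minimal Python sketch, assuming plain four-line FASTQ records (no wrapped sequence lines) and Illumina-style headers where the mate is marked either by a comment such as `1:N:0:ATCACG` or by a `/1` / `/2` suffix; the function names are hypothetical:

```python
import itertools
from typing import Iterable, Iterator, Optional, Tuple


def read_name(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield the header line (every 4th line) of a four-line FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield line.rstrip("\n")


def first_mismatch(r1_lines: Iterable[str], r2_lines: Iterable[str]
                   ) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (record index, R1 name, R2 name) of the first out-of-sync pair, or None.

    A None name means that file ran out of records first.
    """
    pairs = itertools.zip_longest(headers(r1_lines), headers(r2_lines))
    for i, (h1, h2) in enumerate(pairs):
        n1 = read_name(h1) if h1 is not None else None
        n2 = read_name(h2) if h2 is not None else None
        if n1 != n2:
            return i, n1, n2
    return None
```

With gzipped input this could be called as `first_mismatch(gzip.open("R1.fastq.gz", "rt"), gzip.open("R2.fastq.gz", "rt"))` (file names hypothetical). Both files are streamed, so memory use stays constant regardless of file size.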

modified 3.1 years ago by genomax • written 3.1 years ago by riccardo
kristoffer.vittingseerup wrote:

That is a correct assumption, and tools do not check these names. That is also why, if you use tools that filter for quality, you get 4 files: 2 paired and 2 unpaired. That said, I've actually had problems with proper pairing in the past, so it is always a good idea to check the first sequences (basically comparing the sequence names, which tells you a lot about the sequencing run).
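The four-file layout follows from judging both mates of a pair together. A minimal sketch of that routing logic, assuming a simple mean-quality threshold and Phred+33 quality strings (real trimmers such as Trimmomatic in PE mode use their own, more elaborate criteria; `split_pairs` and `min_q` are hypothetical names):

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def split_pairs(pairs: Iterable[Tuple[Record, Record]], min_q: float = 20.0
                ) -> Tuple[List[Record], List[Record], List[Record], List[Record]]:
    """Route each pair to paired or unpaired outputs, judging R1 and R2 together."""
    paired_r1: List[Record] = []
    paired_r2: List[Record] = []
    unpaired_r1: List[Record] = []
    unpaired_r2: List[Record] = []
    for rec1, rec2 in pairs:
        ok1 = mean_quality(rec1[2]) >= min_q
        ok2 = mean_quality(rec2[2]) >= min_q
        if ok1 and ok2:          # both mates survive: positions stay aligned
            paired_r1.append(rec1)
            paired_r2.append(rec2)
        elif ok1:                # mate 2 failed: the R1 read becomes an orphan
            unpaired_r1.append(rec1)
        elif ok2:                # mate 1 failed: the R2 read becomes an orphan
            unpaired_r2.append(rec2)
        # if both fail, the pair is dropped entirely
    return paired_r1, paired_r2, unpaired_r1, unpaired_r2
```

Because a read only lands in the paired outputs together with its mate, record i of the paired R1 output always corresponds to record i of the paired R2 output.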

written 3.1 years ago by kristoffer.vittingseerup
genomax wrote:

The order of reads in R1/R2 files can get out of sync if you scan/trim the two files independently. That is why it is recommended to use a paired-end-aware scanning/trimming program and give it both files together.
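A toy illustration of how independent filtering breaks positional pairing while pair-aware filtering preserves it. Records are simplified to (name, sequence) tuples, and the minimum-length filter is a hypothetical stand-in for any trimming step:

```python
from typing import List, Tuple

Read = Tuple[str, str]  # (read name, sequence), simplified for illustration


def filter_independently(r1: List[Read], r2: List[Read], min_len: int):
    """NOT pair-aware: each file is filtered on its own, so positions can shift."""
    return ([r for r in r1 if len(r[1]) >= min_len],
            [r for r in r2 if len(r[1]) >= min_len])


def filter_pair_aware(r1: List[Read], r2: List[Read], min_len: int):
    """Pair-aware: keep a pair only if both mates pass, so index i is still one pair."""
    kept = [(a, b) for a, b in zip(r1, r2)
            if len(a[1]) >= min_len and len(b[1]) >= min_len]
    return [a for a, _ in kept], [b for _, b in kept]


def in_sync(r1: List[Read], r2: List[Read]) -> bool:
    """True if both lists have the same length and matching names at every position."""
    return len(r1) == len(r2) and all(a[0] == b[0] for a, b in zip(r1, r2))
```

With `r1 = [("read1", "ACGTACGT"), ("read2", "AC"), ("read3", "ACGTAC")]`, `r2 = [("read1", "ACGTACGT"), ("read2", "ACGTACGT"), ("read3", "AC")]` and `min_len=5`, independent filtering leaves read3 in R1 facing read2 in R2 at the same position, while the pair-aware version keeps only read1 in both files.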

If you happen to have files that have gone out of sync, you can use repair.sh from the BBMap suite to restore the pairing of reads (for a command-line example, see: C: Calculating number of reads for paired end reads?). You could also use it as a diagnostic tool instead of comparing read names manually: if the files are in sync, no changes will be made.
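The general idea behind name-based re-pairing can be sketched as follows. This illustrates the technique only and is not repair.sh's actual implementation; it assumes records already parsed into (header, sequence, quality) tuples and unique read names, and the function names are hypothetical. Only reads whose mate has not been seen yet are held in memory, so nearly-in-sync files need little RAM:

```python
import itertools
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_name(header: str) -> str:
    """Strip '@', any comment after whitespace, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def repair(r1: Iterable[Record], r2: Iterable[Record]
           ) -> Tuple[List[Tuple[Record, Record]], List[Record]]:
    """Re-pair reads by name. Returns (pairs in matched order, singletons)."""
    waiting1: Dict[str, Record] = {}  # R1 reads whose mate has not appeared yet
    waiting2: Dict[str, Record] = {}  # R2 reads whose mate has not appeared yet
    pairs: List[Tuple[Record, Record]] = []
    for rec1, rec2 in itertools.zip_longest(r1, r2):
        if rec1 is not None:
            name = read_name(rec1[0])
            if name in waiting2:
                pairs.append((rec1, waiting2.pop(name)))
            else:
                waiting1[name] = rec1
        if rec2 is not None:
            name = read_name(rec2[0])
            if name in waiting1:
                pairs.append((waiting1.pop(name), rec2))
            else:
                waiting2[name] = rec2
    # Anything still waiting never found its mate.
    singletons = list(waiting1.values()) + list(waiting2.values())
    return pairs, singletons
```

If read names were not unique, later reads would overwrite earlier ones in the dictionaries, which is why the unique-name assumption matters.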

modified 3.1 years ago • written 3.1 years ago by genomax

Hi, can this also happen with the original files that the sequencer outputs? Thanks

written 3.1 years ago by riccardo

If no trimming has been done on the data, then the files should be in sync. If in doubt, run repair.sh to be sure. If the data is in sync, then nothing should appear in the singletons file.

written 3.1 years ago by genomax

Hi, I don't think it necessarily follows that if nothing appears in the singletons file, your R1 and R2 are in sync. I have good-quality sequencing data that required no trimming, and the starting reads were in sync. However, a few reads in the middle of the file were out of sync. You can't always tell at face value whether R1 and R2 are in sync until you hit an error during the alignment step, such as the following:

[mem_sam_pe] paired reads have different names: "A00804:41:HNJ53DSXX:2:1165:1362:17660", "A00804:41:HNJ53DSXX:2:1145:26630:1047"

A more general question that comes to mind, and that I haven't found an answer to, is whether this is a sequencing defect or whether something went haywire during demultiplexing, because I have this issue for all the samples that were run on a single flow cell. Quite strange, though!

written 4 weeks ago by rohitsatyam102

I think it's really not necessary if nothing appears in singleton's file your R1 and R2 are in sync

That should not happen if you are using the repair.sh tool. If your files are not in sync, it should flag those reads.

If your reads have had the relevant part of the identifiers (e.g. 1:Y:18:ATCACG) stripped from the FASTQ headers, then it would be difficult for any program to tell whether reads are out of sync.
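To see which part of the header carries which information, here is a minimal parser for the Illumina (CASAVA 1.8+) header layout `@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`. The comment field after the space is what says whether a record is mate 1 or mate 2. `parse_header` and `IlluminaHeader` are hypothetical names, and the test header simply combines the read name from the bwa mem error with the example comment above:

```python
from typing import NamedTuple, Optional


class IlluminaHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    read: Optional[int]        # 1 or 2; None if the comment was stripped
    filtered: Optional[bool]   # True for 'Y' (read failed the filter)
    control: Optional[int]
    index: Optional[str]


def parse_header(header: str) -> IlluminaHeader:
    """Split an Illumina header into its name fields and optional comment fields."""
    text = header[1:] if header.startswith("@") else header
    name, _, comment = text.partition(" ")
    instrument, run, flowcell, lane, tile, x, y = name.split(":")
    coords = (instrument, int(run), flowcell, int(lane), int(tile), int(x), int(y))
    if not comment:
        return IlluminaHeader(*coords, None, None, None, None)
    read, filtered, control, index = comment.split(":", 3)
    return IlluminaHeader(*coords, int(read), filtered == "Y", int(control), index)
```

Once the comment is gone, `read` comes back as None, so an R1 record and an R2 record can no longer be told apart from the header alone.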

it a sequencing defect or something went haywire during demultiplexing.

Are you referring to the original read files? No manipulation was done to them after they came off the sequencer/demultiplexing, before you started these alignments?

modified 4 weeks ago • written 4 weeks ago by genomax

Precisely. I have files where no manipulation was done after demultiplexing. When I align them to the reference, I get the error "paired reads have different names". I used the repair.sh script to reorder the files. My singletons files (for all 4 samples) are empty. However, after running repair.sh, the error disappears.

Since I didn't perform any preprocessing on the FASTQ files and went straight to alignment, I suspect something may have gone wrong during demultiplexing. But I don't have any evidence or explanation for why it would happen during demultiplexing.

written 4 weeks ago by rohitsatyam102

However, post repair.sh the error disappears.

So repair.sh does work as intended. Very rare errors like this do occur in the output of bcl2fastq. One speculation I have is that these files were written to a file system that was not performant enough: it may not have kept up properly with the processes writing the output files.

But your point is well taken. In this specific instance, the singletons files will be empty after repair.sh does its job.
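The distinction drawn here (files that hold the same reads in a slightly different order, versus files that hold different reads) can be checked directly. A minimal sketch, assuming the read names have already been extracted and normalized (comments and /1, /2 suffixes removed); it keeps all names in memory, so a streaming approach would be preferable for very large files, and `diagnose` is a hypothetical name:

```python
from collections import Counter
from typing import List


def diagnose(names_r1: List[str], names_r2: List[str]) -> str:
    """Classify two read-name lists as 'in sync', 'reordered', or 'different reads'."""
    if names_r1 == names_r2:
        return "in sync"            # same names at every position
    if Counter(names_r1) == Counter(names_r2):
        return "reordered"          # same reads, some out of order: name-based repair leaves no singletons
    return "different reads"        # at least one read lacks its mate: singletons expected
```

The "reordered" case matches what was seen in this thread: alignment fails on mismatched names, yet re-pairing by name produces empty singletons files.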

modified 4 weeks ago • written 4 weeks ago by genomax