Does Order Within Read Pairs In Interleaved Files Matter?
10.7 years ago
DoubleDecker ▴ 180

I have now ended up with interleaved paired-end read files where the order of reads is not the same throughout the file, i.e. sometimes the forward read is first and sometimes the reverse read is first, but the two reads of each pair are always adjacent.

I have always assumed that programs only look at the order of the reads and ignore the headers, since it does not matter which read is forward and which is reverse. So should I be fine using these files in downstream applications?

paired-end reads • 4.7k views
10.7 years ago

If it's straight from the aligner, read #1 typically comes first, whatever its orientation. In any case, anything written to process name-sorted SAM or BAM files should be able to deal with that simply by looking at the flags. If a program can't, I wouldn't trust its output to begin with, since it's likely to be doing many other things incorrectly.
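To illustrate what "looking at the flags" means, here is a minimal sketch that decodes the pair-related bits of a SAM FLAG value. The bit values (64 for first in pair, 128 for second in pair, 16 for reverse strand) come from the SAM specification; the function name describe_flag is made up for this example and is not part of any tool.

```shell
# Decode the pair-related bits of a SAM FLAG (hypothetical helper, not from any tool).
#   64  (0x40) = first read in the pair
#   128 (0x80) = second read in the pair
#   16  (0x10) = read aligned to the reverse strand
describe_flag() {
    mate="unknown"
    strand="forward"
    if [ $(( $1 & 64 )) -ne 0 ]; then mate="read1"; fi
    if [ $(( $1 & 128 )) -ne 0 ]; then mate="read2"; fi
    if [ $(( $1 & 16 )) -ne 0 ]; then strand="reverse"; fi
    echo "$mate $strand"
}
```

For example, describe_flag 83 reports read 1 on the reverse strand, so which mate a record is and which strand it aligned to are independent pieces of information.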


Note that I am talking about .fastq files here.


Oh, it's incredibly unusual to interleave fastq files like that, which is why, without other context, I assumed you were talking about aligned reads. What are you hoping to achieve by interleaving your fastq files?


A lot of assemblers accept only interleaved reads as input.

10.7 years ago

I would definitely reorder the fastq files so that every pair is in the same order, to avoid potential trouble; it is not that difficult. For instance, you can sort your two files by ID. Take a look here: how to efficiently sort a FASTQ file by entry ID?


Yeah, you are right. It does not take long, so it is better to sort the reads by name just in case. I found this bash one-liner very helpful: cat file.fastq | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > file_sorted.fastq
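A slightly more explicit variant of the same idea, as a sketch only: it sorts by read name and then by mate number, so that mates stay adjacent and read 1 comes before read 2. It assumes 4-line records, no tabs in the headers, and mate labels written either as a "/1" or "/2" suffix or as a comment starting with "1:" or "2:". The function name sort_interleaved_fastq is made up for this example.

```shell
# Sort an interleaved FASTQ stream by read name, keeping mates adjacent
# with read 1 ahead of read 2. Reads stdin, writes stdout.
# Assumptions: 4-line records, no tabs in headers, mates labelled
# "@name/1" and "@name/2" or "@name 1:..." and "@name 2:...".
sort_interleaved_fastq() {
    paste - - - - |
    awk -F '\t' -v OFS='\t' '{
        split($1, h, " ")               # h[1] = name, h[2] = optional comment
        name = h[1]
        mate = 0
        if (name ~ /\/[12]$/) {         # "/1" or "/2" suffix on the name
            mate = substr(name, length(name))
            name = substr(name, 1, length(name) - 2)
        } else if (h[2] ~ /^[12]:/) {   # comment that starts with "1:" or "2:"
            mate = substr(h[2], 1, 1)
        }
        print name, mate, $0            # prepend the two sort keys
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
    cut -f 3- |                         # drop the sort keys again
    tr '\t' '\n'
}
```

Usage would be something like: sort_interleaved_fastq < file.fastq > file_sorted.fastq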

10.7 years ago

Just to check: you know that, while the two reads of a pair should run in opposite directions, most library preps work so that the direction of read 1 is random, right?

So I would expect an interleaved file to go read 1, read 2, read 1, read 2, etc. Read 1 could go in either direction, and read 2 will go in the opposite direction.
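If you want to check whether a file really follows that read 1, read 2 pattern, something like the sketch below could help. It only looks at the header lines and makes the same assumptions as usual: 4-line records and mates labelled with a "/1" or "/2" suffix or a comment starting with "1:" or "2:". The function name check_interleaved and the output format are made up for this example.

```shell
# Report pairing problems in an interleaved FASTQ read from stdin.
# Prints "pairs=N swapped=M mismatched=K" and exits non-zero if any
# adjacent records have different names or a record is left unpaired.
check_interleaved() {
    awk '
        NR % 4 == 1 {                        # header line of each record
            name = $1
            mate = ""
            if (name ~ /\/[12]$/) {          # "/1" or "/2" suffix
                mate = substr(name, length(name))
                name = substr(name, 1, length(name) - 2)
            } else if ($2 ~ /^[12]:/) {      # comment starting with "1:" or "2:"
                mate = substr($2, 1, 1)
            }
            records++
            if (records % 2 == 1) {          # first record of a pair
                first_name = name
                first_mate = mate
            } else {                         # second record of a pair
                pairs++
                if (name != first_name) mismatched++
                else if (first_mate == "2" && mate == "1") swapped++
            }
        }
        END {
            printf "pairs=%d swapped=%d mismatched=%d\n", pairs, swapped, mismatched
            exit (mismatched > 0 || records % 2 == 1)
        }
    '
}
```

A non-zero "swapped" count means some pairs have read 2 listed before read 1, which is the situation described in the question; a non-zero "mismatched" count means the file is not properly interleaved at all.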
