Is it necessary to sort fastq files by sequence before running variant analysis?
6 weeks ago
SlowSD • 0

Hello there,

With the tiny script below, I found out that my fastq reads are not sorted. Is it necessary to sort fastq reads by sequence? At which step of the analysis does the order of the reads play a crucial role? Is sorting them good practice?

if sort -C file; then
  # return code 0
  echo "sorted"
else
  # return code not 0
  echo "not sorted"
fi

Thank you

fastq NGS sort • 355 views
6 weeks ago
Michael 54k

You do not need to sort fastq files for any relevant analysis. However, if you have paired-end data, the reads need to be properly paired, i.e. in the same order in both files. Sorting the file with Unix sort will destroy it, because sort knows nothing about the internal structure of fastq records and simply sorts individual lines. It is generally best to leave the raw data alone. You may have to sort and index the SAM/BAM files resulting from an alignment, but this should be done with software such as Samtools that can read the records properly.
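
To make the point about line-based sorting concrete, here is a minimal sketch on a made-up two-record file. The file names, the reads, and the check_fastq helper are all hypothetical, and the sketch assumes plain 4-line fastq records (no wrapped sequences). It compares plain sort with a record-aware reordering that joins each record onto one line first. This is only meant to show the difference, not to suggest that reordering is needed.

```shell
# Build a tiny two-record FASTQ file (made-up reads, for illustration only).
tmpdir=$(mktemp -d)
printf '%s\n' '@read2' 'TTGA' '+' 'IIII' '@read1' 'ACGT' '+' 'IIII' > "$tmpdir/demo.fastq"

# A FASTQ record is 4 lines: @header, sequence, + line, qualities.
# This hypothetical helper prints "ok" only if every record keeps that layout.
check_fastq() {
  awk 'NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }
       NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }
       END { print ((bad || NR % 4) ? "broken" : "ok") }' "$1"
}

# Line-based sort scatters headers, sequences and qualities across the file.
LC_ALL=C sort "$tmpdir/demo.fastq" > "$tmpdir/line_sorted.fastq"

# Record-aware reordering: put each record on one tab-separated line,
# sort those lines, then split them back into 4 lines per record.
paste - - - - < "$tmpdir/demo.fastq" | LC_ALL=C sort | tr '\t' '\n' > "$tmpdir/record_sorted.fastq"

original_status=$(check_fastq "$tmpdir/demo.fastq")
line_sorted_status=$(check_fastq "$tmpdir/line_sorted.fastq")
record_sorted_status=$(check_fastq "$tmpdir/record_sorted.fastq")
echo "original: $original_status, line-sorted: $line_sorted_status, record-sorted: $record_sorted_status"
```

Even the record-aware variant would have to be applied identically to both files of a paired-end run to keep the mates in step, which is one more reason to leave the raw files as they are.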


Thank you for answering.

Yes. Performing the alignment and then sorting with samtools will definitely help with this.

I actually looked at the after_trim fastq file with the head command and noticed that one read was missing from it, whereas head showed that same read in the raw_fastq file.

I want to know whether the read is lost in trimming or not.

Thank you again.


I want to know whether the read is lost in trimming or not.

If you are seeing just the empty fastq header without a sequence, then the answer is yes. Depending on which program you used for trimming, some may leave empty reads behind (generally you should use a minimum_length filter, since reads below 25 bp or so are not likely to align well). Empty fastq reads should be eliminated, since many programs will have problems with them.
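
A quick way to see whether a trimmed file contains empty or very short reads is to count them. The sketch below is only an illustration: the file and its reads are made up, the 25 bp threshold is taken from the rough figure above, and it assumes plain 4-line fastq records.

```shell
# Made-up trimmed file: one 30 bp read, one empty read, one 4 bp read.
tmpdir=$(mktemp -d)
seq10=ACGTACGTAC
qual10=IIIIIIIIII
printf '%s\n' '@readA' "$seq10$seq10$seq10" '+' "$qual10$qual10$qual10" \
              '@readB' '' '+' '' \
              '@readC' 'ACGT' '+' 'IIII' > "$tmpdir/after_trim.fastq"

min_len=25   # assumed threshold; adjust to your data and aligner

# The sequence is line 2 of every 4-line record.
# Prints: total reads, empty reads, non-empty reads shorter than min_len.
summary=$(awk -v min="$min_len" '
  NR % 4 == 2 {
    total++
    if (length($0) == 0) empty++
    else if (length($0) < min) too_short++
  }
  END { printf "%d %d %d\n", total, empty, too_short }' "$tmpdir/after_trim.fastq")

echo "total empty too_short: $summary"
```

If the empty or too-short counts are not zero, re-running the trimming step with a minimum length setting is usually cleaner than editing the fastq file afterwards.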


Thank you for answering.

I actually used Trimmomatic. Using the head command, I wasn't able to find a particular read ID in the after_trim fastq file. I was wondering whether I had lost this read or whether it was simply further down in the file, out of reach of head, since the after_trim fastq file is not sorted.

Thankfully, with a simple search in less (/read ID), I confirmed that the read is not in the after_trim fastq file.
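
For anyone who wants to do the same check for all reads at once rather than one ID at a time, here is a rough sketch. The files, read names, and the list_ids helper are made up for illustration, and it assumes plain 4-line fastq records. Only the extracted ID lists are sorted, never the fastq files themselves.

```shell
# Tiny made-up files standing in for the raw and trimmed reads.
workdir=$(mktemp -d)
printf '%s\n' '@read1 1:N:0' 'ACGT' '+' 'IIII' \
              '@read2 1:N:0' 'NNNN' '+' '####' \
              '@read3 1:N:0' 'TTGA' '+' 'IIII' > "$workdir/raw.fastq"
printf '%s\n' '@read3 1:N:0' 'TTGA' '+' 'IIII' \
              '@read1 1:N:0' 'ACGT' '+' 'IIII' > "$workdir/after_trim.fastq"

# Hypothetical helper: print the ID (first field of each header line), sorted.
list_ids() {
  awk 'NR % 4 == 1 { print $1 }' "$1" | LC_ALL=C sort
}

list_ids "$workdir/raw.fastq" > "$workdir/raw.ids"
list_ids "$workdir/after_trim.fastq" > "$workdir/trimmed.ids"

# IDs present in the raw file but absent after trimming.
dropped=$(LC_ALL=C comm -23 "$workdir/raw.ids" "$workdir/trimmed.ids")
echo "dropped reads: $dropped"

# Look up a single ID regardless of where it sits in the file.
read_id='@read2'
if grep -qxF -- "$read_id" "$workdir/trimmed.ids"; then
  status=present
else
  status=missing
fi
echo "$read_id is $status after trimming"
```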

Best. SD


I wasn't able to find a read ID in the after_trim fastq file

Not exactly sure what that means. In any case, you shouldn't sort fastq files using plain Unix commands, e.g. sort. They will mess up the fastq format.
