FASTQ nucleotide masking
Posted 2.1 years ago by atariw

Dear all,

I am wondering whether it is good practice to mask bases in an Illumina FASTQ file that have a quality score below a certain threshold (e.g. 20). I noticed there are a couple of utilities (e.g. fastx-toolkit or seqtk) that replace such low-quality bases with the conventional character "N". What are your thoughts on this? Could it be useful, or is it irrelevant for downstream analysis (e.g. bulk RNA-Seq or bulk DNA-Seq)? And if it is useful, what would be a reasonable threshold (10, 15 or 20)?

Thx a lot

Tags: fastq, mask, phred, quality
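
For reference, the masking described in the question can be sketched in a few lines of Python. This is an illustrative sketch, not the fastx-toolkit or seqtk implementation: it assumes Phred+33 encoded qualities (Illumina 1.8+) and plain 4-line FASTQ records, and the function names `mask_low_quality` and `mask_fastq` are hypothetical. A base is masked only when its score is strictly below the threshold, and the quality string is left unchanged.

```python
from typing import Iterable, Iterator


def mask_low_quality(seq: str, qual: str, threshold: int = 20, offset: int = 33) -> str:
    """Return seq with every base whose Phred score is below threshold replaced by 'N'."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    # Phred score = ASCII code of the quality character minus the encoding offset.
    return "".join(
        "N" if ord(q) - offset < threshold else base
        for base, q in zip(seq, qual)
    )


def mask_fastq(lines: Iterable[str], threshold: int = 20, offset: int = 33) -> Iterator[str]:
    """Mask each record of a 4-line-per-record FASTQ stream (assumes no wrapped sequences)."""
    it = iter(lines)
    for header in it:
        # Pull the sequence, '+' separator and quality lines that follow the header.
        seq, plus, qual = (next(it, None) for _ in range(3))
        if qual is None:
            raise ValueError("truncated FASTQ record after " + header.strip())
        masked = mask_low_quality(seq.rstrip("\n"), qual.rstrip("\n"), threshold, offset)
        yield header
        yield masked + "\n"
        yield plus
        yield qual if qual.endswith("\n") else qual + "\n"
```

With these assumptions, `mask_low_quality("ACGT", "II#I")` gives `"ACNT"`, since `#` encodes Q2 and `I` encodes Q40. A whole file could be streamed with something like `sys.stdout.writelines(mask_fastq(sys.stdin, threshold=20))`.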

What is the use case for this? If a read has one or two nucleotides with a bad score, masking them is not going to make a big difference to the alignment, but if there is a stretch of bad scores, that read probably needs to be trimmed or thrown away.
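
To illustrate the trimming alternative, here is a rough Python sketch of 3'-end quality trimming modelled on the algorithm that the cutadapt documentation describes as the one used by BWA: walk in from the 3' end, accumulate `cutoff - quality`, and cut where that running sum peaks. It assumes Phred+33 qualities; `quality_trim_index` and `trim_record` are hypothetical names, not a library API.

```python
from typing import Tuple


def quality_trim_index(qual: str, cutoff: int = 20, offset: int = 33) -> int:
    """Return how many leading bases to keep after BWA-style 3' quality trimming."""
    running = 0          # running sum of (cutoff - quality) from the 3' end
    best = 0             # largest running sum seen so far
    keep = len(qual)     # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)
        if running < 0:  # good-quality bases outweigh the bad tail; stop scanning
            break
        if running > best:
            best = running
            keep = i
    return keep


def trim_record(seq: str, qual: str, cutoff: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Trim the low-quality 3' tail from a sequence and its quality string."""
    keep = quality_trim_index(qual, cutoff, offset)
    return seq[:keep], qual[:keep]
```

A low-quality tail is removed (`trim_record("ACGTAC", "IIII##")` keeps `"ACGT"`), while an isolated bad base inside an otherwise good read (quality `II#III`) is left in place. This matches the point above that one or two bad bases rarely matter.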


Thanks for the answer. Our use cases are a standard RNA-Seq pipeline, where we align the reads with STAR, and a standard GATK variant-calling pipeline, where we align the reads with BWA.


I see no advantage in this. If you have long stretches of bad bases, as GenoMax says, you would trim the read anyway, and that would only be necessary if the error were systematic, in which case FastQC would report it. Otherwise, very few reads would be affected, which does not merit the effort. In general, I would only consider fiddling with these non-standard, low-level preprocessing steps if something goes clearly wrong along the way in your analysis and you have good reason to believe that read quality itself is driving the problem. After all, read processing is highly standardized these days, and it is unlikely that such a custom method would notably improve things.
