Question: Invalid quality score value when using fastq_quality_filter from FASTX_toolkit
kelvinfrog75 wrote (4.4 years ago):

Hey, I am using fastq_quality_filter from the FASTX-Toolkit. Here is my command:

fastq_quality_filter -q 20 -i input_path -o output_path

However, I get this error message: "Invalid quality score value (char '.' ord 46 quality value -18) on line 4".

My sequences were generated on an Illumina instrument, and I think the library is TruSeq. Below are the first few reads from the FASTQ file. Does anyone know why I am getting the invalid quality score?

@NS500216:139:H2JLWAFXX:1:11101:9276:1046 1:N:0:31
CACCTATCCCAACGCTGCCCATGCCGTCCGCCCGGCCGTCGCCGATGCCCGGCAGCCGCAACACGCCCTTCCCGGTGACCGCCTCGTGCGTCAACCCGCCCCCTCCCCGGGAACCTGGGCGTTCTGGCGACGCGACAGCCGGGATNTGGCN
+
FF....7A<<..A)F7F.FFF.AFAAA.<F<.))A<FF<<AF<AAF<<7<F<FFFF.FFFAF7FA.F...FFAAA)FAF)7)F.F)F<FFFFF7<F.<F..FF.F)).7F<F.F.7FFFA<).F.FFFFFF<.F<FFF.<7FFFF#<AAA#
@NS500216:139:H2JLWAFXX:1:11101:24009:1049 1:N:0:31
CAGGTGGATTGGGGGAGCAAGGGTGAGTCAGCCACGGTGTGCATGGACGGCAACAATGCGAACGCGCCGAAGAAGGAATGCAAGTCGGGCGAGGAGTAATCGCTAGACTGGCTGTTTGGCGACATCGGCGGCGCGTCCGCCANTGAGN
+
FF7F<7<<7)<..F.F.FFF7<7)FFF.)..F..FA).AFFFAFFFFFF.7FFF.FFFF)F<7F.<7FFFFAA.F.FFF<AFFFA.<7.<FFF<FFF.F<FAA.F7AF7.F.AFFAF.FF7A7FFAF<FAAFAF<<F<F.FF#<AA<#
@NS500216:139:H2JLWAFXX:1:11101:21209:1051 1:N:0:31
ACCGCGACGATCTGCTGCGCTTGCAGGACAAGGAGCAGCGCACCCTCCGGCCGATGGTGGTGCCGTTCAACCTGAAATGAGGAGGGCAGGACCGTGCCGCTGAAGCAGTAGCAAGAGCGCGTGCTGCGGGAGGTCAAGCACTTCCNTGAAN
+
AAAFFAFFA7AF.7<)F.FA.)A.7FFAF<7FFAF)<)FFF.A<<F<AFAFFF)F<7A<F)F7.F.FFFFF<.FFFA<FAF7AF<FFFFFFFFAFAFFAF)FFAFFFFFF.FAFFFFFFAFFFF<FFFFFA.F.F7AAFFFA.FF#AAAA#
@NS500216:139:H2JLWAFXX:1:11101:16640:1054 1:N:0:31
ATGCCCCTCTATGTTACGGCGTTCGATATTGTCAGCGGTCGCCTCCTTCTCTTTGGCGAAGACCCTCGCGCACCAGTGGCCGAGGCTGTGTTGGCTAGTTCATCCATCCCAGGCAGCCATCCTCCTCTGAATTATCACGGACTCCNGCTTN
+

(Tagged: sequencing. Posted by kelvinfrog75 4.4 years ago, edited by Brian Bushnell.)

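Add the option -Q33 (which tells FASTX_toolkit that the quality scores use an ASCII offset of 33) if you want to keep using FASTX_toolkit:

fastq_quality_filter -Q33 -q 20 -i input_path -o output_path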

I second using BBDuk (bbduk.sh) instead.

(Reply by genomax, 4.4 years ago)
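To see why -Q33 fixes the error, here is a minimal Python sketch (not part of FASTX-Toolkit) that decodes a quality string under both common ASCII offsets. The helper names decode_quals and guess_offset are hypothetical, and the offset guess is only a rough heuristic: a character below ';' (ASCII 59) cannot occur in Phred+64 data, so its presence implies Phred+33.

```python
# Sketch: decode FASTQ quality characters under an assumed ASCII offset.
# Phred+33 (Sanger / Illumina 1.8+) and Phred+64 (older Illumina) differ only
# in the number subtracted from each character code.


def decode_quals(qual: str, offset: int) -> list[int]:
    """Return the Phred score of each character in a quality string."""
    return [ord(c) - offset for c in qual]


def guess_offset(qual: str) -> int:
    """Hypothetical heuristic: characters below ';' (ASCII 59) never appear in
    Phred+64 data, so treat the string as Phred+33; otherwise assume Phred+64.
    Expects a non-empty string."""
    return 33 if min(ord(c) for c in qual) < 59 else 64


if __name__ == "__main__":
    sample = "FF....7A<<"  # first 10 quality characters of the first read above
    print(decode_quals(".", 64))  # [-18], the value in the error message
    print(decode_quals(sample, guess_offset(sample)))  # [37, 37, 13, 13, 13, 13, 22, 32, 27, 27]
```

Decoded with offset 33, '.' is a perfectly valid Q13; only the Phred+64 assumption turns it into the impossible -18.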
Answer by Brian Bushnell (Walnut Creek, USA), 4.4 years ago:

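IIRC, FastX Toolkit assumes by default that input data are encoded with the old ASCII-64 (Phred+64) quality scores, which is basically never the case anymore; modern Illumina data use ASCII-33 (Phred+33). That is why the '.' character (ASCII 46) is decoded as 46 - 64 = -18 instead of Q13. I suggest you use a more modern program such as BBDuk for quality-score trimming or filtering; it's faster, will do a better job, and will not break the pairing order of your reads, which FastX will, because it filters each file independently. For paired input files named read1.fq and read2.fq, you would type (BBDuk replaces the # in the file name with 1 and 2):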

bbduk.sh in=read#.fq out=filtered#.fq maq=20

Not that I would recommend doing that, by the way: 20 is usually much too high a threshold for quality filtering, and in most cases I think quality trimming is the better operation anyway. Using very high thresholds will increase bias.
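The pairing problem can be illustrated with a small Python sketch. It assumes reads are already parsed into (header, sequence, quality) tuples and uses a plain arithmetic mean of Phred+33 scores; BBDuk's maq may compute the average differently, and all function names here are hypothetical. Filtering each file on its own drops different reads from R1 and R2, while deciding per pair keeps the two outputs in step.

```python
from typing import List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality), hypothetical layout


def mean_quality(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a quality string."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_single(reads: List[Record], min_avg_q: float) -> List[Record]:
    """Per-file filtering: each file is judged alone, so mates can fall out of sync."""
    return [r for r in reads if mean_quality(r[2]) >= min_avg_q]


def filter_pairs(r1: List[Record], r2: List[Record], min_avg_q: float):
    """Pair-aware filtering: a pair is kept only if both mates pass."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of reads")
    kept1, kept2 = [], []
    for a, b in zip(r1, r2):
        if mean_quality(a[2]) >= min_avg_q and mean_quality(b[2]) >= min_avg_q:
            kept1.append(a)
            kept2.append(b)
    return kept1, kept2


if __name__ == "__main__":
    # Toy data: 'I' is Q40 and '#' is Q2 under Phred+33.
    r1 = [("@p1/1", "ACGT", "IIII"), ("@p2/1", "ACGT", "####")]
    r2 = [("@p1/2", "ACGT", "IIII"), ("@p2/2", "ACGT", "IIII")]
    print(len(filter_single(r1, 20)), len(filter_single(r2, 20)))  # 1 2: out of sync
    k1, k2 = filter_pairs(r1, r2, 20)
    print([h for h, _, _ in k1], [h for h, _, _ in k2])  # ['@p1/1'] ['@p1/2']
```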


Thanks! I will check out BBDuk. What score would you recommend for quality filtering? Does quality trimming also use a score threshold? If so, what value would you recommend? Thanks.

(Reply by kelvinfrog75, 4.4 years ago)

If you are aligning to a reference, you could omit Q-score-based filtering altogether (or, if you must, filter out reads at Q10 and below). For de novo assembly work you may want to be more stringent (Q20 or higher).

(Reply by genomax, 4.4 years ago)
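For reference, these thresholds map to per-base error probabilities through the standard Phred definition (Q12 is included because it is the trimming threshold used later in this thread):

```latex
\begin{aligned}
Q &= -10 \log_{10} P_{\mathrm{err}}, \qquad P_{\mathrm{err}} = 10^{-Q/10} \\
Q = 10 &\;\Rightarrow\; P_{\mathrm{err}} = 10^{-1} = 0.1 \quad (\text{1 error in 10 bases}) \\
Q = 12 &\;\Rightarrow\; P_{\mathrm{err}} = 10^{-1.2} \approx 0.063 \\
Q = 20 &\;\Rightarrow\; P_{\mathrm{err}} = 10^{-2} = 0.01 \quad (\text{1 error in 100 bases})
\end{aligned}
```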

Hey, does that Q-score refer to quality filtering or to quality trimming? Also, if I want to do trimming with BBDuk, what would the command look like? Do I need to use both qtrim and trimq? Thanks.

(Reply by kelvinfrog75, 4.4 years ago)

Yes. "qtrim" tells it which end(s) to trim, and "trimq" sets the quality threshold. A sample command would be:

bbduk.sh in=read#.fq out=trimmed#.fq qtrim=rl trimq=12

That will trim the left and right ends of each read to Q12 (the remaining portion of the read will have an average quality score of at least 12). Both quality filtering and quality trimming use quality scores, but filtering throws away the entire read, while trimming removes only the low-quality bases from the ends and keeps the rest of the read.

(Reply by Brian Bushnell, 4.4 years ago)
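To make the trimming side concrete, here is a minimal Python sketch of one common quality-trimming approach: score each base as Q - trimq and keep the contiguous stretch with the largest positive total (Kadane's maximum-subarray algorithm, the idea behind classic Phred/Mott-style trimming). This is an illustration, not BBDuk's source; BBDuk's details may differ, and the function names are hypothetical. Because the kept stretch has a positive total, its mean quality is above trimq, which matches the description above.

```python
def quality_trim(qual: str, trimq: int, offset: int = 33) -> tuple[int, int]:
    """Return (start, end) of the region to keep, end exclusive; (0, 0) keeps nothing.

    Each base scores (Q - trimq); the kept region is the maximum-sum run,
    so low-quality ends (negative scores) are cut off.
    """
    best_sum, best_start, best_end = 0, 0, 0
    run_sum, run_start = 0, 0
    for i, c in enumerate(qual):
        score = (ord(c) - offset) - trimq
        if run_sum <= 0:  # a non-positive prefix never helps: restart here
            run_sum, run_start = score, i
        else:
            run_sum += score
        if run_sum > best_sum:  # strictly better region found
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


def trim_read(seq: str, qual: str, trimq: int, offset: int = 33) -> tuple[str, str]:
    """Apply quality_trim to a read's sequence and quality strings."""
    start, end = quality_trim(qual, trimq, offset)
    return seq[start:end], qual[start:end]


if __name__ == "__main__":
    # Toy read: 'I' = Q40, '#' = Q2 under Phred+33.
    print(trim_read("NACGTN", "#IIII#", 12))  # ('ACGT', 'IIII')
```

With trimq=12, the final N of each read shown in the question (quality '#', i.e. Q2) would be trimmed, whereas a filter would have to keep or discard each read as a whole.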

Great, I agree that BBDuk is better than the FASTX tools. Thanks.

(Reply by kelvinfrog75, 4.4 years ago)

@Brian: This would not remove any adapter contamination, if present, correct?

(Reply by genomax, 4.4 years ago)

Correct. It's also best to remove adapters before quality trimming. In the bbmap directory there is a subdirectory, docs/guides, which contains BBDukGuide.txt and PreprocessingGuide.txt. The preprocessing guide contains my recommended procedures for preprocessing raw Illumina reads before use, including the best order of steps (many of the steps are optional and depend on your experiment, but that is the order you would do them in if you wanted to). The BBDuk guide has sample command lines for typical operations such as quality trimming and adapter trimming.

(Reply by Brian Bushnell, 4.4 years ago)