Question: Change Phred quality score with DWGSIM
0
gravatar for marongiu.luigi
11 months ago by
Germany, Mannheim, UMM
marongiu.luigi380 wrote:

Dear all,

I am trying to create simulated Illumina reads using DWGSIM. I provided a fasta file (input.fa) and obtained the expected output with the command:

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -Q 5

Using FastQC I obtained a decent quality score graph with well-spread bar plots, the only thing is that the average values are at 17 thus they look not good in a presentation: enter image description here I, therefore, increase the average quality with the option -q using a phred33 code (B -> Q =33)

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -q B -Q 5 <input.fa> <output>

but in this case the values had a single value of 2: enter image description here then I used a phred64 code (a -> Q=33)

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -q a -Q 5 <input.fa> <output>

in which case the values had a single value of 33: enter image description here In both cases, though, there were no longer box and whiskers plots but a simple line (alas I don't know how to show the figure). I believe that DWGSIM and Fastqc use different phred codes, hence the downshift in the first plot; however I could not change the baseline coding for DWGSIM so that it could match Fastqc.

My questions are:

  1. since -q a returned the expected mean value of quality = 33 in the fastqc plot, I assume DWGSIM has the phred 64 score as default; it is possible to change the score to phred 33?

  2. since the use of -q has removed the nice boxplots from the figure, it is possible to shift the boxplots to a given value? if not with the -q flag, maybe there is another way...?

Thank you

sequencing next-gen sequence • 445 views
ADD COMMENTlink modified 11 months ago by h.mon24k • written 11 months ago by marongiu.luigi380

alas I don't know how to show the figure

Here is a tutorial on how to post figures:

https://github.com/RamRS/biostar-mod-templates/blob/master/how-to/howto-add-image.md

ADD REPLYlink modified 11 months ago • written 11 months ago by h.mon24k

Thank you, that was a useful tip. i updated the post.

ADD REPLYlink written 11 months ago by marongiu.luigi380
3
gravatar for h.mon
11 months ago by
h.mon24k
Brazil
h.mon24k wrote:

The command-line help is not very clear (at least to me), but what you want is to set -e and -E:

     -e FLOAT      per base/color/flow error rate of the first read [from 0.020 to 0.020 by 0.000]
     -E FLOAT      per base/color/flow error rate of the second read [from 0.020 to 0.020 by 0.000]

You have used the default value of 0.02, which means a Phred score of 16.9897 ( log10(0.02) * -10 ) - exactly what your FastQC shows. Play with -e and -E to get the intended values.

References:

SeqAnswers: Software for generating "synthetic" reads

Github issue: Default Illumina quality scores are too low #43.

ADD COMMENTlink modified 11 months ago • written 11 months ago by h.mon24k

That is what I was looking for, thank you. Now the plot is how it should be: enter image description here I used:

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -e 0.0008 -E 0.0008 <input> <output>
ADD REPLYlink modified 11 months ago • written 11 months ago by marongiu.luigi380

So please indicate the question has been solved by accepting my answer. Please consider doing this for your other posts as well.

Upvote|Bookmark|Accept

ADD REPLYlink written 11 months ago by h.mon24k
1
gravatar for Vijay Lakhujani
11 months ago by
Vijay Lakhujani4.0k
India
Vijay Lakhujani4.0k wrote:

since -q a returned the expected mean value of quality = 33 in the fastqc plot, I assume DWGSIM has the phred 64 score as default; it is possible to change the score to phred 33?

I don't know if that could be changed from within DWGSIM, however, that would be easy for you to make out as the phred 33 and 64 are just the ASCII offsets. Look at this picture which makes perfect sense to me.

since the use of -q has removed the nice boxplots from the figure, it is possible to shift the boxplots to a given value? if not with the -q flag, maybe there is another way...?

From the github page, it appears to me that -q is a fixed base quality , hence a single line in fastQC.

     -q STRING     a fixed base quality to apply (single character) [not using]

Can you try changing the std dev using -Q ? Don't use -q

ADD COMMENTlink modified 11 months ago • written 11 months ago by Vijay Lakhujani4.0k

I know 33 and 64 are the offsets but if one looks at the first plot, he would say the quality has not passed, hence I need to rise the offset in DWGSIM. I already used -Q, which is the standard deviation of the quality and it does work: in the first plot the values are in the range 17+/-5.

ADD REPLYlink written 11 months ago by marongiu.luigi380
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 720 users visited in the last hour