Change Phred quality score with DWGSIM
2
0
Entering edit mode
6.0 years ago

Dear all,

I am trying to create simulated Illumina reads using DWGSIM. I provided a fasta file (input.fa) and obtained the expected output with the command:

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -Q 5

Using FastQC I obtained a decent quality score graph with well-spread bar plots, the only thing is that the average values are at 17 thus they look not good in a presentation: enter image description here I, therefore, increase the average quality with the option -q using a phred33 code (B -> Q =33)

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -q B -Q 5 <input.fa> <output>

but in this case the values had a single value of 2: enter image description here then I used a phred64 code (a -> Q=33)

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -q a -Q 5 <input.fa> <output>

in which case the values had a single value of 33: enter image description here In both cases, though, there were no longer box and whiskers plots but a simple line (alas I don't know how to show the figure). I believe that DWGSIM and Fastqc use different phred codes, hence the downshift in the first plot; however I could not change the baseline coding for DWGSIM so that it could match Fastqc.

My questions are:

  1. since -q a returned the expected mean value of quality = 33 in the fastqc plot, I assume DWGSIM has the phred 64 score as default; it is possible to change the score to phred 33?

  2. since the use of -q has removed the nice boxplots from the figure, it is possible to shift the boxplots to a given value? if not with the -q flag, maybe there is another way...?

Thank you

sequence sequencing next-gen • 1.8k views
ADD COMMENT
0
Entering edit mode

alas I don't know how to show the figure

Here is a tutorial on how to post figures:

https://github.com/RamRS/biostar-mod-templates/blob/master/how-to/howto-add-image.md

ADD REPLY
0
Entering edit mode

Thank you, that was a useful tip. i updated the post.

ADD REPLY
4
Entering edit mode
6.0 years ago
h.mon 35k

The command-line help is not very clear (at least to me), but what you want is to set -e and -E:

     -e FLOAT      per base/color/flow error rate of the first read [from 0.020 to 0.020 by 0.000]
     -E FLOAT      per base/color/flow error rate of the second read [from 0.020 to 0.020 by 0.000]

You have used the default value of 0.02, which means a Phred score of 16.9897 ( log10(0.02) * -10 ) - exactly what your FastQC shows. Play with -e and -E to get the intended values.

References:

SeqAnswers: Software for generating "synthetic" reads

Github issue: Default Illumina quality scores are too low #43.

ADD COMMENT
0
Entering edit mode

That is what I was looking for, thank you. Now the plot is how it should be: enter image description here I used:

dwgsim -c 0 -C 30 -1 150 -2 150 -S 2 -e 0.0008 -E 0.0008 <input> <output>
ADD REPLY
0
Entering edit mode

So please indicate the question has been solved by accepting my answer. Please consider doing this for your other posts as well.

Upvote|Bookmark|Accept

ADD REPLY
1
Entering edit mode
6.0 years ago

since -q a returned the expected mean value of quality = 33 in the fastqc plot, I assume DWGSIM has the phred 64 score as default; it is possible to change the score to phred 33?

I don't know if that could be changed from within DWGSIM, however, that would be easy for you to make out as the phred 33 and 64 are just the ASCII offsets. Look at this picture which makes perfect sense to me.

since the use of -q has removed the nice boxplots from the figure, it is possible to shift the boxplots to a given value? if not with the -q flag, maybe there is another way...?

From the github page, it appears to me that -q is a fixed base quality , hence a single line in fastQC.

     -q STRING     a fixed base quality to apply (single character) [not using]

Can you try changing the std dev using -Q ? Don't use -q

ADD COMMENT
0
Entering edit mode

I know 33 and 64 are the offsets but if one looks at the first plot, he would say the quality has not passed, hence I need to rise the offset in DWGSIM. I already used -Q, which is the standard deviation of the quality and it does work: in the first plot the values are in the range 17+/-5.

ADD REPLY

Login before adding your answer.

Traffic: 1723 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6