Is 99.9999% Q30 bases normal?
Asked 19 days ago by Aki

I ran fastp on published FASTQ files of single-end RNA-seq data and got 99.9999% Q20 bases and 99.9999% Q30 bases. I have never seen scores this high before. As a beginner in bioinformatics, I don't know whether this is normal. Could you give me any suggestions?

Detecting adapter sequence for read1...
No adapter detected for read1

Read1 before filtering:
total reads: 47471798
total bases: 4747179800
Q20 bases: 4747174600(99.9999%)
Q30 bases: 4747174600(99.9999%)

Read1 after filtering:
total reads: 47471746
total bases: 4557287616
Q20 bases: 4557287616(100%)
Q30 bases: 4557287616(100%)

Filtering result:
reads passed filter: 47471746
reads failed due to low quality: 52
reads failed due to too many N: 0
reads failed due to too short: 0
reads with adapter trimmed: 0
bases trimmed due to adapters: 0

Duplication rate (may be overestimated since this is SE data): 60.5205%

JSON report: ./report/SRR23031659_fastp.json
HTML report: ./report/SRR23031659_fastp.html

fastp -i ./SRR23031659.fastq.gz -3 -o out_SRR23031659.fq.gz --html ./report/SRR23031659_fastp.html -j ./report/SRR23031659_fastp.json -q 15 -n 10 -t 1 -T 1 -l 20 
fastp v0.23.4, time used: 96 seconds
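The numbers in this log can be cross-checked with a few lines of arithmetic. The Python sketch below only reuses values printed by fastp. It shows that every read is 100 bp, that all 52 removed reads failed the quality filter, that the 5,200 bases below Q20 equal 52 reads x 100 bp, and that no base scored between Q20 and Q29. The matching counts are consistent with the low-quality bases sitting in those 52 reads, but they do not prove it.

```python
# Values copied from the fastp log in the question (read 1, before filtering).
total_reads = 47_471_798
total_bases = 4_747_179_800
q20_bases = 4_747_174_600
q30_bases = 4_747_174_600
passed_reads = 47_471_746
failed_low_quality = 52

read_length = total_bases / total_reads   # average bases per read
below_q20 = total_bases - q20_bases       # bases with Q < 20
q20_to_q29 = q20_bases - q30_bases        # bases with 20 <= Q < 30

print(f"average read length: {read_length:.1f} bp")
print(f"reads removed: {total_reads - passed_reads} (failed for low quality: {failed_low_quality})")
print(f"bases below Q20: {below_q20}")
print(f"low-quality reads x read length: {failed_low_quality * read_length:.0f}")
print(f"bases between Q20 and Q29: {q20_to_q29}")
print(f"Q30 fraction: {q30_bases / total_bases:.6%}")
```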

Thanks in advance.

Tags: RNA-seq, fastp

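A Q20 base has a Phred quality score of 20 or higher, meaning an estimated error probability of at most 1 in 100 (99% accuracy); a Q30 base has a score of 30 or higher, meaning at most 1 in 1,000 (99.9% accuracy). The Q20 and Q30 percentages fastp reports are the fractions of bases meeting those thresholds, so values close to 100% indicate very high confidence in the base calls. Please check out this blog.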
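The Phred scale behind these thresholds is Q = -10 log10(P), where P is the estimated probability that a base call is wrong. A small Python sketch of the conversion, plus the fraction of bases at or above a threshold in one quality string. It assumes the common Phred+33 FASTQ encoding, and the quality string is made up for illustration.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def fraction_at_least(qual: str, threshold: int, offset: int = 33) -> float:
    """Fraction of bases in a quality string with Q >= threshold (Phred+33 assumed)."""
    scores = [ord(c) - offset for c in qual]
    return sum(s >= threshold for s in scores) / len(scores)


for q in (20, 30, 40):
    err = phred_to_error(q)
    print(f"Q{q}: error probability {err:.4f}, accuracy {1 - err:.2%}")

# Hypothetical quality string: 'I' = Q40, '?' = Q30, '5' = Q20, '#' = Q2.
qual = "IIIII?????5555#"
print("Q20 fraction:", fraction_at_least(qual, 20))
print("Q30 fraction:", fraction_at_least(qual, 30))
```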


Is this from an AVITI sequencer? They do produce quite high quality scores.


Thanks, jkim. They seem to have used an MGISEQ-2000RS (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6925047). Do you have any information on this model?


I have no idea. Good luck!


Thank you!


Some companies may bin quality scores into a few fixed values to save storage. Do you know whether they did something like that? This is just my guess.
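One way to test this guess is to look at which quality values actually occur in the FASTQ. A minimal Python sketch, assuming standard 4-line records and Phred+33 encoding; the function names are hypothetical. If only a handful of distinct scores show up in the histogram, that would suggest the qualities were binned.

```python
import gzip
from collections import Counter
from typing import Iterable


def quality_histogram_from_lines(lines: Iterable[str], max_reads: int = 100_000,
                                 offset: int = 33) -> Counter:
    """Count Phred scores in 4-line FASTQ records (Phred+33 assumed)."""
    counts = Counter()
    for i, line in enumerate(lines):
        if i // 4 >= max_reads:
            break
        if i % 4 == 3:  # the 4th line of each record holds the quality string
            counts.update(ord(c) - offset for c in line.rstrip("\r\n"))
    return counts


def quality_histogram(path: str, max_reads: int = 100_000) -> Counter:
    """Open a plain or gzipped FASTQ and histogram its quality scores."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return quality_histogram_from_lines(fh, max_reads)


# Usage (path taken from the fastp command in the question):
# for q, n in sorted(quality_histogram("SRR23031659.fastq.gz").items()):
#     print(f"Q{q}\t{n}")
```

Only the first `max_reads` records are read, which keeps the check fast on a 47-million-read file while still showing which score values the instrument emits.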


Illumina certainly does this. They recently raised the assigned value from 22 to 25 for certain calls, etc. They base the bins on aggregated data and then update the priors.


The bottom line is that if compression is a concern, they will lump scores in the 20s together into a single value such as 22 or 25, whichever is the closest fit.
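The lumping described above can be illustrated with a small Python sketch. The bin table is hypothetical, chosen only to mirror the "20s become 22" example; it is not the table of any particular instrument or of this dataset. Each Phred score is mapped to one representative value per bin, and a synthetic (uniformly random) quality string is compressed with zlib before and after binning to show why fewer distinct values save storage.

```python
import random
import zlib

# Hypothetical bin table: (lowest score, highest score, representative value).
BINS = [
    (2, 9, 6),
    (10, 19, 15),
    (20, 24, 22),
    (25, 29, 27),
    (30, 34, 33),
    (35, 39, 37),
    (40, 93, 40),
]


def bin_quality(q: int) -> int:
    """Map a raw Phred score to its bin's representative (scores below 2 pass through)."""
    for low, high, rep in BINS:
        if low <= q <= high:
            return rep
    return q


def bin_quality_string(qual: str, offset: int = 33) -> str:
    """Apply bin_quality to every character of a Phred+33 quality string."""
    return "".join(chr(bin_quality(ord(c) - offset) + offset) for c in qual)


# Synthetic data: real quality distributions are not uniform.
rng = random.Random(0)
raw = "".join(chr(rng.randint(2, 41) + 33) for _ in range(10_000))
binned = bin_quality_string(raw)
print("distinct values:", len(set(raw)), "->", len(set(binned)))
print("zlib bytes:", len(zlib.compress(raw.encode())), "->", len(zlib.compress(binned.encode())))
```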

Regarding third-generation sequencing, Nanopore also reports estimated rather than empirical quality scores in certain cases (although recent comparisons have supported the estimates), which implies similar practices, though I can't comment specifically on the most recent practices (they are changing fast). I don't know enough about PacBio to say.


It means you are good at bioinformatics. Keep going!
