Question: Fastq files with only " ! " score
pablo (140) wrote:

Hello,

I got reads from PacBio sequencing. The reads are in BAM format. I converted them into FASTQ format.

I used both the bam2fastq and samtools fastq tools to do that.

The problem is that both tools give me the " ! " score for every base of every sequence, which would mean the bases are all wrong. That can't be right, because the Phred scores I got with FastQC are really good (and reads obtained with PacBio technology are usually good).

Any idea?

Best

samtools fastq • 178 views
modified 7 weeks ago by genomax (89k) • written 7 weeks ago by pablo (140)

some more info which I recently found online:

Please note that raw data quality scores are the same for all bases of the Sequel raw data (PHRED 0, ASCII !). PacBio came to the conclusion that computing the quality scores for the raw data was a waste of time. Apparently the quality scores for the raw data cannot be reliably computed (and consequently these were also ignored for RSII data pipelines). However, usable PacBio quality scores can be generated from consensus data if the project allows (either by CCS or other secondary analysis algorithms, e.g. by all-vs-all alignments). In short, determining the quality of individual reads is up to the downstream analysis pipeline (e.g. the assembler).
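To make the "PHRED 0 = ASCII !" point concrete, here is a minimal Python sketch of how a quality character maps to a score and an error probability. It assumes the usual Phred+33 encoding of FASTQ files; the function name is just an illustrative choice.

```python
def phred33_to_error_probs(quality_string):
    """Decode a Phred+33 quality string into (score, error probability) pairs."""
    decoded = []
    for char in quality_string:
        # Phred+33 offset: '!' is ASCII 33, so it decodes to Q = 0
        score = ord(char) - 33
        # Standard Phred definition: P(error) = 10^(-Q/10)
        error_prob = 10 ** (-score / 10)
        decoded.append((score, error_prob))
    return decoded


if __name__ == "__main__":
    for char, (score, prob) in zip("!5I", phred33_to_error_probs("!5I")):
        print(f"{char!r}: Q{score}, P(error) = {prob:.4g}")
```

Read literally, '!' decodes to Q0, i.e. an error probability of 1. For these raw reads it should be treated as a placeholder for "no score computed" rather than as a real estimate.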

written 7 weeks ago by lieven.sterck (8.5k)

Base quality scores in PacBio FASTQ files have little to no meaning because of the specifics of the PacBio technique (at least that's how I remember it from older datasets; perhaps it has changed for more recent ones), so I wouldn't worry too much about this.

If you use the conversion tools from PacBio's SMRT package, I think you can even specify what you want the scores to be.

Just use the data as it is without taking the scores into account.
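One way to make sure nothing downstream looks at the placeholder scores is to drop them and work from FASTA. A rough sketch in plain Python, assuming a simple four-lines-per-record FASTQ (no wrapped sequences); the function name and the in-memory example read are made up for illustration:

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy read names and bases from a 4-line-per-record FASTQ, dropping qualities.

    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines between records
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored on purpose
        # Replace the leading '@' of the FASTQ header with the FASTA '>'
        fasta_handle.write(">" + header.rstrip("\n")[1:] + "\n" + sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical read, only to show the call
    example = io.StringIO("@read1\nACGT\n+\n!!!!\n")
    result = io.StringIO()
    fastq_to_fasta(example, result)
    print(result.getvalue(), end="")
```

With real files you would pass handles from `open(...)` instead of the `io.StringIO` objects.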

modified 7 weeks ago • written 7 weeks ago by lieven.sterck (8.5k)
genomax (89k) wrote:

"I got reads from PacBio sequencing. The reads are in BAM format."

Then use PacBio's utility bam2fastx to do the conversion.

"The problem I have is that I got the ' ! ' score for all bases of the sequences with both tools, which means the bases are all wrong."

There is a thread on SeqAnswers about your observation. I will deep link one post. You can read the entire thread. Is your data older?

modified 7 weeks ago • written 7 weeks ago by genomax (89k)

Actually, I used the bam2fastq tool, but as I said, I also got a "!" score for every base.

My data are pretty new.

written 7 weeks ago by pablo (140)

I guess the ! has not been replaced with meaningful values, as stated by user rhall (who works for PacBio) in another post in the SeqAnswers thread I linked above.

I would follow @lieven's advice above or replace ! with something else using reformat.sh from the BBMap suite.
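If you just want to see what "replace ! with something else" amounts to, here is a small plain-Python stand-in. It is not what reformat.sh does internally, only a sketch of the idea. It assumes four-lines-per-record FASTQ with Phred+33 encoding, and the placeholder character ('I', which decodes to Q40) is an arbitrary choice that carries no real quality information.

```python
import io


def replace_zero_quality(fastq_handle, out_handle, placeholder="I"):
    """Rewrite a 4-line-per-record FASTQ, swapping every '!' in quality lines.

    The placeholder is an arbitrary stand-in score, not a measured quality.
    """
    if len(placeholder) != 1:
        raise ValueError("placeholder must be a single character")
    for line_number, line in enumerate(fastq_handle):
        if line_number % 4 == 3:
            # Fourth line of each record holds the quality string
            quality = line.rstrip("\n")
            line = quality.replace("!", placeholder) + "\n"
        out_handle.write(line)


if __name__ == "__main__":
    # Hypothetical read, only to show the call
    example = io.StringIO("@read1\nACGT\n+\n!!!!\n")
    result = io.StringIO()
    replace_zero_quality(example, result)
    print(result.getvalue(), end="")
```

Only do this if a downstream tool refuses Q0 reads; the new scores are made up, so anything that weighs bases by quality will be misled by them.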

modified 7 weeks ago • written 7 weeks ago by genomax (89k)