6 weeks ago by
I am taking the liberty of posting below the section of the NCBI proposal that describes the actual quality-score conversion process, so people can read it right here.
While this may look like a drastic change, Illumina has been doing something similar for a while by producing binned Q-scores for larger datasets. In a way, NCBI is going to do part of the work for you: instead of having to check each base, you can now use the reads that pass the filter as defined below.
While cloud storage may be cheaper, it is still not free. Making users pay for downloads in order to keep the original Q-scores intact would lock out a large population of researchers around the world who simply cannot pay. So a solution that still works reasonably well for NCBI is needed.
EBI/ENA and DDBJ may choose a different route and keep the original data available. In that case users can simply go there, just as they can now to get FASTQ files directly.
The BQS removal process removes quality scores from an SRA file. The
process assesses overall read quality and sets a per-read quality
flag. In the resulting files, all reads have a Read_Filter flag with
value reject or pass.
Illumina fastq and SAM/BAM specifications support a
quality bit that is set by the sequencing instrument. SRA format
stores this as a pass/reject Read_Filter value. If this bit is set in
the submitted fastq or bam file, the value will be retained. If it is
not set, SRA will set a pass/fail value based on the quality scores.
Reads that have more than half of quality score values <20 will be
flagged reject. Reads that begin or end with a run of more than 10
quality scores <20 are also flagged reject. When accessing or dumping
data from SRA format using fastq-dump or fasterq-dump utilities in the
SRA Toolkit, rejected reads are not used by default. There are options
for including them:
fasterq-dump --read-filter <[pass|reject]>
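As a reading aid, here is a minimal Python sketch of the pass/reject decision described in the quoted text. It is not NCBI's code: the function names (`quality_filter`, `illumina_header_filter`, `read_filter`) are hypothetical, qualities are assumed to be already decoded to integer Phred scores, and the instrument bit is assumed to come from the `Y`/`N` "is filtered" field of a Casava 1.8-style Illumina read header (for BAM input the analogous bit would be SAM flag 0x200). When no instrument value is available, it falls back to the two quality rules quoted above.

```python
from typing import Iterable, Optional, Sequence

LOW_Q = 20          # scores below this count as low quality
MAX_EDGE_RUN = 10   # a leading/trailing low-quality run longer than this rejects the read


def leading_low_run(quals: Iterable[int]) -> int:
    """Count consecutive scores < LOW_Q from the start of `quals`."""
    run = 0
    for q in quals:
        if q >= LOW_Q:
            break
        run += 1
    return run


def quality_filter(quals: Sequence[int]) -> str:
    """Quality-based fallback rule, paraphrased from the quoted proposal."""
    low = sum(1 for q in quals if q < LOW_Q)
    if low > len(quals) / 2:  # more than half of the scores are < 20
        return "reject"
    if leading_low_run(quals) > MAX_EDGE_RUN:  # starts with a run of > 10 low scores
        return "reject"
    if leading_low_run(reversed(quals)) > MAX_EDGE_RUN:  # ends with such a run
        return "reject"
    return "pass"


def illumina_header_filter(header: str) -> Optional[str]:
    """Read the 'is filtered' field of a Casava 1.8-style header, e.g.
    '@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG'.
    Y means the instrument filtered the read; returns None if the field is absent."""
    parts = header.split()
    if len(parts) < 2:
        return None
    fields = parts[1].split(":")
    if len(fields) < 2 or fields[1] not in ("Y", "N"):
        return None
    return "reject" if fields[1] == "Y" else "pass"


def read_filter(quals: Sequence[int], instrument_value: Optional[str]) -> str:
    """Keep the instrument's value when present, otherwise apply the quality rule."""
    if instrument_value is not None:
        return instrument_value
    return quality_filter(quals)


if __name__ == "__main__":
    print(read_filter([35] * 50, None))              # pass
    print(read_filter([12] * 11 + [35] * 39, None))  # reject: leading run of 11 low scores
    hdr = "@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG"
    print(read_filter([35] * 50, illumina_header_filter(hdr)))  # reject: instrument flag Y
```

Note that a read with exactly 10 low scores at either end still passes, since the rule as quoted requires a run of more than 10.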
It is still possible to produce FASTQ from ETL-BQS files using the SRA
Toolkit. In this case, the FASTQ will have a constant quality score
set to 30 for reads with Read_Filter value pass and 3 for reject.
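A similarly hedged sketch of what that FASTQ output would look like: each surviving read gets a constant quality string (Q30 for pass, Q3 for reject), and rejected reads are skipped unless explicitly requested. The `(name, sequence, read_filter)` tuple layout and the `to_fastq` name are hypothetical, and Phred+33 encoding is an assumption, under which Q30 is written as `?` and Q3 as `$`.

```python
from typing import Iterable, Iterator, Tuple

# Constant scores quoted in the proposal for FASTQ made from quality-stripped files.
CONSTANT_Q = {"pass": 30, "reject": 3}
PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding

# Hypothetical record layout: (read name, bases, Read_Filter value).
Record = Tuple[str, str, str]


def to_fastq(records: Iterable[Record],
             include: Tuple[str, ...] = ("pass",)) -> Iterator[str]:
    """Yield 4-line FASTQ records; rejected reads are skipped unless included."""
    for name, seq, flt in records:
        if flt not in include:
            continue
        qchar = chr(CONSTANT_Q[flt] + PHRED_OFFSET)  # '?' for Q30, '$' for Q3
        yield f"@{name}\n{seq}\n+\n{qchar * len(seq)}\n"


if __name__ == "__main__":
    recs = [("r1", "ACGT", "pass"), ("r2", "GGCC", "reject")]
    print("".join(to_fastq(recs)), end="")  # r1 only, quality '????'
    print("".join(to_fastq(recs, include=("pass", "reject"))), end="")  # r2 gets '$$$$'
```

Passing `include=("pass", "reject")` mirrors the idea of asking the dump tool for rejected reads as well; it does not model the actual SRA Toolkit options.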
genomax