? in cram files
3 months ago
joe_genome ▴ 10

I have a BAM file whose base quality scores appear to be lost when I compress it to CRAM format: they are replaced by question marks (?) and other symbols. This raises two questions:

1. Why do the original base quality scores change when converting from BAM to CRAM?
2. Is it possible to recover the original base quality scores from the CRAM file using a reference file?

## Example:

Converted CRAM file:

CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAACCTTACCATAAACCTAACCCTAACCCAAAACCTAACCCATAAACAAACCATAAAAAAAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

????????????????????????????????????????????+5?++5?55????+??++?5??'+??'+'+?+++++??'?5++++'+++&+?5+++'++++++'++'++++++++++?+++++????+???+?????+?????+???

sequencing genomics • 519 views

As an alternative, you might also want to check my Genozip tool, which usually compresses better than CRAM, and is 100% lossless. It can even compress cram files. Some benchmarks here: https://genozip.com/benchmarks.html


Please make a tool post rather than putting this in unrelated threads.

3 months ago
1. The base qualities don't have to be lost. Storing them in a lossy way is an optional feature, and it should generally not produce what you posted.
2. Unlikely. Someone messed something up when creating the CRAM file. Note that Phred scores aren't actually that informative for SNP calling these days, which is why newer sequencers are starting to bin them.

Overall, review how the CRAM file was made; there was likely a mistake at that point.
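
To see what the symbols in the posted quality line actually encode, here is a minimal sketch that decodes them. It assumes the standard Phred+33 encoding that SAM, BAM and CRAM use for the QUAL field; the function names are made up for illustration, and the excerpt is a stretch copied from the quality string in the question.

```python
PHRED_OFFSET = 33  # QUAL strings in SAM/BAM/CRAM are ASCII, Phred score + 33


def decode_quals(qual_string):
    """Convert a Phred+33 quality string into a list of integer scores."""
    return [ord(ch) - PHRED_OFFSET for ch in qual_string]


def distinct_scores(qual_string):
    """Map each distinct quality character to its Phred score."""
    return {ch: ord(ch) - PHRED_OFFSET for ch in set(qual_string)}


# A stretch of the quality string posted in the question.
excerpt = "?+5?++5?55????+??++?5??'+??'+'+?+++++??'?5++++'+++&+?5"

# Print each symbol with its score, lowest score first.
for ch, score in sorted(distinct_scores(excerpt).items(), key=lambda kv: kv[1]):
    print(f"{ch!r} -> Q{score}")
```

Under that assumption, `?` is simply Q30 and `+` is Q10, so the question marks are valid quality values rather than placeholders for missing data.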


Thanks for the response Devon!

1. So this is an optional feature in samtools, I believe? I was trying to use RevertSam to convert the CRAM back to its original state (BAM) and thought that supplying the reference would restore the scores, but that didn't happen.
2. Phred scores are needed in some pipelines, which is why I wanted to keep them.

Thanks


Samtools itself doesn't have an option to modify base qualities. It turns out that only read names and some auxiliary information (MD tags) can be stored in a lossy manner. So if you used samtools for the conversion, the error must have occurred upstream.
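
One way to check where the qualities changed is to compare the QUAL column of the same reads before and after conversion. The following is a rough sketch, not a definitive tool: it assumes you have dumped both files to SAM text (for example with `samtools view`), and the function names are hypothetical. Reads are keyed by name and FLAG, which is a simplification that can collide when a read has several secondary alignments.

```python
def qual_by_read(sam_lines):
    """Map (QNAME, FLAG) -> QUAL for every alignment line of SAM text."""
    quals = {}
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns: 1 = QNAME, 2 = FLAG, 11 = QUAL
        quals[(fields[0], int(fields[1]))] = fields[10]
    return quals


def changed_quals(original_sam, converted_sam):
    """Return the (QNAME, FLAG) keys whose QUAL differs between the two inputs."""
    before = qual_by_read(original_sam)
    after = qual_by_read(converted_sam)
    return sorted(k for k in before if k in after and before[k] != after[k])


# Tiny made-up example: one read keeps its qualities, one does not.
original = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tF:F:",
]
converted = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF",
    "r2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\t??+?",
]
print(changed_quals(original, converted))
```

If the original BAM already shows the binned values, the change happened before the CRAM conversion.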


Thanks Devon, this clarifies quite a bit what I was looking for and the direction I need to take!


Can you try using samtools view to convert the CRAM to BAM? A third-party tool like Picard may not follow the latest CRAM/BAM specs.
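
For example, a small wrapper along these lines would run the conversion. This is a sketch under assumptions: the file names are placeholders, the helper name is made up, and it relies only on the `samtools view` options `-b` (BAM output), `-T` (reference FASTA) and `-o` (output file).

```python
import os
import shutil
import subprocess


def cram_to_bam_cmd(cram_path, reference_fasta, bam_path):
    """Build the samtools command line for a CRAM -> BAM conversion."""
    return [
        "samtools", "view",
        "-b",                   # write BAM output
        "-T", reference_fasta,  # reference FASTA used to decode the CRAM
        "-o", bam_path,         # output file
        cram_path,
    ]


# Hypothetical file names, replace with your own.
cmd = cram_to_bam_cmd("sample.cram", "reference.fa", "sample.bam")
print(" ".join(cmd))

# Only run when samtools and the input file are actually present.
if shutil.which("samtools") and os.path.exists("sample.cram"):
    subprocess.run(cmd, check=True)
```

The reference passed with `-T` should be the same FASTA that was used when the CRAM was written.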