Is samtools' compression of bam to cram lossless?
2
1
4.5 years ago

One can use CRAMtools to convert (compress) BAM into CRAM. CRAMtools has options for both lossless and lossy compression (--lossless-quality-score and --lossy-quality-score-spec).

Samtools can also compress BAM into CRAM, but with fewer parameters: one just specifies that the output should be CRAM (using option -C). So my question is: is samtools' compression of BAM into CRAM lossy or lossless?

Thx

samtools cram • 2.9k views
2

My guess is that it will be lossless. If it were lossy, that would surely have been documented. You can easily check it yourself: take a SAM file, convert it to BAM, then to CRAM, then back to SAM, and compare the result with the original SAM.
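The comparison step of that round trip could be sketched roughly as below. This is only an illustration, not part of samtools: `read_records` and `diff_sam` are made-up helper names, and the script assumes the two SAM files have already been produced. It skips header lines (the round trip is likely to add `@PG` entries) and compares optional tags regardless of their order, on the assumption that tag order may change on the way through CRAM.

```python
from itertools import zip_longest


def read_records(handle):
    """Yield (mandatory_fields, sorted_tags) for each alignment line.

    Header lines start with '@' and are skipped, since a round trip
    is expected to add @PG lines that are not a real difference.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        cols = line.split("\t")
        # The first 11 columns are the mandatory SAM fields;
        # everything after that is an optional TAG:TYPE:VALUE entry.
        yield tuple(cols[:11]), tuple(sorted(cols[11:]))


def diff_sam(original, roundtrip):
    """Return a list of (record_index, original, roundtrip) mismatches.

    A missing record on either side shows up as None.
    """
    mismatches = []
    pairs = zip_longest(read_records(original), read_records(roundtrip))
    for index, (a, b) in enumerate(pairs):
        if a != b:
            mismatches.append((index, a, b))
    return mismatches
```

Usage would be something like `diff_sam(open("original.sam"), open("roundtrip.sam"))` (file names hypothetical); an empty list means the alignment records match under those assumptions.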

4
4.4 years ago

There is some sketchy documentation in the samtools man page (e.g. here: https://github.com/samtools/samtools/blob/develop/samtools.1#L1879 and further down for lossy names). It documents the options that are specified using --input-fmt-option or --output-fmt-option. Note that it's not always obvious which is which without reading the man page.

For example, the required_fields parameter is designed to avoid decoding certain components of a CRAM file, so when decoding CRAM or converting from CRAM to CRAM you can elect to simply drop certain fields. This permits total loss of qualities if you so wish, or dropping of all aux tags. It's rather coarse and doesn't allow much selection of tag types, though. As this is a CRAM decode option, it only works when reading CRAM. If you skip a data type, it is filled in with the SAM default (so "*", 0, etc.). Conversely, the lossy_names option is an output option, which discards query names when both reads of a pair are within the same CRAM slice.

If you want more control over losing tags and per-site degradation of qualities, then maybe try Crumble: https://github.com/jkbonfield/crumble. That too sadly needs documenting, or indeed writing up as a paper, but it's a matter of finding the time! (As always.)
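To make the bit-field idea a little more concrete, here is a rough sketch that works out a required_fields value and assembles (but does not run) a CRAM-to-CRAM command line. The bit values are the ones I recall from the samtools man page, so please check them against your installed version. The helper names `required_fields_mask` and `cram_to_cram_command` are invented for this example, as is the `lossy_names=1` spelling of the option value.

```python
# Bit values for required_fields as recalled from the samtools man page.
# Treat them as an assumption and verify against your own version.
SAM_FIELDS = {
    "QNAME": 0x1, "FLAG": 0x2, "RNAME": 0x4, "POS": 0x8,
    "MAPQ": 0x10, "CIGAR": 0x20, "RNEXT": 0x40, "PNEXT": 0x80,
    "TLEN": 0x100, "SEQ": 0x200, "QUAL": 0x400, "AUX": 0x800,
    "RGAUX": 0x1000,
}


def required_fields_mask(drop):
    """Return the bit-field that keeps every field except those in `drop`."""
    unknown = set(drop) - set(SAM_FIELDS)
    if unknown:
        raise ValueError(f"unknown SAM field(s): {sorted(unknown)}")
    mask = 0
    for name, bit in SAM_FIELDS.items():
        if name not in drop:
            mask |= bit
    return mask


def cram_to_cram_command(src, dst, reference, drop=(), lossy_names=False):
    """Build an argument list for a lossy CRAM-to-CRAM conversion.

    required_fields is a decode (input) option, lossy_names an encode
    (output) option, so they go to different --*-fmt-option flags.
    """
    cmd = ["samtools", "view", "-C", "-T", reference]
    if drop:
        mask = required_fields_mask(drop)
        cmd += ["--input-fmt-option", f"required_fields={mask:#x}"]
    if lossy_names:
        cmd += ["--output-fmt-option", "lossy_names=1"]
    cmd += ["-o", dst, src]
    return cmd
```

The resulting list could be handed to `subprocess.run` once you are happy with it. Building the command separately makes it easy to inspect the mask before discarding any data.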

0

I'm going to set this as the accepted answer. I always forget that there are actual man pages for samtools!

2
4.5 years ago

To my knowledge (there's basically no documentation that I've seen), the only way to get a lossy behaviour is by specifying --output-fmt-option lossy_names. You can do something with the auxiliary tags too I think (the required_tags option), but I've never looked into how that actually works.