Question: Is samtools' compression of bam to cram lossless?
5 months ago by
bjarki.sigurjons wrote:

One can use CRAMtools to convert (compress) BAM into CRAM. CRAMtools has options for both lossless and lossy compression (--lossless-quality-score and --lossy-quality-score-spec).

Samtools can also compress BAM into CRAM, but with limited parameters: one just specifies CRAM as the output format (using option -C). So my question is: do you know whether samtools' compression of BAM into CRAM is lossy or lossless?

Thx

cram samtools • 302 views

My guess is that it will be 'lossless'. If it were lossy, that would surely have been documented. You can easily check it yourself: take a SAM file, convert SAM -> BAM -> CRAM -> SAM, then compare the resulting SAM with the original.
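For the final comparison step, here is a minimal Python sketch. It assumes both SAM files list their records in the same order, and it skips header lines because those can legitimately differ after a round trip (for example, extra @PG entries). The function names are made up for illustration and are not part of samtools.

```python
import itertools


def sam_records(lines):
    """Yield each alignment record as a tuple of tab-separated fields.

    Header lines (starting with '@') and blank lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        yield tuple(line.split("\t"))


def first_difference(original_lines, roundtrip_lines):
    """Return (record_number, original, roundtrip) for the first mismatch.

    Returns None when every record matches. A missing record on either
    side is reported as None in the corresponding position.
    """
    pairs = itertools.zip_longest(
        sam_records(original_lines), sam_records(roundtrip_lines)
    )
    for number, (orig, back) in enumerate(pairs, start=1):
        if orig != back:
            return number, orig, back
    return None
```

Usage would be something like `first_difference(open("original.sam"), open("roundtrip.sam"))`, where both file names are placeholders. A result of None means the records survived the round trip unchanged; anything else shows the first record that differs.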

written 5 months ago by Santosh Anand
4 months ago by
James Bonfield wrote:

There is some sketchy documentation in the samtools man page (e.g. here: https://github.com/samtools/samtools/blob/develop/samtools.1#L1879 and further down for lossy names). These describe options that are set using --input-fmt-option or --output-fmt-option. Note that it's not always obvious which is which without reading the man page. For example, the required_fields parameter is designed to avoid decoding certain components of a CRAM file, so when decoding CRAM or converting from CRAM to CRAM you can elect to simply drop certain fields. This permits total loss of quality values if you so wish, or dropping of all aux tags. It's rather coarse and doesn't allow much selection by tag type though. As this is a CRAM decode option, it only works when reading CRAM. If you skip a data type, it is filled in with the SAM default (so "*", 0, etc.). Conversely, lossy_names is an output option which discards query names when both reads of a pair are within the same CRAM slice.
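To make the two options concrete, here is a rough Python sketch that builds the corresponding samtools command line. The bit values for required_fields are the ones I recall from the samtools man page, so please verify them against your installed version; the helper names and the file names are purely illustrative.

```python
# Bit values for required_fields, as recalled from the samtools man page.
# ASSUMPTION: check these against the man page of your samtools version.
SAM_FIELDS = {
    "QNAME": 0x1, "FLAG": 0x2, "RNAME": 0x4, "POS": 0x8, "MAPQ": 0x10,
    "CIGAR": 0x20, "RNEXT": 0x40, "PNEXT": 0x80, "TLEN": 0x100,
    "SEQ": 0x200, "QUAL": 0x400, "AUX": 0x800, "RGAUX": 0x1000,
}


def required_fields_mask(drop):
    """Return the bit-field that keeps every SAM field except those in `drop`."""
    unknown = set(drop) - set(SAM_FIELDS)
    if unknown:
        raise ValueError(f"unknown SAM field(s): {sorted(unknown)}")
    mask = 0
    for name, bit in SAM_FIELDS.items():
        if name not in drop:
            mask |= bit
    return mask


def cram_to_cram_command(src, dst, reference, drop=(), lossy_names=False):
    """Build (but do not run) a samtools CRAM-to-CRAM conversion command."""
    cmd = ["samtools", "view", "-C", "-T", reference, "-o", dst]
    if drop:
        # Decode-side option: only has an effect when the input is CRAM.
        mask = required_fields_mask(drop)
        cmd += ["--input-fmt-option", f"required_fields={mask:#x}"]
    if lossy_names:
        # Encode-side option: discards names of pairs within one slice.
        cmd += ["--output-fmt-option", "lossy_names=1"]
    cmd.append(src)
    return cmd
```

For example, `cram_to_cram_command("in.cram", "out.cram", "ref.fa", drop={"QUAL"})` gives a command that skips decoding qualities, so they come out as the SAM default in the new file. With neither `drop` nor `lossy_names` set, no lossy option is added at all.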

If you want more control over losing tags and per-site degradation of qualities, then maybe try Crumble: https://github.com/jkbonfield/crumble. That too sadly needs documenting, or indeed writing up as a paper, but it's a matter of finding the time! (As always.)


I'm going to set this as the accepted answer. I always forget that there are actual man pages for samtools!

written 4 months ago by Devon Ryan
5 months ago by
Devon Ryan (Freiburg, Germany) wrote:

To my knowledge (there's basically no documentation that I've seen), the only way to get lossy behaviour is by specifying --output-fmt-option lossy_names. I think you can do something with the auxiliary tags too (the required_tags option), but I've never looked into how that actually works.
