Question: Is samtools' compression of bam to cram lossless?
5 weeks ago by
bjarki.sigurjons10 wrote:

One can use CRAMtools to convert (compress) BAM into CRAM. CRAMtools has options for both lossless and lossy compression (--lossless-quality-score and --lossy-quality-score-spec).

Samtools can also compress BAM into CRAM, but with limited parameters: one just specifies that the output should be CRAM (using option -C). So my question is: is samtools' compression of BAM into CRAM lossy or lossless?

Thx

cram samtools • 183 views
modified 17 days ago by James Bonfield • written 5 weeks ago by bjarki.sigurjons

My guess is that it will be 'lossless'. If it were lossy, that would surely have been documented. You can easily check it yourself: take a SAM file, convert it to BAM, then to CRAM, then back to SAM, and compare the result with the original SAM.
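The comparison step of that round trip can be sketched in a few lines of Python. This is only an illustration: the names `parse_records` and `diff_sam` are hypothetical helpers, not part of samtools, and the sketch assumes both SAM files have already been produced and list their records in the same order. Header lines are skipped because the @PG entries will differ after each conversion.

```python
# Hypothetical helpers: compare alignment records of two SAM texts field by field.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def parse_records(sam_text):
    """Return alignment records as lists of fields, skipping '@' header lines."""
    records = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue
        records.append(line.split("\t"))
    return records


def diff_sam(original, roundtrip):
    """Return (record_index, column_name) pairs where the two files differ."""
    a, b = parse_records(original), parse_records(roundtrip)
    if len(a) != len(b):
        raise ValueError(f"record count differs: {len(a)} vs {len(b)}")
    diffs = []
    for i, (ra, rb) in enumerate(zip(a, b)):
        # The 11 mandatory SAM columns are compared position by position.
        for name, fa, fb in zip(SAM_COLUMNS, ra[:11], rb[:11]):
            if fa != fb:
                diffs.append((i, name))
        # Optional tags may come back in a different order, so compare as sets.
        if set(ra[11:]) != set(rb[11:]):
            diffs.append((i, "TAGS"))
    return diffs
```

An empty result means every record survived the round trip unchanged; anything reported under TAGS is worth inspecting by hand before concluding that data was lost.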

written 5 weeks ago by Santosh Anand
17 days ago by
James Bonfield60 wrote:

There is some sketchy documentation in the samtools man page (e.g. here: https://github.com/samtools/samtools/blob/develop/samtools.1#L1879 and further down for lossy names). These document options specified using --input-fmt-option or --output-fmt-option. Note it's not always obvious which is which without reading the man page. For example, the required_fields parameter is designed to avoid decoding certain components of a CRAM file, so when decoding CRAM or converting from CRAM to CRAM you can elect to simply drop certain fields. This permits total loss of quality values if you so wish, or dropping of all aux tags. It's rather coarse and doesn't allow much selection by tag type, though. As this is a CRAM decode option, it only works when reading CRAM. If you skip a data type, it is filled out with the SAM default (so "*", 0, etc). Conversely, lossy_names is an output option which discards query names when both reads of a pair are within the same CRAM slice.
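To make the input/output distinction concrete, here is a rough Python sketch that only builds a samtools command line and does not run it. The bit values are the ones I recall from htslib's sam_fields enum, and passing the mask in hex is an assumption, so check both against the man page before relying on them. The function names and the file names in the usage note are made up for illustration.

```python
# Bit values as recalled from htslib's sam_fields enum (an assumption; verify
# against the samtools man page before use).
SAM_FIELDS = {
    "QNAME": 0x1, "FLAG": 0x2, "RNAME": 0x4, "POS": 0x8,
    "MAPQ": 0x10, "CIGAR": 0x20, "RNEXT": 0x40, "PNEXT": 0x80,
    "TLEN": 0x100, "SEQ": 0x200, "QUAL": 0x400, "AUX": 0x800,
    "RGAUX": 0x1000,
}
ALL_FIELDS = sum(SAM_FIELDS.values())


def required_fields_mask(drop):
    """Return a mask that keeps every field except those named in `drop`."""
    mask = ALL_FIELDS
    for name in drop:
        mask &= ~SAM_FIELDS[name]
    return mask


def cram_to_cram_cmd(src, dst, reference, drop=(), lossy_names=False):
    """Build (but do not run) a samtools view argv for CRAM -> CRAM conversion."""
    cmd = ["samtools", "view", "-C", "-T", reference]
    if drop:
        # Decode-side option: only takes effect when the input is CRAM.
        mask = required_fields_mask(drop)
        cmd += ["--input-fmt-option", f"required_fields={mask:#x}"]
    if lossy_names:
        # Encode-side option: discards query names where the mate is nearby.
        cmd += ["--output-fmt-option", "lossy_names=1"]
    cmd += ["-o", dst, src]
    return cmd
```

For example, `cram_to_cram_cmd("in.cram", "out.cram", "ref.fa", drop=["QUAL"])` yields a command that skips quality decoding, so the output would carry "*" in the QUAL column.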

If you want more control over losing tags and per-site degradation of qualities, then maybe try Crumble: https://github.com/jkbonfield/crumble. That too sadly needs documenting, or indeed writing up as a paper, but it's a matter of finding the time! (As always.)


I'm going to set this as the accepted answer. I always forget that there are actual man pages for samtools!

written 17 days ago by Devon Ryan

5 weeks ago by
Devon Ryan66k (Freiburg, Germany) wrote:

To my knowledge (there's basically no documentation that I've seen), the only way to get lossy behaviour is by specifying --output-fmt-option lossy_names. You can do something with the auxiliary tags too, I think (the required_tags option), but I've never looked into how that actually works.
