Best Practices for CRAM <-> BAM
12 months ago
DavidStreid ▴ 90

Hi,

I am looking for advice on transitioning from BAM/BAI to CRAM for archival purposes. General advice is appreciated, but I'm specifically looking for answers to these two questions:

  • Does samtools offer the best performance for converting to and from CRAMs?
  • Do people have to re-index their BAM after converting from CRAM?

My current workflow:

# 1) Convert BAM to CRAM (-C), encoding against the reference FASTA
samtools view -T "${FA_REF}" -C -o "${cram}" "${bam}"

# 2) Store CRAM & discard *.bam/*.bai

# 3) Retrieve CRAM from storage and convert back to BAM (-b)
samtools view -T "${FA_REF}" -b -o "${bam}" "${cram}"

# 4) Re-index BAM to include CRAM-added headers (M5/UR)
sambamba index "${bam}"

Re-indexing the BAM, rather than storing and reusing the original BAM's index, seems like something I should be able to avoid, but htsjdk is unable to read the new BAM with the old index. I've looked into using samtools calmd to add the M5 & UR headers to the BAM before converting to CRAM, but it seems much faster to just re-index with sambamba.
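
For reference, here is a minimal sketch of the restore step (3 and 4 above) as a single shell function, using samtools index in place of sambamba. The names run, restore_bam, and DRY_RUN are hypothetical helpers for illustration, not part of samtools; setting DRY_RUN=1 only prints the commands so the plan can be checked before touching real data.

```shell
#!/usr/bin/env bash
# Sketch only: restore a BAM from an archived CRAM and build a fresh index.
# run, restore_bam and DRY_RUN are hypothetical names, not samtools features.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: restore_bam REF.fa IN.cram OUT.bam
restore_bam() {
  local ref="$1" cram="$2" bam="$3"
  # Decode the CRAM against the same reference that was used to create it.
  run samtools view -T "$ref" -b -o "$bam" "$cram" || return 1
  # The original .bai could not be reused (htsjdk rejected it), so index the new file.
  run samtools index "$bam" || return 1
  # Cheap sanity check that the restored BAM has a header and an EOF block.
  run samtools quickcheck "$bam"
}
```

For example, DRY_RUN=1 restore_bam "${FA_REF}" "${cram}" "${bam}" prints the three samtools commands without running them.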

As a note, I'm focusing on getting functionally equivalent BAMs, rather than using CRAMs directly or trying to get identical BAMs, for two reasons:

  • previous posts indicate md5sum-identical BAMs aren't possible
  • this GATK post advises that using CRAMs directly in pipelines can cause slowdown
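
By "functionally equivalent" I mean roughly the check sketched below: hash only the alignment records and ignore the header, since the header is where the M5/UR fields show up after the round trip. The function name sam_records_digest and the variables orig_bam and restored_bam are hypothetical, and this assumes md5sum is available on the system.

```shell
#!/usr/bin/env bash
# Sketch only: digest of the alignment records in SAM text, ignoring the header.
# sam_records_digest is a hypothetical helper name.

# Reads SAM text on stdin, drops header lines (they start with '@'),
# and prints the MD5 of the remaining record lines.
sam_records_digest() {
  grep -v '^@' | md5sum | cut -d' ' -f1
}

# Intended usage (requires samtools; orig_bam / restored_bam are placeholders):
#   samtools view -h "${orig_bam}"     | sam_records_digest
#   samtools view -h "${restored_bam}" | sam_records_digest
```

Matching digests would mean the two files contain the same records in the same order, even though the files themselves differ byte for byte.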

If people have experience with either of these assumptions being incorrect, please let me know. Thanks in advance!

sambamba bam cram samtools • 732 views