Alignment records alter after bam -> cram -> bam conversion using samtools
7.0 years ago

Hi all, I am trying to convert a BAM file to CRAM and then back to BAM, and I want the BAM before and after the round trip to be identical. I am using samtools for the conversions:

samtools view -C -T ref.fa seqs.bam > seqs.cram
samtools view -b -T ref.fa seqs.cram > seqs.bam

When I compare the BAM before and after, there are differences. The headers differ (MD5 checksums etc.), which does not concern me, but the records themselves also change. Here I have isolated the tags that differ for one record in the before and after BAM files:

MD:Z:19 NM:i:0 (before)

MD:Z:18N0 NM:i:1 (after)

Is anybody familiar with this? Is it possible to get a BAM file after conversion to CRAM and back that is identical to the original BAM file?

Cheers,

samtools cram • 2.5k views

Can you post the whole read for the before and after cases? I did the same conversion on my data, and although the order of the tags is different, the information looks to be the same.
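Since tag order can change across the round trip, a plain diff of the two files will flag records that carry the same information. A minimal sketch of one way around that, assuming the records are streamed as SAM text: sort the optional fields (columns 12 onward) of each record before diffing. The function name `sort_sam_tags` and the file name `seqs.roundtrip.bam` are hypothetical, chosen only for illustration.

```shell
# sort_sam_tags: read SAM text on stdin and print each record with its
# optional fields (columns 12 onward) sorted, so tag order cannot cause a diff.
sort_sam_tags() {
  awk 'BEGIN { FS = OFS = "\t" }
  /^@/ { next }                        # skip header lines; only records are compared
  {
    n = NF - 11
    for (i = 1; i <= n; i++) tag[i] = $(i + 11)
    # insertion sort, since POSIX awk has no built-in array sort
    for (i = 2; i <= n; i++) {
      t = tag[i]
      for (j = i - 1; j >= 1 && (tag[j] "") > (t ""); j--) tag[j + 1] = tag[j]
      tag[j + 1] = t
    }
    line = $1
    for (i = 2; i <= 11 && i <= NF; i++) line = line OFS $i
    for (i = 1; i <= n; i++) line = line OFS tag[i]
    print line
  }'
}

# Usage (assumes samtools is on PATH; seqs.roundtrip.bam is a hypothetical
# name for the BAM produced from the CRAM):
#   samtools view seqs.bam           | sort_sam_tags > before.txt
#   samtools view seqs.roundtrip.bam | sort_sam_tags > after.txt
#   diff before.txt after.txt
```

Any lines that `diff` still reports after this normalisation differ in content, not just in tag order.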


Most records are the same. Here is the whole record from where I drew the example in the original question. before:

ST-E00211:42:H2F33ALXX:4:2104:2228:68325    1097    21  48119838    0   34M116S =   48119838    0   TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  =====?>?>=>=>:<=>>>=>??:6,:=*>>+33)-<+>6##############################################################################################################  BD:Z:JJNMLOMLOMIMJJMLHLJJMLHLJJMLHLJJJHHLJJMJIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIJJJJJJJJJJJJJJJJJKKKKKLLLMMNJJJJ MD:Z:34 RG:Z:CS345262   BI:Z:LLNNLPMNNNJNKMMMJNKMMMJNKMMMJNKMKJJNKMMLKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLMMMMMMMMMMMMMNNNNOOOPPPPLLLL NM:i:0  AS:i:34 MS:i:0  XS:i:34

after:

ST-E00211:42:H2F33ALXX:4:2104:2228:68325    1097    21  48119838    0   34M116S =   48119838    0   TAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGTTAGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  =====?>?>=>=>:<=>>>=>??:6,:=*>>+33)-<+>6##############################################################################################################  BD:Z:JJNMLOMLOMIMJJMLHLJJMLHLJJMLHLJJJHHLJJMJIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIJJJJJJJJJJJJJJJJJKKKKKLLLMMNJJJJ BI:Z:LLNNLPMNNNJNKMMMJNKMMMJNKMMMJNKMKJJNKMMLKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLMMMMMMMMMMMMMNNNNOOOPPPPLLLL AS:i:34 MS:i:0  XS:i:34 MD:Z:34 NM:i:0  RG:Z:CS345262

The reads you posted have the same MD and NM (and other tags) before and after. So which reads have the problem?


I pasted the wrong one earlier. This one has different NM and MD tags. before:

ST-E00211:42:H2F33ALXX:4:1201:2685:31951    161 21  48119878    0   19M131S X   155260222   0   TTAGGGTTAGGGTTAGGGANNGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  ><<==57;>>>>,*;:==,##>8=##############################################################################################################################  MC:i:155260303  BD:Z:JJKNPKOLLNMIMJJMLHIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJKKKKKKKKLLLLMMLLJJJJ MD:Z:19 RG:Z:CS345262BI:Z:LLNNOLPLNNMJNKMMMJLKKKKJKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLMMMMMMMMMMMMNNNNNNNMLKKLLLL    NM:i:0  MQ:i:8  AS:i:19 MS:i:2846   XS:i:19

after:

ST-E00211:42:H2F33ALXX:4:1201:2685:31951    161 21  48119878    0   19M131S X   155260222   0   TTAGGGTTAGGGTTAGGGANNGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN  ><<==57;>>>>,*;:==,##>8=##############################################################################################################################  MC:i:155260303  BD:Z:JJKNPKOLLNMIMJJMLHIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJKKKKKKKKLLLLMMLLJJJJ BI:Z:LLNNOLPLNNMJNKMMMJLKKKKJKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLLMMMMMMMMMMMMNNNNNNNMLKKLLLL MQ:i:8  AS:i:19 MS:i:2846   XS:i:19 MD:Z:18N0   NM:i:1  RG:Z:CS345262
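For readers unfamiliar with the two tags: in an MD string the digits are runs of matching bases and each letter is a reference base at a mismatched (or, after `^`, deleted) position. So `MD:Z:18N0` reads as 18 matches followed by a reference `N` at the 19th aligned base, while `MD:Z:19` reports all 19 aligned bases as matches. For an alignment with no insertions, such as the `19M131S` CIGAR here, the number of letters in MD is what NM would be expected to equal, which is consistent with NM going from 0 to 1. A small sketch that counts those letters (the helper name `md_edits` is hypothetical):

```shell
# md_edits: print how many reference bases an MD string reports as
# mismatched or deleted (every letter counts; digits are match runs).
md_edits() {
  md=${1#MD:Z:}                          # accept "MD:Z:18N0" or plain "18N0"
  letters=$(printf '%s' "$md" | tr -cd 'A-Za-z')
  printf '%s\n' "${#letters}"
}

md_edits MD:Z:19     # before the round trip, prints 0
md_edits MD:Z:18N0   # after the round trip, prints 1
```

The count is only a consistency check between MD and NM within one record; it does not say which of the two versions is correct.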

Interesting. If you make a BAM file with just that read and do the BAM -> CRAM -> BAM conversion again, does this still happen? If so, please post it as an issue on the samtools (or htslib) GitHub repository. You can then attach the BAM file.
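One possible way to build that single-read BAM, sketched under the assumption that filtering the SAM text by read name is acceptable: keep the header lines plus every record whose QNAME matches. The function name `keep_read` and the output name `one_read.bam` are hypothetical.

```shell
# keep_read: pass SAM header lines through and keep only the records whose
# QNAME (column 1) equals the given read name. Both mates share a QNAME,
# so a paired read yields two records.
keep_read() {
  awk -v name="$1" 'BEGIN { FS = "\t" } /^@/ || ($1 "") == (name "")'
}

# Usage (assumes samtools is on PATH):
#   samtools view -h seqs.bam \
#     | keep_read 'ST-E00211:42:H2F33ALXX:4:1201:2685:31951' \
#     | samtools view -b -o one_read.bam -
```

The resulting file can then be run through the same two conversion commands as in the question.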


Are you sure you are not doing some other processing of the "earlier" bam before converting to cram?

  1. What version of samtools? If you're not using 1.4 then redo this with that.
  2. I wouldn't be surprised if some of the auxiliary tags differ or get dropped. There has been sporadic discussion about that on the samtools-dev list and GitHub. I don't recall seeing any of it lately, so hopefully that's all fixed in 1.4.

I did indeed use samtools 1.4.
