I am trying to convert a sam file to bam using the line of code described here, but samtools is throwing the following error at me:
samtools view -bT reference.fasta sequences.sam > sequences.bam [E::sam_parse1] CIGAR and query sequence are of different length [W::sam_read1] parse error at line 102 [main_samview] truncated file.
The offending line looks like this:
SRR808297.2571281 99 gi|309056|gb|L20934.1|MSQMTCG 747 80 101M = 790 142 TTGGTATAAAATTTAATAATCCCTTATTAATTAATAAACTTCGGCTTCCTATTCGTTCATAAGAAATATTAGCTAAACAAAATAAACCAGAAGAACAT @@CFDDFD?HFDHIGEGGIEEJIIJJIIJIGIDGIGDCHJJCHIGIJIJIIJJGIGHIGICHIICGAHDGEGGGGACGHHGEEEFDC@=?CACC>CCC NM:i:2 MD:Z:98A1A
I have seen this error reported elsewhere, it seems to arise from a bug in samtools and solutions don't seem to be universal. In my case, I am not sure which is my source of error, but I have a likely culprit: This sequence seems to be two bases shorter than the rest of the sequences with the same CIGAR. Could this be the error? Is that is so, how can I solve it? Bear in mind that my file contains paired end reads. Furthermore, I have several more files producing the same error and have to automatize a solution in a fast an efficient script.