CIGAR and sequence length are inconsistent after HISAT alignment
5.4 years ago
DVA ▴ 630

Hello,

I am trying to understand an error when running "samtools view":

Line 6040, sequence length 67 vs 76 from CIGAR
Parse error at line 6040: CIGAR and sequence length are inconsistent

I went back to check line 6040 of the SAM file (generated by HISAT2):

NR500449:117:H7WMXX:2:11101:6188:2761        99      10      103287073      60       3S73M   =       103287096       102     TGGCAAGAGTGAGATGGCACGCCACCTTCGGGAATACCAGGACTTGCTCAATGTCAAAAACATTGAG     AAAAAE6AA/EE/E///EEE//AEEEE/E/////E6E/EE/E<E///E//<EEE//</E//<AE6EE/E/E/EEE/    AS:i:-3 XN:i:0  XM:i:0  XO:i:0  XG:i:0 NM:i:0   MD:Z:73 YS:i:-12        YT:Z:CP NH:i:1

And here are the records for this read pair in the R1 and R2 FASTQ files:

@NR500449:117:H7WMXX:2:11101:6188:2761 1:N:0:3
TGGCAAGAGTGAGATGGCACGCCACCTTCGGGAATACCAGGACTTGCTCAATGTCAAAATGGCTCTTGACATTGAG
+
AAAAAE6AA/EE/E///EEE//AEEEE/E/////E6E/EE/E<E///E//<EEE//</E//<AE6EE/E/E/EEE/

@NR500449:117:H7WMXX:2:11101:6188:2761 2:N:0:3
TCCTGCAGTTTCCTGTAAGCTGCTATCTCAATGTCAAGAGACATTTTGACATTGAGAAAGTCCTGGTATTACCGAA
+
A/A/A/E///EAEEEEEE//EEE/EEEEEEEEAA//A<EE//EEE/EEA//6/AE//EA/<6/EE/E<AE////<E

Indeed, the CIGAR string accounts for 76 bp, the same as the read length in the FASTQ file, but the SEQ field of the SAM line has only 67 bp. Any idea why this would happen and how to deal with it? Thank you very much.
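For reference, here is a minimal sketch of how the 76 can be derived from the CIGAR string. It assumes the SAM specification's rule that only the M, I, S, = and X operations consume read bases; the function name `cigar_query_length` is just an illustrative helper, not part of samtools or HISAT2.

```python
import re

# CIGAR operations that consume bases of the read (SEQ) per the SAM specification.
QUERY_CONSUMING_OPS = set("MIS=X")

# One CIGAR token is a run length followed by a single operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_query_length(cigar: str) -> int:
    """Return the number of read bases implied by a CIGAR string."""
    return sum(
        int(length)
        for length, op in CIGAR_TOKEN.findall(cigar)
        if op in QUERY_CONSUMING_OPS
    )


# The CIGAR from line 6040: 3 soft-clipped bases plus 73 aligned bases.
print(cigar_query_length("3S73M"))  # 76
```

samtools expects this number to equal the length of the SEQ field, which is 67 in the record above, hence the parse error.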

hisat alignment rnaseq • 1.9k views
5.4 years ago

This looks like a bug in HISAT2. Please report it to the authors.
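To see how many records are affected before reporting, something like the following rough sketch could scan the SAM text. It is only an illustration: `find_inconsistent_records` is a hypothetical helper, it assumes an uncompressed, tab-separated SAM file, and it counts header lines in the line number on the assumption that this matches how samtools numbers them.

```python
import re

# One CIGAR token is a run length followed by a single operation letter.
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def find_inconsistent_records(lines):
    """Yield (line_number, read_name, cigar_length, seq_length) for each
    alignment line whose CIGAR does not match the length of its SEQ field."""
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        name, cigar, seq = fields[0], fields[5], fields[9]
        if cigar == "*" or seq == "*":  # nothing to compare
            continue
        # Only M, I, S, = and X consume read bases.
        cigar_length = sum(
            int(n) for n, op in CIGAR_TOKEN.findall(cigar) if op in "MIS=X"
        )
        if cigar_length != len(seq):
            yield line_number, name, cigar_length, len(seq)


# Usage sketch ("aln.sam" is a placeholder file name):
# with open("aln.sam") as handle:
#     for record in find_inconsistent_records(handle):
#         print(*record, sep="\t")
```

The read names it reports can be pulled from the FASTQ files to build a small reproducible example for the bug report.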


Thank you for the confirmation. I will update this post once I hear from them.
