Malformed SAM line error with HTSeq
8.2 years ago
Kssr ▴ 110

I'm running HTSeq to get read counts from a SAM file, using the following steps:

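# sort by read name (newer samtools syntax; older releases used: samtools sort -n tmp.bam nameSrt)
samtools sort -n -o nameSrt.bam tmp.bam
samtools fixmate nameSrt.bam nameSrt.fixmate.bam
samtools view -h nameSrt.fixmate.bam > nameSrt.fixmate.sam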

python -m HTSeq.scripts.count -f sam -r name -s $ss_library -a 10 -t exon -i $feature -m union nameSrt.fixmate.sam tmp.gff > tmp.HTSeq.counts
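
With `-r name`, htseq-count expects the records of a read pair to be grouped together, so one quick sanity check on the pipeline above is whether every read name forms a single contiguous run in the SAM file. Below is a minimal stdlib Python sketch, assuming a tab-delimited SAM file; `non_contiguous_names` is a hypothetical helper, not part of HTSeq or samtools. It keeps every read name it has passed in a set, so on a file of this size it can use a lot of memory.

```python
import sys
from typing import Iterable, List


def non_contiguous_names(sam_lines: Iterable[str]) -> List[str]:
    """Return read names (QNAME) whose records are not all adjacent.

    In a name-grouped SAM file, all records sharing a QNAME form one run.
    A name that reappears after a different name has intervened is reported.
    """
    finished = set()   # names whose run has already ended
    reported = set()   # names already added to the result
    offenders: List[str] = []
    current = None     # QNAME of the run currently being read
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        qname = line.split("\t", 1)[0]
        if qname == current:
            continue  # still inside the same run
        if current is not None:
            finished.add(current)
        if qname in finished and qname not in reported:
            offenders.append(qname)
            reported.add(qname)
        current = qname
    return offenders


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage (hypothetical script name): python check_grouping.py nameSrt.fixmate.sam
    with open(sys.argv[1]) as fh:
        bad = non_contiguous_names(fh)
    print(f"{len(bad)} read names are split across the file")
    for name in bad[:10]:
        print(name)
```

An empty result means every read name appears as one block, which is what name sorting followed by fixmate should produce.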

However, I get the following error:

Error occured when processing SAM input (line 77320836 of file nameSrt.fixmate.sam):
 ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 77320836 of file nameSrt.fixmate.sam')
 [Exception type: ValueError, raised in _HTSeq.pyx:1323]
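
The condition named in this error can be checked across the whole file, to see which records (if any) actually have RNEXT (called MRNM in HTSeq's message) equal to '*' while FLAG bit 0x8 (mate unmapped) is clear. Below is a minimal stdlib Python sketch, assuming a tab-delimited SAM file. It only tests paired records (bit 0x1 set); that restriction is an assumption, since mate fields are only meaningful for paired reads, and HTSeq's exact internal check may differ. `mate_field_conflicts` is a hypothetical helper name.

```python
import sys
from typing import Iterable, Iterator, Tuple

FLAG_PAIRED = 0x1
FLAG_MATE_UNMAPPED = 0x8


def mate_field_conflicts(sam_lines: Iterable[str]) -> Iterator[Tuple[int, str, int]]:
    """Yield (line_number, qname, flag) for paired records whose RNEXT is '*'
    although the mate-unmapped bit 0x8 is clear.

    Line numbers count every line of the file, header lines included.
    Assumption: only records with the paired bit 0x1 set are tested.
    """
    for lineno, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record (11 mandatory fields)
        qname, flag, rnext = fields[0], int(fields[1]), fields[6]
        is_paired = bool(flag & FLAG_PAIRED)
        mate_unmapped = bool(flag & FLAG_MATE_UNMAPPED)
        if is_paired and not mate_unmapped and rnext == "*":
            yield lineno, qname, flag


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for lineno, qname, flag in mate_field_conflicts(fh):
            print(f"line {lineno}: {qname} flag={flag}")
```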

Line 77320836 is the last line of the file, and its MRNM field (RNEXT in the current SAM specification) is not '*'. Here are the last 4 lines of the SAM file:

HISEQ:512:C8CNYACXX:6:2316:21397:43838    141    *    0    0    *    *    0    0    GCAGAGAGCCTACCTGGATTGCACGTGCGTGGAGTGGTTCTGCAGATACCTGGAGAATGGGAAGGTGAAGTTGCNACGCACGGAAGCACCCAAGGGAAAT    ==<AAA7A;>AA+77@BA7+3+,2<<?1?AA61?:>A080*=?==*)79)/=A(=7=7)=>))5;))).).665#(,((,,3;',)8(((+((((((+((    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
HISEQ:512:C8CNYACXX:6:2316:21397:70573    409    12    125396664    3    100M    *    0    0    NCGGCTCCACTTCGAGAGTGATGGTNTTACCAGTCAGGGTCTTCACGAAGATCTGCATCCCACCTCTAAGACGGAGCACCAGGTGCAGGGTGGACTCTTT    <<8+(:C><895@:>@@;>>;,,,(#;B>FFDDCA;FIGEHCC?IGGHGHCCFD;IGBGGDDHHGFCIFGIGHHHFHFFFE9IGGG@HFHFDDFFDD@@B    CC:Z:=    MD:Z:0T24C74    XG:i:0    NH:i:2    HI:i:0    NM:i:2    XM:i:2    XN:i:0    XO:i:0    CP:i:125396892    AS:i:-2    XS:A:-    YT:Z:UU    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
HISEQ:512:C8CNYACXX:6:2316:21397:70573    101    12    125396892    0    *    =    125396892    0    NGGATGCCTTCCTTGTCTTGGATCTTTGCCTTGACATTCTCAATGGTGTCACTCGGCTCCACTTCGAGAGTGATGGTCTNNANAGTCAGGGTCTTCACGN    #1+=ADDDHHDDFHIAHBHHIFHDHI,AAEHIIGFEEG?D:C@?GGBGH?FEF9DF(?<B838C@HFG(;DE?ECEH7;##(#(,5;=CCB?C?@>>@B1    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006    MQ:i:3
HISEQ:512:C8CNYACXX:6:2316:21397:70573    1177    12    125396892    3    100M    =    125396892    0    NCGGCTCCACTTCGAGAGTGATGGTNTTACCAGTCAGGGTCTTCACGAAGATCTGCATCCCACCTCTAAGACGGAGCACCAGGTGCAGGGTGGACTCTTT    <<8+(:C><895@:>@@;>>;,,,(#;B>FFDDCA;FIGEHCC?IGGHGHCCFD;IGBGGDDHHGFCIFGIGHHHFHFFFE9IGGG@HFHFDDFFDD@@B    MD:Z:0T24C74    XG:i:0    NH:i:2    HI:i:1    NM:i:2    XM:i:2    XN:i:0    XO:i:0    AS:i:-2    XS:A:-    YT:Z:UU    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
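
To read these records, it helps to decode the FLAG column (field 2) into its individual bits. Below is a small stdlib Python sketch; the bit values come from the SAM specification and the names follow the ones `samtools flags` prints, while `decode_flag` itself is just an illustrative helper.

```python
from typing import List

# FLAG bits from the SAM specification, with samtools-style names
SAM_FLAGS = [
    (0x1, "PAIRED"),
    (0x2, "PROPER_PAIR"),
    (0x4, "UNMAP"),
    (0x8, "MUNMAP"),
    (0x10, "REVERSE"),
    (0x20, "MREVERSE"),
    (0x40, "READ1"),
    (0x80, "READ2"),
    (0x100, "SECONDARY"),
    (0x200, "QCFAIL"),
    (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag: int) -> List[str]:
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS if flag & bit]


if __name__ == "__main__":
    # FLAG values of the four records above
    for flag in (141, 409, 101, 1177):
        print(flag, ",".join(decode_flag(flag)))
    # 141 PAIRED,UNMAP,MUNMAP,READ2
    # 409 PAIRED,MUNMAP,REVERSE,READ2,SECONDARY
    # 101 PAIRED,UNMAP,MREVERSE,READ1
    # 1177 PAIRED,MUNMAP,REVERSE,READ2,DUP
```

Of these four records, only the third (flag 101) has bit 0x8 (MUNMAP) clear, and its RNEXT is '=', so none of them matches the condition in the error message on its own.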

I'm trying to understand the SAM format better, but I still can't figure out a solution in this case. I'd appreciate any help solving this.

Check previous posts to see if there is any solution given already:

8.2 years ago
Kssr ▴ 110

I ran the step below to keep only alignments mapped in a proper pair and then re-ran HTSeq; it now runs without errors. It looks like the malformed SAM line was filtered out in the process.

samtools view -h -f 0x2 nameSrt.fixmate.bam
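
Because `-f 0x2` keeps only records whose proper-pair bit is set, it also drops unmapped reads and reads whose mate did not align, so counts from the filtered file may differ from an unfiltered run. Below is a minimal stdlib Python sketch that tallies what such a filter would keep and drop, split by a rough reason. It assumes a tab-delimited SAM file, and `proper_pair_split` is a hypothetical helper name.

```python
import sys
from collections import Counter
from typing import Iterable

FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8


def proper_pair_split(sam_lines: Iterable[str]) -> Counter:
    """Count alignment records kept vs. dropped by a `-f 0x2` style filter."""
    counts: Counter = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header lines are kept by `view -h`; not counted here
        flag = int(line.split("\t", 2)[1])
        if flag & FLAG_PROPER_PAIR:
            counts["kept"] += 1
        elif flag & FLAG_UNMAPPED:
            counts["dropped_unmapped"] += 1
        elif flag & FLAG_MATE_UNMAPPED:
            counts["dropped_mate_unmapped"] += 1  # e.g. singletons
        else:
            counts["dropped_other"] += 1  # mapped pairs not flagged as proper
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for category, n in sorted(proper_pair_split(fh).items()):
            print(category, n)
```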