Question: Malformed SAM line error with HTSeq
Kssr wrote (3.2 years ago):

I'm running HTSeq to get counts from a SAM file, using the following steps:

samtools sort -n tmp.bam nameSrt
samtools fixmate nameSrt.bam nameSrt.fixmate.bam
samtools view -h  nameSrt.fixmate.bam >nameSrt.fixmate.sam

python -m HTSeq.scripts.count -f sam -r name -s $ss_library -a 10 -t exon -i $feature -m union nameSrt.fixmate.sam tmp.gff > tmp.HTSeq.counts

However, I get the following error:

Error occured when processing SAM input (line 77320836 of file nameSrt.fixmate.sam):
 ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 77320836 of file nameSrt.fixmate.sam')
 [Exception type: ValueError, raised in _HTSeq.pyx:1323]

Line 77320836 is the last line in the file, and its MRNM field is not set to '*'. Here are the last 4 lines from the SAM file:

HISEQ:512:C8CNYACXX:6:2316:21397:43838    141    *    0    0    *    *    0    0    GCAGAGAGCCTACCTGGATTGCACGTGCGTGGAGTGGTTCTGCAGATACCTGGAGAATGGGAAGGTGAAGTTGCNACGCACGGAAGCACCCAAGGGAAAT    ==<AAA7A;>AA+77@BA7+3+,2<<?1?AA61?:>A080*=?==*)79)/=A(=7=7)=>))5;))).).665#(,((,,3;',)8(((+((((((+((    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
HISEQ:512:C8CNYACXX:6:2316:21397:70573    409    12    125396664    3    100M    *    0    0    NCGGCTCCACTTCGAGAGTGATGGTNTTACCAGTCAGGGTCTTCACGAAGATCTGCATCCCACCTCTAAGACGGAGCACCAGGTGCAGGGTGGACTCTTT    <<8+(:C><895@:>@@;>>;,,,(#;B>FFDDCA;FIGEHCC?IGGHGHCCFD;IGBGGDDHHGFCIFGIGHHHFHFFFE9IGGG@HFHFDDFFDD@@B    CC:Z:=    MD:Z:0T24C74    XG:i:0    NH:i:2    HI:i:0    NM:i:2    XM:i:2    XN:i:0    XO:i:0    CP:i:125396892    AS:i:-2    XS:A:-    YT:Z:UU    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
HISEQ:512:C8CNYACXX:6:2316:21397:70573    101    12    125396892    0    *    =    125396892    0    NGGATGCCTTCCTTGTCTTGGATCTTTGCCTTGACATTCTCAATGGTGTCACTCGGCTCCACTTCGAGAGTGATGGTCTNNANAGTCAGGGTCTTCACGN    #1+=ADDDHHDDFHIAHBHHIFHDHI,AAEHIIGFEEG?D:C@?GGBGH?FEF9DF(?<B838C@HFG(;DE?ECEH7;##(#(,5;=CCB?C?@>>@B1    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006    MQ:i:3
HISEQ:512:C8CNYACXX:6:2316:21397:70573    1177    12    125396892    3    100M    =    125396892    0    NCGGCTCCACTTCGAGAGTGATGGTNTTACCAGTCAGGGTCTTCACGAAGATCTGCATCCCACCTCTAAGACGGAGCACCAGGTGCAGGGTGGACTCTTT    <<8+(:C><895@:>@@;>>;,,,(#;B>FFDDCA;FIGEHCC?IGGHGHCCFD;IGBGGDDHHGFCIFGIGHHHFHFFFE9IGGG@HFHFDDFFDD@@B    MD:Z:0T24C74    XG:i:0    NH:i:2    HI:i:1    NM:i:2    XM:i:2    XN:i:0    XO:i:0    AS:i:-2    XS:A:-    YT:Z:UU    RG:Z:160108_SN172_0512_BC8CNYACXX_CTTGTA_L006
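Editorial note: the second column of each record above is the FLAG field, and the error message refers to bit 0x8 (mate unmapped) of that field. The sketch below decodes the four flags shown, using the bit meanings from the SAM specification. The names `SAM_FLAG_BITS` and `decode_flag` are illustrative helpers, not part of HTSeq or samtools.

```python
# Bit meanings follow the FLAG table in the SAM specification.
# SAM_FLAG_BITS and decode_flag are hypothetical helper names.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# The FLAG values of the last four records quoted in the question.
for flag in (141, 409, 101, 1177):
    print(flag, decode_flag(flag))
```

Decoded this way, none of the four records combines a cleared 0x8 bit with a '*' in the mate reference column, which is consistent with the observation that the last line itself looks well-formed.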

I'm trying to understand the SAM format better, but I still can't figure out a solution in this case. I'd appreciate any help solving this.
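Editorial note: if the line named in the error looks fine, it may be worth scanning the whole file for records that match the condition in the message, namely a paired read whose 0x8 bit is cleared while the mate reference column (MRNM/RNEXT, column 7) is '*'. This is a minimal sketch under the assumption that HTSeq's check corresponds to exactly that condition; `find_suspect_records` is a hypothetical helper and the demo records are synthetic.

```python
def find_suspect_records(lines):
    """Yield (line_number, read_name, flag) for paired records whose
    mate-unmapped bit (0x8) is cleared but whose RNEXT column is '*'.

    Assumes tab-separated SAM text with at least 11 columns per record.
    """
    for lineno, line in enumerate(lines, start=1):
        # Header lines start with '@'; read names cannot contain '@'.
        if line.startswith("@") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rnext = int(fields[1]), fields[6]
        paired = bool(flag & 0x1)
        mate_mapped = not (flag & 0x8)
        if paired and mate_mapped and rnext == "*":
            yield lineno, fields[0], flag


# Synthetic demo input (not taken from the poster's data).
demo = [
    "@HD\tVN:1.0\tSO:queryname\n",
    "\t".join(["readA", "99", "12", "100", "3", "4M", "=", "200", "104", "ACGT", "IIII"]) + "\n",
    "\t".join(["readB", "65", "12", "100", "3", "4M", "*", "0", "0", "ACGT", "IIII"]) + "\n",
]
print(list(find_suspect_records(demo)))
```

To run it on a real file, pass an open file handle instead of the list, for example `open("nameSrt.fixmate.sam")`. The reported line numbers count header lines too, so they can be compared directly with the number in the error.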


geek_y wrote (3.2 years ago):

Check previous posts to see if there is any solution given already:
Kssr wrote (3.2 years ago):

I ran the step below to keep only alignments mapped in a proper pair, then re-ran HTSeq; it now runs without any errors. It looks like the malformed SAM line was filtered out in the process.

samtools view -h -f 0x2 nameSrt.fixmate.bam
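Editorial note: `-f 0x2` keeps only records whose proper-pair bit is set, so it drops every other record as well, not just the malformed one. The sketch below mimics that test on the flags quoted in the question. `passes_required_flags` is a hypothetical helper, and flag 99 is an added example of a properly paired read, not a value from the thread.

```python
def passes_required_flags(flag, required=0x2):
    """Mimic the semantics of `samtools view -f`: keep a record only
    if every bit in `required` is set in its FLAG value."""
    return (flag & required) == required


# FLAG values of the last four records quoted in the question.
question_flags = (141, 409, 101, 1177)
kept = [f for f in question_flags if passes_required_flags(f)]
print(kept)

# 99 = paired + proper pair + mate reverse + first in pair (illustrative).
print(passes_required_flags(99))
```

None of the four quoted records has the 0x2 bit set, so all of them would be removed by this filter. Whether discarding all non-properly-paired alignments is acceptable depends on the counting analysis.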