Question: Why does htseq-count report a malformed SAM file with the error MRNM == '*'?
6.4 years ago by
joaslucas70
joaslucas70 wrote:

Dear Fellows,

I have RNA-seq data from mycobacteria. I removed the rRNA reads and then aligned the rest against the genome of Mycobacterium tuberculosis using SOAPaligner. I converted the aligned output file to SAM. When I try to use htseq-count to get read counts, I get the following error:

Lucy@Lucy:~/Documents/programs$ htseq-count -m intersection-nonempty -s no -t gene -i ID -o /home/Lucy/Documents/FOR_SOAP/S2_samout /home/Lucy/Documents/FOR_SOAP/S2_merged/mapped_MTB/O2_S2_MTB.sam /home/Lucy/Documents/GFF_FILES/MTB_transcripts.gff3
23962 GFF lines processed.
Error occured when reading first line of sam file.
Error: ("Malformed SAM line: MRNM == '*' although flag bit &0x0008 cleared", 'line 1 of file /home/joas/Documents/FOR_SOAP/S2_merged/mapped_MTB/O2_S2_MTB.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:1321]

Please help me find a solution. P.S. I am a biologist and have just started working with RNA-seq.

Thanks

6.4 years ago by
Istvan Albert ♦♦ 81k
University Park, USA
Istvan Albert ♦♦ 81k wrote:

This is not a solution as much as an explanation of what you see:

The 0x0008 flag bit indicates that the mate of a paired-end read is unmapped; when the bit is cleared, the mate is claimed to be mapped. MRNM is the 7th column of a SAM file, also known as RNEXT, and is supposed to contain the name of the reference sequence to which the mate is aligned. It may contain the = sign to indicate the same reference as the read itself, or * to indicate that the information is unavailable.
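To make the two fields concrete, here is a minimal Python sketch that pulls the flag and the 7th column out of a single SAM line. The helper name `describe_mate_fields` and the sample record are made up for illustration; only the column positions and the bit values come from the SAM format.

```python
# Illustrative only: the helper name and the sample record are hypothetical.
PAIRED = 0x1          # read is paired in sequencing
MATE_UNMAPPED = 0x8   # mate of this read is unmapped


def describe_mate_fields(sam_line):
    """Return (is_paired, mate_unmapped, rnext) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    rnext = fields[6]       # column 7: RNEXT (MRNM in older versions of the spec)
    return bool(flag & PAIRED), bool(flag & MATE_UNMAPPED), rnext


# Made-up record: flag 99 = 0x1 + 0x2 + 0x20 + 0x40, so bit 0x8 is cleared.
example = "read1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII"
print(describe_mate_fields(example))
```

For this record the read is paired, the mate is claimed to be mapped, and the 7th column is =, meaning the mate is on the same reference.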

In your case it seems that you have a SAM file that marks the read as paired and clears the 0x0008 bit (so the mate is supposedly mapped), yet contains no mate information: the 7th column is *. This is an error or an implementation oversight and could happen if the aligner does not adhere strictly to the standard. It is a pretty substantial oversight, though, as it would preclude you from visualizing mapped read pairs and performing a number of other analyses.
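If you want to see how many records are affected, a rough Python sketch like the one below can scan the file. It only mirrors the condition named in the error message (read paired, bit 0x0008 cleared, 7th column equal to *); it is not HTSeq's own code, and the function name `find_inconsistent_mates` and the sample records are hypothetical.

```python
import io

PAIRED = 0x1          # read is paired in sequencing
MATE_UNMAPPED = 0x8   # mate of this read is unmapped


def find_inconsistent_mates(handle):
    """Yield (line_number, read_name) for paired records that claim a
    mapped mate (bit 0x8 cleared) but have '*' in the 7th column."""
    for line_number, line in enumerate(handle, start=1):
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        flag = int(fields[1])
        if (flag & PAIRED) and not (flag & MATE_UNMAPPED) and fields[6] == "*":
            yield line_number, fields[0]


# Made-up records: r1 is inconsistent, r2 correctly sets bit 0x8, r3 is unpaired.
sample = io.StringIO(
    "@HD\tVN:1.0\n"
    "r1\t1\tchr1\t10\t30\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r2\t9\tchr1\t20\t30\t4M\t*\t0\t0\tACGT\tIIII\n"
    "r3\t0\tchr1\t30\t30\t4M\t*\t0\t0\tACGT\tIIII\n"
)
bad_records = list(find_inconsistent_mates(sample))
print(bad_records)
```

On a real file you would pass an opened file handle instead of the `io.StringIO` sample, e.g. `with open(path) as handle:`, where `path` points to your SAM file.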

One solution could be to use a different aligner that behaves a little better: perhaps an updated version of SOAPaligner, or bwa or a similar tool.

6.4 years ago by
joaslucas70
joaslucas70 wrote:

Thanks, I will try bwa.
