Question: htseq-count repeatedly fails with an error
9 days ago by
bnayer260 wrote:

Hi, I am really new to bioinformatics so please help me figure out this error. I have tried looking at other threads with similar questions but couldn't resolve my problem.

I get the following error when I am using HTseq for counting:

Error occured when reading beginning of SAM/BAM file.
('SAM line does not contain at least 11 tab-delimited fields.',
'line 1 of file Sorted_KKUGCTM5_-VE_14-4_aligned.bam')
[Exception type: ValueError, raised in _HTSeq.pyx:1276]

The code I am using is as follows:

/HTSeq-0.6.1/scripts/htseq-count --stranded=no Sorted_KKUGCTM5_-VE_14-4_aligned.bam GRCh38/Homo_sapiens.GRCh38.90.gtf

I first ran the command above on the BAM file. I then converted the BAM file to a SAM file and ran the same command on the SAM file, but I still get an error:

Warning: Malformed SAM line: MRNM != '*' although flag bit &0x0008 set
Warning: Read 700463F:369:CB93CANXX:5:1103:15415:8299 claims to have an aligned mate which could not be found in an adjacent line.
Error occured when processing SAM input (line 57 of file Sorted_KKUGCTM5_-VE_14-4_aligned1.sam):
  'pair_alignments' needs a sequence of paired-end alignments
 [Exception type: ValueError, raised in]

Please help me see where I am going wrong.

rna-seq htseq • 114 views
ADD COMMENTlink modified 8 days ago by genomax62k • written 9 days ago by bnayer260
9 days ago by
michael.ante3.0k wrote:

Hi, Please use the coding button to format your text. This will help others to understand your problem more easily.

To my understanding, htseq is assuming that your bam file is sorted a certain way. This assumption is violated by your file.

You can use the parameter -r pos if your reads are sorted by position, or -r name otherwise.

Cheers, Michael

ADD COMMENTlink written 9 days ago by michael.ante3.0k

Thanks for pointing out how I should format my questions; I have done that now. Regarding sorting by position or by name: I believed that converting the SAM file to BAM and then sorting it with samtools sorts by name by default, so I assumed my file was sorted by name because I kept the default parameters. The HTSeq website says that the default in HTSeq is also by name, so I didn't add that argument to my command. I have tried it now, but the same error still comes up :(

ADD REPLYlink written 9 days ago by bnayer260

subsequent sorting using the samtools code leads to sorting by name

No. Default is co-ordinate sort. Here is samtools sort help.

Sort alignments by leftmost coordinates, or by read name when -n is used. An appropriate @HD-SO sort order header tag will be added or an existing one updated if necessary.

Can you show what samtools view -H your.bam | head -5 looks like?
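To make the difference between the two sort orders concrete, here is a minimal sketch, assuming samtools 1.x syntax (older 0.1.x releases take an output prefix instead of -o). The helper name samtools_sort_cmd and the file names are hypothetical; the function only prints the command so it can be checked before running it.

```shell
#!/bin/sh
# Print the samtools 1.x sort command for the requested order (dry run).
# $1 = "name" (sort by read name, -n) or "pos" (default coordinate sort)
# $2 = input BAM, $3 = output BAM (hypothetical file names)
samtools_sort_cmd() {
    case $1 in
        name) printf 'samtools sort -n -o %s %s\n' "$3" "$2" ;;
        pos)  printf 'samtools sort -o %s %s\n' "$3" "$2" ;;
        *)    echo "unknown order: $1" >&2; return 1 ;;
    esac
}

samtools_sort_cmd pos  your.bam your.by_coordinate.bam
samtools_sort_cmd name your.bam your.by_name.bam
```

Without -n the result is coordinate-sorted, which is the default described in the help text quoted above.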

ADD REPLYlink modified 8 days ago • written 8 days ago by genomax62k
@HD VN:1.0  SO:queryname
@SQ SN:chr1 LN:248956422
@SQ SN:chr10    LN:133797422
@SQ SN:chr11    LN:135086622
@SQ SN:chr12    LN:133275309

This is what it looks like. Is this the expected outcome? Thanks. I have also added <-n> to my code. The output above is what the header looks like after I ran the code with <-n>.

ADD REPLYlink modified 6 days ago • written 6 days ago by bnayer260

Looks like your file is sorted by the sequence headers (read names), so you will need to use the -r name option with htseq-count, as noted above by @michael.ante.
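The check can also be scripted. The sketch below reads header text on standard input, looks at the SO: tag of the @HD line, and prints the matching value for htseq-count's -r option. The function name order_flag_from_header is made up for illustration, and the sketch assumes the header is tab-delimited as in a real SAM/BAM file. It only reports what the header claims, not whether the records really are in that order.

```shell
#!/bin/sh
# Map the SO: tag of the @HD header line to an htseq-count -r value.
# Prints "name" for SO:queryname, "pos" for SO:coordinate, else "unknown".
order_flag_from_header() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) {
                if ($i == "SO:queryname")  { print "name"; found = 1; exit }
                if ($i == "SO:coordinate") { print "pos";  found = 1; exit }
            }
        }
        END { if (!found) print "unknown" }
    '
}

# Demo with the header line posted above (tabs restored).
printf '@HD\tVN:1.0\tSO:queryname\n@SQ\tSN:chr1\tLN:248956422\n' | order_flag_from_header

# On a real file the input would come from samtools instead:
#   samtools view -H your.bam | order_flag_from_header
```

For the header posted above this prints name, which matches the -r name suggestion.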

ADD REPLYlink written 6 days ago by genomax62k

Also note that the default input format for htseq-count is sam, which is why the software yelled at you for not providing the right format. Rather than converting your data to .sam, you should tell htseq-count to expect .bam format.

ADD REPLYlink written 8 days ago by swbarnes24.8k

What is the parameter I should introduce to specify that it is a bam file in the input? Shall I add -f bam in my HTseq code?
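For reference, a sketch of what the combined command might look like with the suggestions from this thread applied: -f bam tells htseq-count that the input is BAM rather than the default SAM, and -r name tells it the alignments are sorted by read name. It assumes htseq-count is on the PATH (the original post calls it by its full path) and that pysam is installed, which HTSeq needs to read BAM. The helper name htseq_bam_cmd is hypothetical, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Print an htseq-count command line for a name-sorted BAM file (dry run).
# $1 = BAM file, $2 = GTF annotation
htseq_bam_cmd() {
    bam=$1
    gtf=$2
    # -f bam         : input format is BAM (the default is sam)
    # -r name        : alignments are sorted by read name
    # --stranded=no  : same strandedness setting as in the original post
    printf 'htseq-count -f bam -r name --stranded=no %s %s\n' "$bam" "$gtf"
}

htseq_bam_cmd Sorted_KKUGCTM5_-VE_14-4_aligned.bam GRCh38/Homo_sapiens.GRCh38.90.gtf
```

htseq-count writes the counts to standard output, so the printed command is usually run with a redirect to a file of your choice.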

ADD REPLYlink modified 6 days ago • written 6 days ago by bnayer260