Question: htseq-count for RNA Seq
3.9 years ago, Ron (United States) wrote:

Hi all,

I am using htseq-count to get read counts for my genes from RNA-Seq data. I do get the read counts, but I also get the warning below:

Warning: 35062120 reads with missing mate encountered.
35138401 SAM alignment pairs processed.

I am guessing this could be because of how I sorted my SAM file. I did not specify any parameters while sorting. This was the command I used (with GNU parallel); {1} specifies my input files.

 

samtools sort - -m 40G {1/.}.sorted

My question is: do I have to sort by a specific parameter such as -n? In other words, do I need to rerun the sort and then regenerate the htseq-count results?

Thanks in advance.

Ron 

rna-seq next-gen htseq-count • 6.1k views

htseq-count requires paired-end input to be sorted, but the sort can be either by read name or by alignment position. You can ignore the warnings. Use -q to suppress them.

modified 3.9 years ago • written 3.9 years ago by komal.rathi

So, can I just ignore the warning?

written 3.9 years ago by Ron

To be honest, I have used sorted as well as unsorted files in htseq-count and it gave me the same warnings. But either way it works fine. You can just ignore the warnings.

written 3.9 years ago by komal.rathi

Okay. What is the default sort if you don't specify any parameter? Is it by name or by position?

written 3.9 years ago by Ron

Name. Also read the post "htseq-count for pair-end RNA-seq".

written 3.9 years ago by komal.rathi

I think the default sorting is by position, not by name.

written 3.9 years ago by Martombo

Actually, you are right. The samtools manual says its default sort is by position, while the htseq-count manual says its default expected order is by name. It's confusing.

From htseq-count manual:

"For paired-end data, the alignment have to be sorted either by read name or by alignment position. If your data is not sorted, use the samtools sort function of samtools to sort it. Use this option, with name or pos for <order> to indicate how the input data has been sorted. The default is name.

If name is indicated, htseq-count expects all the alignments for the reads of a given read pair to appear in adjacent records in the input data. For pos, this is not expected; rather, read alignments whose mate alignment have not yet been seen are kept in a buffer in memory until the mate is found. While, strictly speaking, the latter will also work with unsorted data, sorting ensures that most alignment mates appear close to each other in the data and hence the buffer is much less likely to overflow."

From samtools specification:

"Sort alignments by leftmost coordinates"

modified 3.9 years ago • written 3.9 years ago by komal.rathi
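
One way to settle which order a particular file is in, rather than relying on either tool's default, is to read the SO tag on the @HD line of the SAM header. The snippet below is a minimal sketch of that check: the function name sam_sort_order is made up for illustration, and the result is only as reliable as the tool that wrote the header (some older tools may not update the tag after sorting).

```shell
# Report the sort order recorded in a SAM header read from stdin.
# Prints "name" or "pos" (the two values htseq-count's -r option accepts),
# or "unknown" if there is no @HD line or its SO tag says something else.
# sam_sort_order is a hypothetical helper name, not part of samtools or HTSeq.
sam_sort_order() {
    awk -F '\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) {
                if ($i == "SO:queryname")  { print "name"; found = 1; exit }
                if ($i == "SO:coordinate") { print "pos";  found = 1; exit }
            }
        }
        END { if (!found) print "unknown" }
    '
}

# Demonstration on a hand-written header; with a real file one would pipe in
# the output of: samtools view -H sample.bam   (sample.bam is a placeholder)
printf '@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' | sam_sort_order
```

If this prints pos, the file matches the samtools default and htseq-count would need -r pos; if it prints name, the htseq-count default applies.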

What version of htseq are you using? It's true that it can also work with coordinate-sorted BAM files, but only the latest versions have that option.

My suggestion is that you try sorting by name and see whether you get the same warnings. I wouldn't be confident in simply ignoring them.

written 3.9 years ago by Martombo

I am using HTSeq-0.6.1, which is the latest version. What option can I specify for coordinate-sorted BAM files?

written 3.9 years ago by Ron
3.9 years ago, geek_y wrote:

First check how many reads have proper mates, for example with samtools flagstat.
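
As a rough illustration of what such a check looks at, the sketch below tallies mate status directly from the FLAG column of SAM records. It is not a replacement for samtools flagstat; the function name mate_summary and the sample records are made up, and the bit meanings are taken from the SAM specification.

```shell
# Summarise mate status from SAM records on stdin (header lines are skipped).
# FLAG bits used, per the SAM specification:
#   0x1 paired, 0x2 proper pair, 0x8 mate unmapped,
#   0x100 secondary, 0x800 supplementary
# mate_summary is a hypothetical helper name.
mate_summary() {
    awk -F '\t' '
        /^@/ { next }
        {
            flag = $2
            # Skip secondary and supplementary alignments.
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
            # Skip reads that are not part of a pair.
            if (flag % 2 == 0) next
            paired++
            if (int(flag / 2) % 2) proper++
            if (int(flag / 8) % 2) mate_unmapped++
        }
        END { printf "paired=%d proper=%d mate_unmapped=%d\n", paired, proper, mate_unmapped }
    '
}

# Demonstration on five made-up records (only the first two columns matter here).
# With a real file: samtools view sample.bam | mate_summary   (placeholder name)
printf 'r1\t99\nr1\t147\nr2\t73\nr3\t133\nr1\t355\n' | mate_summary
# prints: paired=4 proper=2 mate_unmapped=1
```

A high mate_unmapped count relative to paired would suggest the mates really are missing from the alignments, rather than merely far apart in the file.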

If you find many reads with missing mates, as in the htseq-count warning, then htseq-count itself is not the problem. If you find that most reads have their mates aligned, then:

Sort the SAM file either by position (the default) or by name (-n). Then use the latest version of htseq-count and set the -r option to either name (-r name, the default) or position (-r pos), according to how you sorted the BAM file. Make sure you have set the strandedness option (-s) appropriately.
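
The pairing of sort order and -r value described above could look roughly like the following. This is a dry-run sketch that only prints the commands. It assumes samtools 1.x option syntax (the -o form, which differs from the older prefix-style call in the question), uses placeholder file names sample.bam and genes.gtf, and picks -s no purely as an example strandedness setting; print_count_commands is a made-up helper name.

```shell
# Print (do not run) a matching sort and htseq-count command pair.
# Usage: print_count_commands <bam> <gtf> <name|pos>
print_count_commands() {
    bam=$1
    gtf=$2
    order=$3
    prefix=${bam%.bam}
    case $order in
        name) sort_cmd="samtools sort -n" ;;   # sort by read name
        pos)  sort_cmd="samtools sort" ;;      # default: sort by coordinate
        *)    echo "order must be name or pos" >&2; return 1 ;;
    esac
    sorted="${prefix}.${order}sorted.bam"
    echo "${sort_cmd} -o ${sorted} ${bam}"
    # -r must agree with the sort order used above; adjust -s to the library type.
    echo "htseq-count -f bam -r ${order} -s no ${sorted} ${gtf} > ${prefix}.counts.txt"
}

print_count_commands sample.bam genes.gtf name
```

Printing the commands first makes it easy to confirm that the -r value and the sort order agree before committing to a long run.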
