Question: How does DESeq2 handle the read-counts assigned to special counters in HTSeq-count output?
gravatar for komal.rathi
4.2 years ago by
Children's Hospital of Philadelphia, Philadelphia, PA
komal.rathi3.2k wrote:

Hi everyone, 

I have a doubt regarding how DESeq2 handles the HTSeq counts assigned to special classes when performing normalization & differential gene expression.

Apart from actual genes, HTSeq assigns reads to the five classes below. I have an approximation of the mean of the counts assigned to each class for my dataset:


  1. no_feature (~40112886)
  2. ambiguous (~9732)
  3. too_low_aQual (0)
  4. not_aligned (0)
  5. alignment_not_unique (~4294028)

Now, I am going to use DESeq2 to normalize and get differentially expressed genes for my data. So how does DESeq2 handle these classes? Does it remove them during normalization, uses them in the normalization process or do we have to remove these rows manually before the normalization? 



rna-seq deseq2 htseq-count • 5.2k views
ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by komal.rathi3.2k
gravatar for Devon Ryan
4.2 years ago by
Devon Ryan81k
Freiburg, Germany
Devon Ryan81k wrote:

It removes them when making the DeseqDataSet object that's used in the subsequent steps.


Edit: If you're curious, here are the relevant lines of DESeqDataSetFromHTSeqCount:

oldSpecialNames <- c( "no_feature", "ambiguous",
                   "too_low_aQual", "not_aligned",
                   "alignment_not_unique" )
# either starts with two underscores
# or is one of the old special names (htseq-count backward compatability)
specialRows <- (substr(rownames(tbl),1,2) == "__") |
    rownames(tbl) %in% oldSpecialNames
tbl <- tbl[ !specialRows, ]

Edit2: I should mention that if you use one of the other functions, such as DESeqDataSetFromMatrix or DESeqDataSet, you need remove these fields yourself.

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Devon Ryan81k

Oh! So basically it doesn't matter if you remove them manually before parsing your data to DESeq2. I am going to combine data from multiple htseq-count runs (Protein-coding genes and long-noncoding RNAs) and normalize them together. So R converts 'no_feature' to 'no_feature1', 'no_feature2' and so on. And I will be using DESeqDataSetFromMatrix() to get my counts so I will just remove these rows manually. Thanks for the help! 

P.S.: I am using htseq-count version 0.5.4p5 and there is no '__' before these names in the count file, not that it matters because you are removing it anyway.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by komal.rathi3.2k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 636 users visited in the last hour