DESeq2 normalisation: is the size of the gene taken into account?
7.1 years ago
Aurelie MLB ▴ 360

Hello,

I can't quite work out whether the DESeq2 normalisation and the regularized log transformation take gene length into account. Do they?

It seems to me that they don't... but I am probably missing something. Do I have a bias toward long genes when I use DESeq to find differentially expressed genes, or when I look at expression profiles after a regularized log transformation?

Many thanks

RNA-Seq
7.1 years ago

No, the normalization steps don't take gene length into account, since it doesn't matter. You do not have a bias toward longer genes; rather, you have more power to detect changes in them at a given expression level. This is a good thing, and you do not want to try to get rid of it.
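To make this concrete, here is a minimal Python sketch of the median-of-ratios idea behind DESeq/DESeq2 size factors. It is a reimplementation for illustration, not DESeq2's own code, and the function name `size_factors` and the counts are hypothetical. Each gene is only ever compared with itself across samples, so gene length never enters the calculation.

```python
import numpy as np


def size_factors(counts):
    """Median-of-ratios size factors in the style of DESeq/DESeq2.

    counts: genes x samples array of raw read counts.
    Gene length is never used: each gene is compared only with itself.
    """
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have no finite log geometric mean; drop them
    usable = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[usable])
    # Per-gene log geometric mean across samples acts as a pseudo-reference sample
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of each sample's ratio to the reference
    return np.exp(np.median(log_counts - log_ref, axis=0))


# Hypothetical counts: 3 genes x 2 samples, sample 2 sequenced twice as deeply
counts = np.array([[10, 20], [100, 200], [5, 10]])
sf = size_factors(counts)   # roughly [0.707, 1.414]
normalised = counts / sf    # divides each column by its sample's size factor
```

Scaling one gene's counts by the same factor in every sample, which is what a longer gene at the same expression level looks like, leaves the size factors unchanged, because that factor cancels against the gene's own geometric mean.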

If you're doing something like GO enrichment or other downstream analyses where gene length can play a role, then you should account for it there (see, for example, the GOseq package).
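A quick way to see whether length bias affects your DE calls is to look at the fraction of DE genes as a function of gene length. The sketch below is a crude binned version of that idea, assuming you already have one length and one DE/non-DE call per gene; the function name and data are hypothetical. goseq itself fits a monotonic spline (its probability weighting function) rather than using fixed bins, so use goseq for the actual enrichment test.

```python
import numpy as np


def de_rate_by_length(lengths, is_de, n_bins=5):
    """Fraction of genes called DE within length bins (shortest bin first).

    Returns (median length per bin, fraction DE per bin). Bins hold roughly
    equal numbers of genes. A rising trend means long genes are more likely
    to be called DE, so categories rich in long genes can look enriched in a
    length-naive GO test.
    """
    lengths = np.asarray(lengths, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    order = np.argsort(lengths)             # gene indices from shortest to longest
    bins = np.array_split(order, n_bins)    # roughly equal-sized groups of genes
    median_len = np.array([np.median(lengths[b]) for b in bins])
    frac_de = np.array([is_de[b].mean() for b in bins])
    return median_len, frac_de


# Hypothetical genes: the longer ones are more often called DE
lengths = [300, 500, 800, 1200, 2000, 3500, 5000, 8000]
is_de = [False, False, False, True, False, True, True, True]
median_len, frac_de = de_rate_by_length(lengths, is_de, n_bins=4)
```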

Thank you!

I don't understand why, in section 5.3 of goseq, they take the median rather than the sum of the transcript lengths! Do you have any comments on that?

You should probably post this as a separate question.

Right! Do you think it is enough of an issue that I should make a separate post about it?

Well, it's a legitimate question and unrelated to the current thread, so yes.

@Devon Ryan I don't understand why you would have more power to detect changes in them at a given expression level. Do you mean that you want to look for higher count values in longer genes across samples? Thanks!

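Longer genes produce more reads at the same expression level, so they have higher counts, and the relative sampling noise of a count shrinks as the count grows (for Poisson counts the coefficient of variation is 1/sqrt(mean)). Their relative expression levels across conditions are therefore easier to measure.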
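A back-of-the-envelope Python sketch of that point, with made-up numbers: two genes at the same expression level (the same expected reads per kb), one 1 kb and one 10 kb long. It assumes pure Poisson sampling noise. Real RNA-seq counts are overdispersed, which DESeq2 models with a negative binomial; there the squared CV is 1/mean plus the dispersion, so it still falls as the count grows, just toward a floor set by biological variability, and the real gain is smaller than shown here.

```python
import math


def poisson_cv(expected_count):
    """Coefficient of variation (sd / mean) of a Poisson count: 1 / sqrt(mean)."""
    return 1.0 / math.sqrt(expected_count)


# Hypothetical: both genes yield 20 reads per kb, i.e. the same expression level
reads_per_kb = 20
for length_kb in (1, 10):
    mu = reads_per_kb * length_kb  # longer gene -> proportionally more reads
    print(f"{length_kb} kb gene: expected count {mu}, Poisson CV {poisson_cv(mu):.3f}")
```

The 10 kb gene's counts come out roughly three times less noisy (CV around 0.07 versus 0.22), so the same fold change stands out more clearly from sampling noise.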

7.1 years ago

If you are comparing the same gene across different samples, then gene length doesn't really matter, since you would be normalizing that gene by the same length in every sample.
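A tiny Python illustration of why length drops out when you compare one gene across samples, assuming the counts are already corrected for library size (the function name and numbers are hypothetical): dividing both counts by the same gene length leaves their ratio unchanged.

```python
def fold_change(count_a, count_b, length_kb=None):
    """Fold change (B / A) of one gene between two samples.

    Assumes counts are already normalised for library size. If length_kb is
    given, both counts are first divided by it, as a length-normalised unit
    such as FPKM would do; the length then cancels in the ratio.
    """
    if length_kb is not None:
        count_a, count_b = count_a / length_kb, count_b / length_kb
    return count_b / count_a


print(fold_change(50, 150))                  # 3.0
print(fold_change(50, 150, length_kb=2.5))   # 3.0 again: length cancels
```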

If you want to compare different genes within the same sample, then gene length would matter (DESeq2 wasn't really made to do this anyway). However, I don't think comparing different genes within a sample is valid, depending on how you arrived at your tag counts.

For example, if you only considered uniquely mapped reads when generating your tag counts, then genes with repetitive/conserved regions will be artificially under-counted.

Hello,

OK, thank you, I now realise why gene length is not important in comparisons between samples. And I can see why it is a problem to compare gene expression within a sample...

May I ask you another question, then? What I would actually like to do is inspect the expression of all genes within a sample, to see, for instance, how strongly markers are expressed in a control sample. So far, I have been applying DESeq2's regularised log transformation to the counts and plotting the log values (y axis) against the genes (x axis). I gather from your answer that this might be misleading... But would there be a better way? Would a classical log2 transformation of FPKM be better, since it would at least account for gene length? (And yes, I did consider only uniquely mapped reads... :( )

Are you trying to assess how abundantly a gene is expressed for experimental purposes (in situ hybs, transgenic targets)? I get that question a lot from my lab mates.

It is not an exact science, since the signal you get from whatever marker you use will depend on many different factors, of which the abundance of expression might not play that big a role.

What I usually end up doing for my lab mates is simply ranking their candidate genes by tag counts per kb, and they can choose the top 10 genes or so. I don't have enough data to say whether there is any correlation between tag counts per kb and marker signal.
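That ranking can be sketched in a few lines of Python. The gene IDs, counts and lengths are invented, and which length to use per gene (for example summed exon length or longest transcript) is an assumption you would need to decide on for your own annotation.

```python
def rank_by_counts_per_kb(counts, lengths_bp):
    """Return (gene, counts per kb) pairs sorted from highest to lowest.

    counts and lengths_bp are dicts keyed by gene ID; lengths are in bases.
    """
    per_kb = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    return sorted(per_kb.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates: geneA has the most reads but is also by far the longest
counts = {"geneA": 500, "geneB": 300, "geneC": 90}
lengths_bp = {"geneA": 10_000, "geneB": 1_500, "geneC": 600}
for gene, per_kb in rank_by_counts_per_kb(counts, lengths_bp):
    print(gene, round(per_kb, 1))
```

This ranks geneB first (200 reads per kb), then geneC (150), then geneA (50), whereas raw counts would have put geneA at the top.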

Yes, the purpose would be similar.

May I ask how tag counts per kb differ from FPKM? Apologies if this is a stupid question :)
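For reference, a Python sketch of the two quantities as they are usually defined (the function names are hypothetical): counts per kb divides only by gene length, while FPKM additionally divides by the number of mapped fragments in millions. Within a single sample that extra factor is the same constant for every gene, so both give the same ranking; it only matters when comparing samples with different sequencing depths. RPKM is the same calculation with reads instead of fragments.

```python
def counts_per_kb(count, length_bp):
    """Reads (or fragments) per kilobase of gene length."""
    return count / (length_bp / 1e3)


def fpkm(count, length_bp, total_mapped):
    """Fragments per kilobase of gene length per million mapped fragments."""
    return counts_per_kb(count, length_bp) / (total_mapped / 1e6)


# Hypothetical gene: 400 fragments, 2 kb long, in a library of 20 million mapped fragments
print(counts_per_kb(400, 2_000))        # 200.0
print(fpkm(400, 2_000, 20_000_000))     # 10.0
```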

7.1 years ago
Michael Love ★ 2.4k
We discourage cross-posting the same question on multiple sites because it duplicates everyone's effort in answering your questions. At the least, please link to the Bioc support site posts.

Apologies! I did not know. I posted here first and then saw that your documentation recommends the Bioconductor support website, so I thought it would be more appropriate to post there after all. All I can say is that it will not happen again...
