Question: TMM normalization factors in RNA-seq analysis
sarahmanderni wrote, 2.3 years ago:

Hi,

To my understanding, the main aim of TMM normalization is to account for library size variation between the samples of interest. I have simulated RNA-seq data with equal library sizes for all samples. I ran TMM normalization and expected all normalization factors (from the calcNormFactors() function) to equal one. However, the factors vary from 0.4 to 2.4 (with a median of 1, of course), which is not what I expected. Have I misunderstood something here? Another question: can I use TMM normalization for non-binomial values, for instance TPM values?

Thanks in advance!

rna-seq tmm normalization

Exactly how did you simulate the data? TMM is a robust measure, so if you produced very different distributions of reads then that would be the cause.

written 2.3 years ago by Devon Ryan

To my knowledge TMM is supposed to correct mostly for composition bias (as well as library size). If you generated samples with different compositions then it's correct that the normalisation factors would vary.

written 2.3 years ago by James Ashmore

I have not produced the data myself, but yes, the distributions of the reads vary significantly. Could you elaborate a little on what you mean by a robust measure, and in what way the distribution affects the result?

written 2.3 years ago by sarahmanderni
h.mon wrote, 2.3 years ago:

The main aim of TMM normalization is to account for library size variation between the samples of interest, while allowing for the fact that some extremely differentially expressed genes would negatively affect the normalization procedure; or, as Devon Ryan said, it is a robust normalization. How does it achieve its robustness? From the paper:

A trimmed mean is the average after removing the upper and lower x% of the data.

So an assumption of TMM is that the majority of genes are not differentially expressed. And, as Devon pointed out, different distributions of gene expression will result in different TMM normalization factors.
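To make this concrete, here is a minimal Python sketch of a simplified trimmed mean of M-values. It is not edgeR's calcNormFactors(): it trims only on the log-ratios (M-values), skips the trimming on average abundance and the precision weights, and the names `trimmed_mean` and `simple_tmm_factors` as well as the toy counts are invented for this illustration.

```python
import math


def trimmed_mean(values, trim=0.3):
    """Average after removing the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)  # number of values dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def simple_tmm_factors(counts, ref=0, trim=0.3):
    """Simplified TMM-style factors (assumption: unweighted, trims M-values only).

    counts: list of samples, each a list of per-gene counts in the same gene order.
    ref:    index of the sample used as reference.
    """
    lib_sizes = [sum(sample) for sample in counts]
    log_factors = []
    for sample, size in zip(counts, lib_sizes):
        # M-value: log2 ratio of library-size-scaled counts, sample vs reference
        m_values = [
            math.log2((y / size) / (r / lib_sizes[ref]))
            for y, r in zip(sample, counts[ref])
            if y > 0 and r > 0
        ]
        log_factors.append(trimmed_mean(m_values, trim))
    # Rescale so that the factors multiply to one
    centre = sum(log_factors) / len(log_factors)
    return [2 ** (lf - centre) for lf in log_factors]


# Toy data: both samples have a library size of 1000, but in sample_b
# two genes take up 60% of the reads.
sample_a = [100] * 10
sample_b = [50] * 8 + [300] * 2

factors = simple_tmm_factors([sample_a, sample_b])
print(factors)
```

Both toy samples have the same library size, yet the factors come out at roughly 1.41 and 0.71: the two dominant genes in sample_b are trimmed away, and the remaining genes all look two-fold lower relative to the reference. Equal library sizes therefore do not imply factors of one when the composition differs.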


Makes sense. Will check the paper again, thanks.

written 2.3 years ago by sarahmanderni

Do you have any experience applying it to TPM values?

written 2.3 years ago by sarahmanderni

I have none, but it seems you can do it (yes, you can).

written 2.3 years ago by h.mon

I am also confused about normalization and the statistics behind DE programs. I am using edgeR to analyze two conditions.

Example for one gene (raw counts), with four replicates per condition, control (C) and treatment (T):

gene = FBgn0034710

Controls = 820 1618 1728 1007

Treatments = 7195 1252 1312 1291

Result of edgeR:

logFC = 1.10
logCPM = 6.5
LR = 9.77
PValue = 0.0017
FDR = 0.02

Why is the FBgn0034710 gene statistically significant if only one replicate (7195) has a much higher raw count than the others? I know that library size could be a factor, but it is similar across the replicates.

modified 2.2 years ago • written 2.2 years ago by vm.higareda
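
As a rough check on the numbers in the comment above, the following Python sketch compares the log2 ratio of mean raw counts with and without the high replicate. It is plain arithmetic on the posted counts, not edgeR's model: it ignores library sizes, normalization factors and dispersion estimation, and the helper names `mean` and `log2_ratio` are invented for the example.

```python
import math

# Raw counts as posted for FBgn0034710
control = [820, 1618, 1728, 1007]
treatment = [7195, 1252, 1312, 1291]


def mean(values):
    return sum(values) / len(values)


def log2_ratio(numerator, denominator):
    """log2 of the ratio of mean raw counts (no library-size correction)."""
    return math.log2(mean(numerator) / mean(denominator))


# All four treatment replicates, including the one with 7195 counts
with_high_replicate = log2_ratio(treatment, control)

# Same comparison after leaving out the replicate with 7195 counts
without_high_replicate = log2_ratio([x for x in treatment if x != 7195], control)

print(with_high_replicate, without_high_replicate)
```

With all replicates the ratio of means is close to the reported logFC of 1.10, while without the 7195 replicate it is close to zero. So, on these raw counts at least, the fold change appears to be carried almost entirely by that single replicate.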