Greetings!
Currently I'm conducting an RNA-Seq experiment where my goal is to perform a Differential Expression Analysis. To do so I mapped my reads using STAR and quantified them using RSEM. To perform said analysis I am using DESeq2.
As instructed in its guide, I used tximport to import the data and convert the decimal counts into integer counts, like so:
# Loading of the RSEM counts data
library(tximport)
tx <- tximport(path, type = "rsem", txIn = FALSE, txOut = FALSE)
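For completeness, path here is just a vector with the per-sample RSEM gene-level result files; the directory and sample names below are only illustrative placeholders for my actual setup:
# Illustrative only: build the vector of RSEM *.genes.results files, one per sample
path <- file.path(rsem_dir, paste0(sample_names, ".genes.results"))
names(path) <- sample_names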
I encountered the error stated in this post and used the given workaround, which did solve it.
Then I continued my analysis. To generate the dds object I used the function DESeqDataSetFromTximport. Furthermore, since I want an FDR cutoff of 0.05, I indicated it within the results function:
# Loading of the data into the dds object
dds <- DESeqDataSetFromTximport(tx,
colData = Animals,
design = ~ Group)
# Model fitting
dds <- DESeq(dds)
# Results
res <- results(dds, alpha=0.05)
I obtained a list of 103 significant features. Then I also applied a filter of absolute fold change > 1.5, which reduced the list to 65 features.
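For reference, this is roughly how I applied that filter (sig is just an illustrative name; DESeq2 reports log2 fold changes, so I compare against log2(1.5)):
# Keep features with padj < 0.05 and an absolute fold change above 1.5
sig <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > log2(1.5)), ]
nrow(sig)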
Prior to these results (in the context of learning about DESeq2), I used the function estimateSizeFactors, which generated normalizationFactors. If I understand correctly, tximport keeps the feature length information from the input RSEM data and therefore allows the creation of normalizationFactors (instead of sizeFactors), which normalize for both library depth and feature length.
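My rough mental model of how these per-gene, per-sample factors could combine length and depth (a pure guess on my part, not taken from DESeq2's code, hence my questions below) is something like:
# Purely illustrative guess, not DESeq2's actual implementation
lenMat <- tx$length                                       # gene x sample average lengths from tximport
normMat <- lenMat / exp(rowMeans(log(lenMat)))            # scale lengths by their per-gene geometric mean
sf <- estimateSizeFactorsForMatrix(tx$counts / normMat)   # depth factors on length-corrected counts
nf <- t(t(normMat) * sf)                                  # combined length + depth factor per gene and sample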
# I thought I was estimating sizeFactors...
dds <- estimateSizeFactors(dds)
# ... but they were empty...
head(sizeFactors(dds))
NULL
# ... what I had was normalizationFactors:
head(normalizationFactors(dds))
(a gene x sample matrix of values)
Then, after reading everywhere about sizeFactors but finding nothing about normalizationFactors, I had the idea of testing the effect of the normalization. First I exported the un-normalized, positive-integer count matrix from the dds object:
# Storing the counts into a CSV file
CMX_dds <- counts(dds)
write.table(CMX_dds, file = "COUNTS.csv", sep = ";")
Then I re-ran the analysis, but starting from this plain count matrix. In this case the normalization was done via sizeFactors, as no gene length data is available:
# Loading the counts matrix (gene IDs in the first column become row names)
cnts <- read.delim("COUNTS.csv", sep = ";", row.names = 1)
# Loading of the data into the dds object
dds <- DESeqDataSetFromMatrix(countData = cnts,
                              colData = Animals,
                              design = ~ Group)
# Model fitting
dds <- DESeq(dds)
# Results
res <- results(dds, alpha=0.05)
At first glance the effect of the normalization appears minimal; the number of significant features increased slightly, to 113 and 75 respectively. However, my trouble began when I compared both lists and found these results:
[Figure: number of significant features and features common to both normalizations, without fold-change filtering (top) and with fold-change filtering (bottom)]
Around 20% of the significant features change, which to me translates into a large impact of the normalization‼️ 😰.
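For context, this is roughly how I compared the two lists (sig_nf and sig_sf are just illustrative names for the filtered results of the normalizationFactors run and the sizeFactors run):
# Significant feature IDs from each run (padj < 0.05, |FC| > 1.5)
ids_nf <- rownames(sig_nf)   # run normalized with normalizationFactors
ids_sf <- rownames(sig_sf)   # run normalized with sizeFactors
# Overlap and differences between the two lists
length(intersect(ids_nf, ids_sf))
length(setdiff(ids_nf, ids_sf))
length(setdiff(ids_sf, ids_nf))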
This difference worries me quite a lot, and given that I was unable to find any information about normalizationFactors, I have several questions:
1. How does DESeq2 compute these normalizationFactors, or at least where can I find information about it?
2. Are there any guidelines or criteria to decide which normalization to use depending on the count data I have? In my case I have a small data set (12 samples) of stranded Illumina RNA-Seq.
3. Should I stick with normalizationFactors just because they account for more biases?
4. Is there some statistical approach that I could use to test the effect of the normalization?
Thanks for your attention,
Ferri 🎨.