Greetings!
Currently I'm conducting an RNA-Seq experiment where my goal is to perform a Differential Expression Analysis. To do so I mapped my reads using STAR and quantified them using RSEM. To perform said analysis I am using DESeq2.
As instructed in its guide, I used tximport to import the data and convert the decimal counts into integer counts, like so:
# Loading of the RSEM counts data
library(tximport)
tx <- tximport(path, type = "rsem", txIn = FALSE, txOut = FALSE)
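For completeness, path here is just a vector with the per-sample RSEM gene-level result files; the directory and sample names below are only illustrative placeholders for my actual setup:
# Illustrative only: build the vector of RSEM *.genes.results files, one per sample
path <- file.path(rsem_dir, paste0(sample_names, ".genes.results"))
names(path) <- sample_names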
I encountered the error stated in this post and used the given workaround, which did solve it.
Then I continued my analysis. To generate the dds object I used the function DESeqDataSetFromTximport. Furthermore, since I want an FDR cutoff of 0.05, I indicated it within the results function:
# Loading of the data into the dds object
dds <- DESeqDataSetFromTximport(tx,
colData = Animals,
design = ~ Group)
# Model fitting
dds <- DESeq(dds)
# Results
res <- results(dds, alpha=0.05)
I obtained a list of 103 significant features. Then I also applied a filter of absolute fold change > 1.5, which reduced the list to 65 features.
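For reference, this is roughly how I applied that filter (sig is just an illustrative name; DESeq2 reports log2 fold changes, so I compare against log2(1.5)):
# Keep features with padj < 0.05 and an absolute fold change above 1.5
sig <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > log2(1.5)), ]
nrow(sig)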
Prior to these results (in the context of learning about DESeq2), I used the function estimateSizeFactors, which generated normalizationFactors. If I understand correctly, tximport keeps the feature length information from the input RSEM data and therefore allows the creation of normalizationFactors (instead of sizeFactors), which normalize for both library depth and feature length.
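My rough mental model of how these per-gene, per-sample factors could combine length and depth (a pure guess on my part, not taken from DESeq2's code, hence my questions below) is something like:
# Purely illustrative guess, not DESeq2's actual implementation
lenMat <- tx$length                                       # gene x sample average lengths from tximport
normMat <- lenMat / exp(rowMeans(log(lenMat)))            # scale lengths by their per-gene geometric mean
sf <- estimateSizeFactorsForMatrix(tx$counts / normMat)   # depth factors on length-corrected counts
nf <- t(t(normMat) * sf)                                  # combined length + depth factor per gene and sample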
# I thought I was estimating sizeFactors...
dds <- estimateSizeFactors(dds)
# ... but they were empty...
head(sizeFactors(dds))
NULL
# ... what I had was normalizationFactors:
head(normalizationFactors(dds))
(a gene x sample matrix of values)
Then, after reading everywhere about sizeFactors but finding nothing about normalizationFactors, I had the idea of testing the effect of the normalization. First I exported the un-normalized, positive-integer count matrix from the dds object:
# Storing the counts into a CSV file
CMX_dds <- counts(dds)
write.table(CMX_dds, file = "COUNTS.csv", sep = ";")
Then I re-ran the analysis, but starting from this plain count matrix. In this case the normalization was done via sizeFactors, as no gene length data is available:
# Loading the counts matrix (gene IDs in the first column become row names)
cnts <- read.delim("COUNTS.csv", sep = ";", row.names = 1)
# Loading of the data into the dds object
dds <- DESeqDataSetFromMatrix(countData = cnts,
                              colData = Animals,
                              design = ~ Group)
# Model fitting
dds <- DESeq(dds)
# Results
res <- results(dds, alpha=0.05)
At first glance the effect of the normalization appears minimal; the number of significant features increased slightly, to 113 and 75 respectively. However, my trouble began when I compared both lists and found these results:
[Figure: number of significant features and features common to both normalizations, without fold-change filtering (top) and with fold-change filtering (bottom)]
Around 20% of the significant features change, which to me translates into a large impact of the normalization‼️ 😰.
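For context, this is roughly how I compared the two lists (sig_nf and sig_sf are just illustrative names for the filtered results of the normalizationFactors run and the sizeFactors run):
# Significant feature IDs from each run (padj < 0.05, |FC| > 1.5)
ids_nf <- rownames(sig_nf)   # run normalized with normalizationFactors
ids_sf <- rownames(sig_sf)   # run normalized with sizeFactors
# Overlap and differences between the two lists
length(intersect(ids_nf, ids_sf))
length(setdiff(ids_nf, ids_sf))
length(setdiff(ids_sf, ids_nf))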
This difference worries me quite a lot, and given that I was unable to find any information about normalizationFactors, I have several questions:
1. How does DESeq2 compute these normalizationFactors, or at least where can I find information about it?
2. Are there any guidelines or criteria to decide which normalization to use depending on the count data I have? In my case I have a small data set (12 samples) of stranded Illumina RNA-Seq.
3. Should I stick with normalizationFactors just because they account for more biases?
4. Is there some statistical approach that I could use to test the effect of the normalization?
Thanks for your attention,
Ferri 🎨.