Data normalization
7 days ago
nolwenn ▴ 10

Hello,

I'm new to bioinformatics and I would like to normalize HTSeq-count data for a survival analysis. How can I choose the best normalization?

I tried DESeq2 with the median ratio method and normTransform, but the medians are not all aligned:

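# Median ratio method: estimate one size factor per sample, then divide the counts by it
DESeq_object <- estimateSizeFactors(DESeq_object)
counts_normalized <- counts(DESeq_object, normalized = TRUE)
# log2(x + 1) is the transformation that normTransform applies by default
boxplot(log2(counts_normalized + 1), main = "Counts normalized + log2 transformation")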


I tried this too:

DESeq_object <- DESeqDataSetFromMatrix(countData = count,
                                       colData = coldata,
                                       design = ~ gender)
# The VST takes raw counts and estimates size factors itself if none are set
vst_object <- varianceStabilizingTransformation(DESeq_object)
boxplot(assay(vst_object))


How can I align all the medians to get a good normalization?

Thank you for your help !!

normalization RNA-seq • 259 views

If you want identical distributions across samples, you would need something like quantile normalization; see for example the implementation in https://bioconductor.org/packages/release/bioc/html/preprocessCore.html
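To show what quantile normalization does to the data, here is a minimal base-R sketch. It is not the preprocessCore implementation: the function name quantile_normalize and the toy matrix are made up for illustration, and ties are handled naively.

```r
# Minimal quantile normalization in base R (illustrative sketch only).
quantile_normalize <- function(x) {
  # Sort each column (sample) independently
  sorted <- apply(x, 2, sort)
  # Reference distribution: mean across samples at each rank
  ref <- rowMeans(sorted)
  # Replace every value by the reference value at its within-sample rank
  ranks <- apply(x, 2, rank, ties.method = "first")
  out <- apply(ranks, 2, function(r) ref[r])
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Hypothetical toy data: 100 genes x 4 samples on a log2 scale
toy <- matrix(log2(rpois(400, lambda = 50) + 1), nrow = 100,
              dimnames = list(paste0("gene", 1:100), paste0("s", 1:4)))
toy[, 2] <- toy[, 2] + 1  # shift one sample to mimic a depth difference

qn <- quantile_normalize(toy)
apply(toy, 2, median)  # medians differ before
apply(qn, 2, median)   # medians are identical after
```

Every sample ends up with exactly the same set of values, which is why the medians (and all other quantiles) line up. That is also why it counts as a strong manipulation.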

From my understanding that is quite a strong data manipulation though. Why do you think the medians need to align precisely? If you have a heterogeneous sample population, you would expect considerable differences in expression profiles across the cohort, so the medians (I guess) would not be expected to align, even after normalization. Why not just stick with the well-tested vst? Maybe see whether prefiltering the data helps to bring the medians a bit closer, e.g. dds[rowSums(counts(dds) > 10) > 3,] before running vst.
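As a rough illustration of what that prefilter keeps, here is the same logical index applied to a plain matrix. The matrix counts_toy is a made-up example; with a real DESeqDataSet you would index dds directly as in the expression above.

```r
# Hypothetical toy count matrix: 6 genes x 5 samples
counts_toy <- rbind(
  low1  = c(0, 1, 0, 2, 0),          # never above 10
  low2  = c(12, 0, 15, 0, 3),        # above 10 in only 2 samples
  mid1  = c(11, 20, 14, 30, 2),      # above 10 in 4 samples
  high1 = c(100, 250, 180, 90, 300),
  edge  = c(11, 11, 11, 5, 5),       # above 10 in exactly 3 samples
  high2 = c(50, 60, 70, 80, 90)
)
colnames(counts_toy) <- paste0("s", 1:5)

# Keep genes with a count above 10 in more than 3 samples
keep <- rowSums(counts_toy > 10) > 3
filtered <- counts_toy[keep, , drop = FALSE]
rownames(filtered)
```

Note that both comparisons are strict, so the "edge" gene (exactly 3 samples above 10) is dropped. The thresholds 10 and 3 are only the example values from the comment above and would normally be adapted to the cohort size.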

I'm not familiar with normalization of RNA-seq data, and a biologist advised me to aim for identical medians. How can I know whether the median ratio method, vst, or another method is best for the normalization? Doesn't vst use normalized data?

I agree with ATpoint: you can use assay(vst_object) for the survival analysis. The voom function from the limma package can also do the job. Here is a blog post on RNA-seq survival analysis using the voom function.
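A minimal sketch of how transformed expression values could feed into a Cox model, assuming the survival package is available (it is normally installed with R as a recommended package). All data here are simulated: expr_gene stands in for one row of assay(vst_object), and time and status are hypothetical follow-up data, not anything from the original post.

```r
library(survival)

set.seed(7)
n <- 60
# Hypothetical stand-in for one gene's row of assay(vst_object)
expr_gene <- rnorm(n, mean = 8, sd = 1)
# Simulated follow-up times whose hazard increases with expression
time <- rexp(n, rate = 0.1 * exp(0.5 * (expr_gene - 8)))
# Simulated event indicator (1 = event observed, 0 = censored)
status <- rbinom(n, 1, 0.8)

# Cox proportional hazards model with expression as a continuous covariate
fit <- coxph(Surv(time, status) ~ expr_gene)
summary(fit)$coefficients
```

With real data you would loop over genes of interest (or use a penalized model) and correct for multiple testing. This only shows where the transformed values enter the model.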

Thank you for your help !

7 days ago

It sounds like your biologist is used to seeing RMA-normalized expression array data, which have identical median values across arrays. RNA-seq data should not be subjected to similar normalization methods, as the assumptions for such data are not the same. Analyze the data with any of the well-regarded and thoroughly vetted differential expression packages (e.g. DESeq2, edgeR, limma) and no reviewer will complain.
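To make the contrast concrete, here is a base-R sketch of the idea behind the median ratio method used by DESeq2: each sample is divided by a single size factor, so sequencing depth is corrected but the shape of each sample's distribution is left alone. The function name median_ratio_size_factors and the toy counts are made up for illustration; in practice estimateSizeFactors does this for you.

```r
# Sketch of median-of-ratios size factors (illustrative, not the DESeq2 source).
median_ratio_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples (pseudo-reference sample)
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero count in any sample have a non-finite mean and are skipped
  ok <- is.finite(log_geo_means)
  # Size factor = median ratio of a sample to the pseudo-reference
  apply(log_counts[ok, , drop = FALSE], 2,
        function(col) exp(median(col - log_geo_means[ok])))
}

set.seed(3)
# Hypothetical toy data: same expression profile sequenced at 1x, 2x and 3x depth
base <- rpois(200, lambda = 100) + 1
toy_counts <- cbind(s1 = base, s2 = 2 * base, s3 = 3 * base)

sf <- median_ratio_size_factors(toy_counts)
sf / sf[1]  # relative depths of the three samples

# Normalized counts: each column divided by its size factor
normalized <- sweep(toy_counts, 2, sf, "/")
```

In this toy case the samples differ only by depth, so the normalized columns coincide. In a real, heterogeneous cohort the remaining differences between the boxplots are biology (or batch effects), and the size factors will not remove them.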

Thank you for your help ! I understand now.