Data normalization
nolwenn, 3.0 years ago

Hello,

I'm new to bioinformatics and I would like to normalize HTSeq-count data to do a survival analysis. How can I choose the best normalization?

I tried DESeq2's median-of-ratios method followed by a log2 transformation (as normTransform does), but the sample medians are not aligned:

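library(DESeq2)
# median-of-ratios size factors
DESeq_object <- estimateSizeFactors(DESeq_object)
counts_normalized <- counts(DESeq_object, normalized = TRUE)
# log2(x + 1) is what normTransform() applies by default
boxplot(log2(counts_normalized + 1), main = "Counts normalized + log2 transformation")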
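To see why the medians need not line up, here is a minimal base R sketch of the median-of-ratios idea behind estimateSizeFactors(). The toy matrix and variable names are made up for illustration, and this is not DESeq2's own code. Each sample's size factor is the median ratio of its counts to a per-gene geometric-mean reference, so one strongly changed gene (gene4 in sample s3) does not move the size factor, but it does shift that sample's median after normalization.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers, illustration only)
counts <- matrix(c(10,  20,   30,
                   100, 200,  300,
                   5,   10,   15,
                   50,  100,  1500),  # gene4 is strongly up in s3
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# 1. log geometric mean of each gene across samples (pseudo-reference sample)
log_geo_means <- rowMeans(log(counts))
# 2. only genes without zero counts have a finite geometric mean
use <- is.finite(log_geo_means)
# 3. size factor per sample = median over genes of count / reference
size_factors <- apply(counts[use, , drop = FALSE], 2, function(x) {
  exp(median(log(x) - log_geo_means[use]))
})

# divide each column by its size factor
normalized <- sweep(counts, 2, size_factors, "/")

size_factors                    # robust to the single changed gene
apply(normalized, 2, median)    # s3 median is higher than s1 and s2
```

Here genes 1 to 3 end up identical across samples, yet the per-sample medians still differ because of gene4, which is the situation the boxplots show.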

[Figure: per-sample boxplots of log2(normalized counts + 1)]

I also tried this:

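# count: gene x sample matrix of HTSeq-count values; coldata: sample table with a gender column
DESeq_object <- DESeqDataSetFromMatrix(countData = count,
                                       colData = coldata,
                                       design = ~ gender)
# blind = TRUE by default, so the design is not used for the transformation
vst_object <- varianceStabilizingTransformation(DESeq_object)
boxplot(assay(vst_object))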

[Figure: per-sample boxplots of VST values]

How can I align all the medians to get a good normalization?

Thank you for your help!

Tags: normalization, RNA-seq

If you want identical distributions, you would need something like quantile normalization; see, for example, the implementation in https://bioconductor.org/packages/release/bioc/html/preprocessCore.html
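For illustration, a minimal base R sketch of quantile normalization follows. It is not preprocessCore's code: the function name quantile_normalize and the simulated matrix are hypothetical, and ties are broken by order of appearance (ties.method = "first"), whereas preprocessCore::normalize.quantiles() treats ties differently. Every column is forced onto the same reference distribution, which is exactly why the medians become identical.

```r
# Hypothetical helper: quantile-normalize the columns of a numeric matrix
quantile_normalize <- function(x) {
  # sort each column independently
  sorted <- apply(x, 2, sort)
  # mean at each rank across samples = shared reference distribution
  reference <- rowMeans(sorted)
  # ranks of the original values (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # replace each value by the reference value at its rank
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Simulated data: two low-depth and two high-depth samples, 10 genes
set.seed(1)
x <- matrix(rpois(40, lambda = rep(c(5, 50), each = 20)), ncol = 4,
            dimnames = list(paste0("gene", 1:10), paste0("s", 1:4)))
qn <- quantile_normalize(log2(x + 1))
apply(qn, 2, median)  # identical medians by construction
```

The within-sample ordering of genes is kept, but all between-sample differences in distribution shape are removed, which is the strong manipulation mentioned below.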

As I understand it, though, that is quite a strong manipulation of the data. Why do you think the medians need to align precisely? In a heterogeneous sample population you would expect substantial changes in expression profiles across the cohort, so the medians would probably not align, even after normalization. Why not stick with the well-tested vst? You could also check whether prefiltering the data brings the medians a bit closer, e.g. dds[rowSums(counts(dds) > 10) > 3,] before running vst (this keeps genes with more than 10 counts in more than 3 samples).
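The prefilter above is a plain logical index on the count matrix. This sketch applies the same rule to a made-up base R matrix (no DESeq2 needed), so you can see which genes survive; with a real DESeqDataSet you would subset dds with keep and then run vst.

```r
# Hypothetical toy counts: 5 genes x 6 samples (made-up numbers)
counts <- matrix(c(
    0,  1,  0,   2,   0,  1,   # geneA: never above 10
   15, 12, 30,   8,  50, 20,   # geneB: > 10 in 5 samples
   11, 12, 13,   0,   0,  0,   # geneC: > 10 in exactly 3 samples
  100, 90, 80, 120, 110, 95,   # geneD: > 10 in all samples
    0,  0, 40,  50,  60, 70    # geneE: > 10 in 4 samples
), nrow = 5, byrow = TRUE,
dimnames = list(paste0("gene", LETTERS[1:5]), paste0("s", 1:6)))

# Same rule as dds[rowSums(counts(dds) > 10) > 3, ]:
# keep genes with a count above 10 in more than 3 samples
keep <- rowSums(counts > 10) > 3
filtered <- counts[keep, , drop = FALSE]
rownames(filtered)  # geneB, geneD, geneE
```

Note that geneC is dropped because "more than 3" is strict; lowering the thresholds keeps more low-count genes.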


I'm not familiar with normalization of RNA-seq data, and a biologist advised me to make the medians identical. How can I tell whether the median-of-ratios method, vst, or another method is best for normalization? Doesn't vst use normalized data?


I agree with ATpoint: you can use assay(vst_object) for the survival analysis. The voom function from the limma package can also do the job. Here is a blog post on RNA-seq survival analysis using voom.
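A minimal sketch of a per-gene Cox model on log-scale expression, assuming that is the kind of survival analysis you want. It does not call limma::voom(); instead it computes log2-CPM with the offsets that, as I read the limma source, voom uses for its E matrix (0.5 added to counts, 1 added to library sizes). voom's precision weights are not used by coxph. The simulated counts, times and events are hypothetical, and assay(vst_object) could be plugged in place of log_cpm.

```r
library(survival)  # recommended package shipped with standard R installations

set.seed(42)
n_samples <- 40
# Hypothetical simulated data, for illustration only
counts <- matrix(rpois(5 * n_samples, lambda = 50), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("s", 1:n_samples)))
time   <- rexp(n_samples, rate = 0.1)      # follow-up times
status <- rbinom(n_samples, 1, 0.7)        # 1 = event, 0 = censored

# log2-CPM: (count + 0.5) / (library size + 1) * 1e6, per sample
lib_size <- colSums(counts)
log_cpm <- t(log2(t(counts + 0.5) / (lib_size + 1) * 1e6))

# one univariate Cox model per gene; keep the Wald p-value
surv_obj <- Surv(time, status)
cox_p <- apply(log_cpm, 1, function(expr) {
  fit <- coxph(surv_obj ~ expr)
  summary(fit)$coefficients[1, "Pr(>|z|)"]
})
cox_p
```

With many genes you would also adjust these p-values for multiple testing (for example with p.adjust()).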


Thank you for your help!


Cross-posted on Bioconductor, where user was already provided an answer: https://support.bioconductor.org/p/9136130/

Answer, 3.0 years ago

It sounds like your biologist is used to seeing RMA-normalized expression array data, which have identical median values across arrays (RMA includes a quantile normalization step). RNA-seq data should not be subjected to the same normalization methods, because the assumptions behind them do not hold for such data. Analyze the data with any of the well-regarded and thoroughly vetted differential expression packages (e.g. DESeq2, edgeR, limma) and no reviewer will complain.


Thank you for your help! I understand now.
