Question: How to normalize GTEx gene counts with DESeq2?
12 months ago, kakukeshi wrote:


I want to normalize the gene counts from GTEx using the variance stabilizing transformation (VST), but I'm confused about which variables I should include in the design formula when creating the DESeqDataSet. For the moment I'm doing the following:

dds <- DESeqDataSetFromMatrix(countData = gtex,
                              colData = sampledata,
                              design = ~ tissue) #generate the deseq data set

dds <- dds[ rowSums(counts(dds)) > 1, ] #remove genes with a total count of 0 or 1

vsd <- vst(dds, blind = FALSE) #normalization considering tissue

However, this only accounts for the different tissues during the normalization. My question is: should I do it like this and include all the tissues together, or run it separately for each tissue with something like ~ 1? And should I include other variables, such as the experimental batch or post-mortem interval (PMI)?

Many thanks


When you set blind = TRUE, I'm pretty sure you are not considering the different tissues.

Comment written 12 months ago by swbarnes2

Oops! It's corrected now.

Reply written 12 months ago by kakukeshi
12 months ago, Kevin Blighe wrote:

What you use as the design formula will depend, in part, on your end goals: are you aiming to perform differential expression analysis (DEA) across the GTEx tissues or do you just want to normalise and transform the data for other downstream tools? For DEA, obviously you have to include your condition of interest in the formula.
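For the DEA case, a minimal sketch might look like the following. It assumes simulated counts from DESeq2's makeExampleDESeqDataSet() in place of the real GTEx matrix, and the tissue labels ("liver", "lung") and the sampledata table are hypothetical, chosen only to illustrate a design with the condition of interest in it.

```r
library(DESeq2)

# Simulated counts stand in for the real GTEx matrix (illustration only).
set.seed(42)
sim <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Hypothetical sample table: two tissues, four samples each.
sampledata <- data.frame(
  tissue = factor(rep(c("liver", "lung"), each = 4)),
  row.names = colnames(sim)
)

# The condition of interest (tissue) goes into the design formula.
dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                              colData = sampledata,
                              design = ~ tissue)

# Fit the model and extract the lung vs liver comparison.
dds <- DESeq(dds)
res <- results(dds, contrast = c("tissue", "lung", "liver"))

# Genes with the smallest adjusted p-values first.
head(res[order(res$padj), ])
```

Because the counts here are simulated without any tissue effect, few or no genes should come out as significant; with real data, any known nuisance variables would normally be added to the formula before the condition of interest.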

In the past, I input GTEx raw count data for just a single cancer type to DESeq2 but was not interested in any DEA. I therefore just used the intercept-only formula: ~ 1.
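As a rough sketch of that intercept-only approach, assuming simulated counts in place of the GTEx matrix (the object names sim, dds and vsd are only for illustration):

```r
library(DESeq2)

# Simulated counts stand in for the real GTEx matrix (illustration only).
set.seed(1)
sim <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Intercept-only design: no sample covariate enters the model.
dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                              colData = as.data.frame(colData(sim)),
                              design = ~ 1)

# Drop genes with a total count of 0 or 1 across all samples.
dds <- dds[rowSums(counts(dds)) > 1, ]

# Variance stabilizing transformation; returns a DESeqTransform object.
vsd <- vst(dds, blind = TRUE)

# Matrix of transformed values (genes x samples) for downstream tools.
vst_mat <- assay(vsd)
dim(vst_mat)
```

With a ~ 1 design, the choice of blind should make no practical difference, because blind = TRUE itself resets the design to ~ 1 before estimating dispersions.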

You could include tissue, if you wish, and also other factors that you believe may bias the counts. With blind = FALSE for rlog() or vst(), as swbarnes2 implies, the transformation will then 'see' the design formula: the design is used when estimating the dispersion trend, so that differences between groups do not inflate the dispersion estimates. The transformed values themselves are not adjusted for the variables in the design.
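A sketch of the design-aware version, again under assumptions: the counts are simulated, and the tissue and batch columns in sampledata are hypothetical stand-ins for the corresponding GTEx sample annotations.

```r
library(DESeq2)

# Simulated counts stand in for the real GTEx matrix (illustration only).
set.seed(1)
sim <- makeExampleDESeqDataSet(n = 3000, m = 12)

# Hypothetical sample table: three tissues, each split over two batches.
sampledata <- data.frame(
  tissue = factor(rep(c("liver", "lung", "heart"), each = 4)),
  batch  = factor(rep(c("b1", "b2"), times = 6)),
  row.names = colnames(sim)
)

# Nuisance variable first, variable of interest last.
dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                              colData = sampledata,
                              design = ~ batch + tissue)

# Drop genes with a total count of 0 or 1 across all samples.
dds <- dds[rowSums(counts(dds)) > 1, ]

# blind = FALSE: the design is used when estimating the dispersion trend.
vsd <- vst(dds, blind = FALSE)

# Sample annotations travel with the transformed object.
head(colData(vsd))
vst_mat <- assay(vsd)
```

Batch and tissue differences remain in vst_mat after this step. A continuous covariate such as PMI could be added to the formula in the same way, provided the resulting design stays full rank.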
