Question: How to normalize GTEx gene counts with DESeq2?
18 months ago, kakukeshi wrote:

Hi,

I want to normalize the GTEx gene counts using the variance stabilizing transformation (VST), but I'm unsure which variables to include in the "design" when creating the DESeqDataSet. At the moment I'm doing the following:

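# build the DESeq2 data set from the raw counts and the sample annotation
dds <- DESeqDataSetFromMatrix(countData = gtex,
                              colData = sampledata,
                              design = ~ tissue)

# keep genes with a total count above 1 (drops genes with 0 or 1 reads overall)
dds <- dds[rowSums(counts(dds)) > 1, ]

# VST; blind = FALSE lets the dispersion estimation use the design (tissue)
vsd <- vst(dds, blind = FALSE)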

However, this only accounts for tissue during the normalization. Should I do it like this, with all tissues together, or normalize each tissue separately with a design like ~ 1? And should I include other variables, such as the experimental batch or post-mortem interval (PMI)?

Many thanks


When you set blind = TRUE, I'm pretty sure you are not considering the different tissues.

Reply written 18 months ago by swbarnes2

Oops! It's corrected now.

Reply written 18 months ago by kakukeshi
18 months ago, Kevin Blighe wrote:

What you use as the design formula will depend, in part, on your end goals: are you aiming to perform differential expression analysis (DEA) across the GTEx tissues or do you just want to normalise and transform the data for other downstream tools? For DEA, obviously you have to include your condition of interest in the formula.
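For the DEA case, a minimal sketch might look like the following. It runs on simulated counts from DESeq2's makeExampleDESeqDataSet() rather than real GTEx data, and the tissue labels ("liver", "lung") are hypothetical stand-ins for whatever your sample annotation contains.

```r
library(DESeq2)

# Simulated counts stand in for GTEx data (assumption: 2000 genes, 8 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 2000, m = 8)

# Hypothetical tissue labels: 4 samples per tissue
dds$tissue <- factor(rep(c("liver", "lung"), each = 4))

# The condition of interest goes into the design formula
design(dds) <- ~ tissue

# Fit the model, then extract the lung vs liver comparison
dds <- DESeq(dds)
res <- results(dds, contrast = c("tissue", "lung", "liver"))
summary(res)
```

Because the tissue labels here are assigned independently of the simulated counts, few or no genes should come out as significant; the point is only where the condition enters the workflow.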

In the past, I input GTEx raw count data for just a single cancer type to DESeq2 but was not interested in any DEA. I therefore just used the intercept-only formula: ~ 1.
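As a rough sketch of that intercept-only approach, assuming a count matrix with genes in rows and samples in columns (simulated here with makeExampleDESeqDataSet(), since the real data are not shown; counts_mat and sample_info are illustrative names):

```r
library(DESeq2)

# Simulated data stand in for the raw count matrix and sample table
set.seed(1)
sim <- makeExampleDESeqDataSet(n = 5000, m = 12)
counts_mat  <- counts(sim)                   # genes x samples, integer counts
sample_info <- as.data.frame(colData(sim))   # one row per sample

# Intercept-only design: no sample grouping is supplied to the model
dds <- DESeqDataSetFromMatrix(countData = counts_mat,
                              colData   = sample_info,
                              design    = ~ 1)

# Drop genes with almost no reads before transforming
dds <- dds[rowSums(counts(dds)) > 1, ]

# With a ~ 1 design there is nothing for the transformation to 'see',
# so blind = TRUE and blind = FALSE amount to the same thing here
vsd <- vst(dds, blind = TRUE)

# Transformed values, ready for clustering, PCA, or other downstream tools
head(assay(vsd))
```

Note that vst() fits the dispersion trend on a subset of genes (nsub, 1000 by default) and stops with an error if fewer genes than that have a mean normalized count above 5; for small matrices, varianceStabilizingTransformation() is the alternative.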

You could include tissue, if you wish, and also other factors that you believe may bias the counts. With blind = FALSE for rlog() or vst(), as swbarnes2 implies, the transformation will then 'see' the design formula when estimating dispersions, so that differences explained by the design variables do not inflate the estimated variance-mean trend (see here: http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#blind-dispersion-estimation)
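To illustrate, here is a hedged sketch with a design that includes both a batch and a tissue term. The data are simulated, and the column names (tissue, batch) and their levels are assumptions, not the actual GTEx annotation; a numeric covariate such as PMI could be added as a further term in the same way.

```r
library(DESeq2)

# Simulated counts stand in for GTEx data (assumption: 5000 genes, 12 samples)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 5000, m = 12)

# Hypothetical annotation: 3 tissues x 4 samples, balanced across 2 batches
dds$tissue <- factor(rep(c("liver", "lung", "heart"), each = 4))
dds$batch  <- factor(rep(c("b1", "b2"), times = 6))

# Design containing the factors believed to affect the counts
design(dds) <- ~ batch + tissue

dds <- dds[rowSums(counts(dds)) > 1, ]

# blind = FALSE: dispersions are estimated using the design above
vsd_design <- vst(dds, blind = FALSE)

# blind = TRUE: the design is ignored (treated as ~ 1) for the transformation
vsd_blind <- vst(dds, blind = TRUE)

# Largest difference between the two sets of transformed values
max(abs(assay(vsd_design) - assay(vsd_blind)))
```

In both cases the design only affects how the dispersion trend is estimated; the transformation does not remove tissue or batch effects from the output values.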
