Surrogate variables in eQTL studies
Asked by rodd, 3.7 years ago

Hi folks!

I've got two questions about including surrogate variables in expression analyses. In the GTEx documentation, they say:

"A set of covariates identified using the Probabilistic Estimation of Expression Residuals (PEER) method (Stegle et al., PLoS Comp. Biol., 2010 ), calculated for the normalized expression matrices (described below). For eQTL analyses, the number of PEER factors was determined as function of sample size (N): 15 factors for N<150, 30 factors for 150≤ N<250, 45 factors for 250≤ N<350, and 60 factors for N≥350, as a result of optimizing for the number of eGenes discovered. For sQTL analyses, 15 PEER factors were computed for each tissue."

Source: https://www.gtexportal.org/home/documentationPage#staticTextAnalysisMethods
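The sample-size rule in the quote can be written as a small lookup. `gtex_num_peer` is a hypothetical helper name used only for illustration. It encodes only the eQTL thresholds quoted above, not the fixed 15 factors used for sQTLs.

```r
# Hypothetical helper (not part of GTEx or PEER): number of PEER factors GTEx
# used for eQTL mapping, as a function of tissue sample size n.
gtex_num_peer <- function(n) {
  stopifnot(is.numeric(n), all(n > 0))
  # findInterval() gives 0 for n < 150, 1 for 150 <= n < 250,
  # 2 for 250 <= n < 350 and 3 for n >= 350
  c(15, 30, 45, 60)[findInterval(n, c(150, 250, 350)) + 1]
}

gtex_num_peer(c(120, 150, 249, 300, 350, 600))
# returns 15 30 30 45 60 60
```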

What I am wondering is: isn't it excessive to include 60 PEER factors (covariates), even when the sample size is ≥ 350?
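One way to make "excessive" concrete is to count residual degrees of freedom: in an ordinary least-squares eQTL model, each covariate uses up one. The sketch below assumes a per-gene model of expression on an intercept, an additive genotype term and `n_cov` covariates. `eqtl_residual_df` is a hypothetical helper, and the calls count only PEER factors; any other covariates would be added to `n_cov`.

```r
# Hypothetical helper: residual degrees of freedom of a per-gene OLS model
#   expression ~ intercept + genotype + n_cov covariates
eqtl_residual_df <- function(n, n_cov) {
  n - (1 + 1 + n_cov)  # intercept + genotype + covariates
}

# PEER factors only, at the GTEx sample-size bins quoted above
eqtl_residual_df(n = 149, n_cov = 15)  # 132
eqtl_residual_df(n = 350, n_cov = 60)  # 288
eqtl_residual_df(n = 500, n_cov = 60)  # 438
```

By this count, 60 factors at N = 350 still leave 288 residual degrees of freedom. Whether that is too many is the trade-off GTEx says it settled empirically, by maximizing the number of eGenes discovered.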

My second question is not about PEER but about surrogate variables in general. I ran sva's num.sv function to estimate how many surrogate variables I should include in my analyses, given my model of interest and my data (partly to see whether the estimate would be anywhere near the GTEx numbers). With method = "leek" it returns 573, whereas with method = "be" it returns only 1. So, even though I have 500+ samples, I am inclined to include only one surrogate variable, since the be method supports this. Do you think this is sensible?

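```r
library(sva)
library(DESeq2)

# normalized (size-factor scaled, untransformed) counts from the DESeq2 object, passed to sva
edata <- counts(dds, normalized = TRUE)

# full model: technical covariates, first five genotype PCs (from plink) and disease status
mod1 <- model.matrix(~ Institution + RIN + Sex + PMI_hrs + Age_bins +
                       PC1 + PC2 + PC3 + PC4 + PC5 + Disease_status, data = pd)

# null model (intercept only)
mod0 <- model.matrix(~ 1, data = pd)

# keep both estimates instead of overwriting one variable
n.sv.leek <- num.sv(edata, mod1, method = "leek")  # 573
n.sv.be   <- num.sv(edata, mod1, method = "be")    # 1
```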
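For intuition about how a permutation-based estimate works, here is a simplified, self-contained base-R sketch of a Buja-Eyuboglu-style permutation test, written from my understanding of the idea behind the "be" option: residualize each gene on the model, then count components whose share of residual variance beats what is seen after permuting each gene's residuals independently. `be_num_sv`, its defaults (`B = 20`, `sig = 0.1`) and the simulated data are assumptions for illustration. This is not sva's code, and the "leek" method uses a different criterion.

```r
# Simplified sketch (hypothetical helper, not sva's implementation).
# dat: genes x samples matrix (assumes genes >= samples); mod: samples x p design.
be_num_sv <- function(dat, mod, B = 20, sig = 0.1) {
  n <- ncol(dat)
  # projection (hat) matrix of the design, assumed to have full column rank
  H <- mod %*% solve(crossprod(mod)) %*% t(mod)
  # residualize every gene (row) on the model
  res <- dat - t(H %*% t(dat))
  # number of residual components that can carry signal
  ndf <- n - ncol(mod)
  # observed share of residual variance per component
  d <- svd(res)$d[seq_len(ndf)]
  dstat <- d^2 / sum(d^2)
  # null: permute each gene's residuals independently across samples
  dstat0 <- matrix(0, nrow = B, ncol = ndf)
  for (b in seq_len(B)) {
    res0 <- t(apply(res, 1, sample))
    res0 <- res0 - t(H %*% t(res0))
    d0 <- svd(res0)$d[seq_len(ndf)]
    dstat0[b, ] <- d0^2 / sum(d0^2)
  }
  # permutation p-value per component, made non-decreasing with cummax()
  p <- vapply(seq_len(ndf), function(k) mean(dstat0[, k] >= dstat[k]), numeric(1))
  p <- cummax(p)
  # count components that beat the permutation null
  sum(p <= sig)
}

# Usage on simulated data: one strong hidden factor plus Gaussian noise
set.seed(1)
n_genes <- 200; n_samples <- 30
group <- factor(rep(c("A", "B"), each = n_samples / 2))
mod <- model.matrix(~ group)
z <- rnorm(n_samples)
dat <- outer(rnorm(n_genes, sd = 3), z) +
  matrix(rnorm(n_genes * n_samples), nrow = n_genes)
be_num_sv(dat, mod)
```

Because every gene's residuals are shuffled independently, structure shared across genes is destroyed in the null, so only components that clearly exceed it are counted.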
Tags: eqtl, gtex, covariates, deseq2, sva

What are PC1-PC5?

Hi Asaf! Sorry, I should have clarified: those are the first five population covariates (genotype principal components) calculated in plink.
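For readers unfamiliar with these covariates: genotype principal components summarize population structure. A rough base-R analogue, not plink's exact algorithm, is PCA on a standardized samples × SNPs dosage matrix. The simulated two-population data and all variable names below are purely illustrative.

```r
set.seed(42)
n_per_pop <- 50; n_snps <- 500
# simulated allele frequencies in two populations (illustrative only)
p1 <- runif(n_snps, 0.1, 0.9)
p2 <- pmin(pmax(p1 + rnorm(n_snps, sd = 0.2), 0.05), 0.95)
pop <- rep(c(0, 1), each = n_per_pop)
# genotype dosages (0/1/2 copies of the alternate allele), samples x SNPs
geno <- rbind(
  matrix(rbinom(n_per_pop * n_snps, 2, rep(p1, each = n_per_pop)), nrow = n_per_pop),
  matrix(rbinom(n_per_pop * n_snps, 2, rep(p2, each = n_per_pop)), nrow = n_per_pop)
)
# drop monomorphic SNPs, then centre and scale each SNP before PCA
geno <- geno[, apply(geno, 2, sd) > 0]
pcs <- prcomp(geno, center = TRUE, scale. = TRUE)$x[, 1:5]  # columns PC1..PC5
# in this simulation PC1 should track the population label
abs(cor(pcs[, "PC1"], pop))
```

The five columns of `pcs` play the same role as PC1-PC5 in the model formula: they absorb expression differences that merely reflect ancestry.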

The discrepancy between be and leek is worrying at face value. However, I am not too familiar with how PEER is used. You could try contacting Oliver [Stegle]; I presented at a conference with him in Milan (Italy) back in 2014, and he is good-natured.