Question

Covariates for eQTL analysis


Asked 3.8 years ago by Colari19

Hello,

I understand that an important aspect of eQTL analysis is accounting for confounding variation in the expression data, as many factors can affect gene expression, including sex, age, lifestyle factors etc. Including these factors as covariates in the model therefore increases the chances of identifying genuine eQTL effects.

I have RNA-seq data for approximately 100 individuals, along with corresponding genotype data for approximately 600,000 SNPs. I'd like to use MatrixeQTL to run an eQTL analysis on this data. I'm also in the fortunate position of having a lot of clinical data for these subjects, covering a variety of things, including the usual suspects: age, gender, ethnicity, etc. In total, the clinical data comprises 200 fields. However, as is usually the case with clinical data, much of it is incomplete (i.e. it contains NAs).
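Before deciding which of the 200 fields to use, it can help to drop the fields that are mostly missing and to turn categorical fields into numeric columns, because a linear model can only use numeric covariates. The sketch below does both. The function names and the 20% missingness cut-off are hypothetical choices, and the output layout (one covariate per row, one sample per column, a header row starting with `id`, missing values written as `NA`) is an assumption based on the example covariate file distributed with MatrixeQTL.

```python
# Hypothetical helpers; the output layout is assumed from MatrixeQTL's example covariate file.


def build_covariate_rows(clinical, max_missing=0.2):
    """clinical: dict of field name -> per-sample values (None = NA).
    Returns a dict of covariate name -> per-sample numeric values."""
    rows = {}
    for field, values in clinical.items():
        missing = sum(v is None for v in values) / len(values)
        if missing > max_missing:
            continue  # hypothetical rule: too incomplete to use as-is
        observed = [v for v in values if v is not None]
        if all(isinstance(v, (int, float)) for v in observed):
            rows[field] = list(values)  # numeric field: use directly
        else:
            # Categorical field: one 0/1 column per level except the first (the reference level).
            levels = sorted(set(observed))
            for level in levels[1:]:
                rows[f"{field}_{level}"] = [
                    None if v is None else float(v == level) for v in values
                ]
    return rows


def to_matrixeqtl_tsv(rows, sample_ids):
    """One covariate per row, one sample per column, header row starting with 'id'."""
    lines = ["\t".join(["id"] + list(sample_ids))]
    for name, values in rows.items():
        lines.append("\t".join([name] + ["NA" if v is None else str(v) for v in values]))
    return "\n".join(lines) + "\n"
```

Note that a categorical field with L levels becomes L - 1 columns, so fields such as ethnicity can use up more of the model than their single column in the clinical table suggests.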

Any number of these fields could be influencing gene expression, but I'm unsure how best to account for them in the analysis. Should I include all of them as covariates? Only the "obvious" ones (age, sex, etc.)? Or is it better to use principal components of the expression data as covariates?
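Conceptually, for each gene-SNP pair, MatrixeQTL's linear model regresses expression on genotype plus whatever covariates you supply and then tests the genotype coefficient. The slow, pure-Python sketch below shows that single test so the cost of each covariate is visible. The function names are hypothetical, and this is not how MatrixeQTL computes the statistic internally; it only illustrates the model.

```python
import math


def _invert(m):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular design: collinear covariates or too few samples?")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def genotype_t_stat(expr, genotype, covariates):
    """Fit expr ~ 1 + genotype + covariates by ordinary least squares.
    covariates: list of complete per-sample lists (impute or drop NAs first).
    Returns (genotype coefficient, its t-statistic, residual degrees of freedom)."""
    n = len(expr)
    X = [[1.0, float(genotype[k])] + [float(c[k]) for c in covariates] for k in range(n)]
    p = len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * expr[k] for k in range(n)) for i in range(p)]
    inv = _invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [expr[k] - sum(X[k][i] * beta[i] for i in range(p)) for k in range(n)]
    dof = n - p  # every extra covariate column costs one residual degree of freedom
    sigma2 = sum(r * r for r in resid) / dof
    se = math.sqrt(sigma2 * inv[1][1])  # standard error of the genotype coefficient
    return beta[1], beta[1] / se, dof
```

This makes one argument against "include everything" concrete: with about 100 samples, an intercept, a genotype term, and all 200 clinical fields would give more columns than samples, so the model could not be fitted at all. Even well short of that, each added covariate lowers the residual degrees of freedom.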

Any guidance would be appreciated.

Tags: eQTL, MatrixeQTL

Written 3.8 years ago by Colari19; last updated 2.5 years ago by IZT.


Hello, I am currently a student (so new to this field) and read your post. If possible, I would like to ask how eQTL data paired with information about each subject's sex, age, lifestyle factors, etc. can be obtained. The only information I can currently download is the eQTL results themselves, with the p-value, NES, etc. Thank you very much in advance!

Reply by IZT, 2.5 years ago

Accepted Answer (2020-06-26)

Answered 3.8 years ago by Floris Brenk

I don't think there is a gold-standard approach here; much will depend on what your research question is. A couple of things you can do:

1) Impute your clinical data to fill in the NAs.

2) Compute PCs from your RNA-seq data and check for correlation between these PCs and each of your clinical covariates.

3) Some people prefer to normalize their RNA-seq data in an unbiased way using PEER (or similar software) to adjust for hidden covariates as well.
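Steps 1 and 2 can be sketched in pure Python as follows. The function names, the median/most-common-level imputation, and the power-iteration PCA are illustrative assumptions, not a recommended pipeline. Single imputation like this understates uncertainty compared with multiple imputation, and on real data you would compute PCs with an optimized routine (for example R's `prcomp`) rather than this loop, which costs on the order of genes × samples².

```python
import math
import random
from collections import Counter
from statistics import median


def impute_field(values):
    """Step 1 (hypothetical rule): median for numeric fields, most common level otherwise.
    None marks a missing (NA) value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("field has no observed values")
    if all(isinstance(v, (int, float)) for v in observed):
        fill = median(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def sample_pcs(expr, k=2, iters=500, seed=0):
    """Step 2: expr is genes x samples (one list per gene). Returns k unit-length
    PC score vectors over samples, found by power iteration on the
    sample-by-sample Gram matrix of gene-centred data. Signs are arbitrary."""
    n = len(expr[0])
    centred = []
    for g in expr:
        m = sum(g) / n
        centred.append([x - m for x in g])
    gram = [[sum(g[i] * g[j] for g in centred) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    pcs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
            for u in pcs:  # project out PCs already found (deflation)
                d = sum(a * b for a, b in zip(w, u))
                w = [a - d * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        pcs.append(v)
    return pcs


def pearson(x, y):
    """Pearson correlation over samples where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


def correlate_pcs(pcs, covariates):
    """covariates: dict name -> per-sample numeric values. Returns {(PC label, name): r}."""
    return {
        (f"PC{i + 1}", name): pearson(pc, values)
        for i, pc in enumerate(pcs)
        for name, values in covariates.items()
    }
```

A PC that correlates strongly with a clinical field points to that field as a candidate covariate. A strong PC that matches nothing you measured suggests hidden structure, which is what PEER- or SVA-style factors try to capture.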



Hello,

Thanks for your input. PEER seems like a popular choice for this sort of thing, but I've had terrible trouble installing the package, so I might try SVA instead. I like the idea of correlating the PCs with the clinical data; I'll try that as well.

Cheers.

Reply by Colari19, 3.8 years ago


I also struggled to install PEER, and ended up opting for sva, which is much simpler and well-documented.

Reply by rodd, 3.7 years ago