Estimate gene expression data

Asked by Gene_MMP8, 5.1 years ago

I have a set of patients who have suffered heart attacks, and their clinical information is already available. Now I want to extract or estimate gene expression data for the same group of patients. Since I don't have expression data for those exact patients, is there any way I can average the expression data of other patients with the same disease and in the same age group, and use that as an estimate? What methods other than averaging can be used to estimate gene expression for samples that were never profiled? I also came across simulation of gene expression data. Can someone elaborate on that as well?

Written 5.1 years ago by Gene_MMP8; last updated 4.4 years ago by Biostar

I wonder how much background reading you have actually done. You posted a virtually identical question yesterday: "Integrating EHR data with genomic data".

Reply by Kevin Blighe, 5.1 years ago

Hi Kevin. I am still going through recently published material on the topic (which is quite overwhelming, to say the least :P). I have looked at R packages that estimate and simulate gene expression data, but I got a bit lost along the way. I feel there is a significant missing link in integrating these two types of data, and that it is very problem-specific. Hence I wondered whether averaging gene expression values (while making a lot of assumptions) makes sense at all. Would you be willing to point me to some research articles where I can build some intuition?

Reply by Gene_MMP8, 5.1 years ago

You would have to construct a regression model using some training data. Once the model is built, you could use it to predict gene expression levels in other datasets. I can describe this in one sentence, but the coding is laborious. In outline:

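```r
# Fit a linear model on training samples that have both expression and clinical data
MyModel <- lm(MyGene ~ gene1 + gene2 + ClinicalParam1 + ClinicalParam2, data = TrainingData)
# Predict MyGene in new samples that contain the same predictor columns
predict(MyModel, newdata = TestingData)
```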

Here, gene1, gene2, ClinicalParam1, and ClinicalParam2 are used jointly to predict the expression of MyGene. This works only if, together, they really are predictive of it in the training data.
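
To make the template concrete, here is a minimal, self-contained sketch on simulated data. The column names follow the template. The data, the effect sizes, and the 70/30 train/test split are assumptions for illustration only and do not represent real expression or clinical values.

```r
# Minimal sketch: all values are simulated; column names are hypothetical.
set.seed(1)
n <- 100
SimData <- data.frame(
  gene1 = rnorm(n),
  gene2 = rnorm(n),
  ClinicalParam1 = rnorm(n, mean = 60, sd = 10),    # e.g. age in years (assumed)
  ClinicalParam2 = rbinom(n, size = 1, prob = 0.5)  # e.g. a binary clinical flag (assumed)
)
# Simulate MyGene as a linear combination of the predictors plus noise
SimData$MyGene <- 2 + 0.8 * SimData$gene1 - 0.5 * SimData$gene2 +
  0.03 * SimData$ClinicalParam1 + 1.0 * SimData$ClinicalParam2 +
  rnorm(n, sd = 0.5)

# Split samples so the model never sees the test set while fitting
train_idx <- sample(seq_len(n), size = 70)
TrainingData <- SimData[train_idx, ]
TestingData  <- SimData[-train_idx, ]

MyModel <- lm(MyGene ~ gene1 + gene2 + ClinicalParam1 + ClinicalParam2,
              data = TrainingData)
predicted <- predict(MyModel, newdata = TestingData)

# Compare predictions with the held-out observed values
rmse <- sqrt(mean((TestingData$MyGene - predicted)^2))
r <- cor(TestingData$MyGene, predicted)
c(rmse = rmse, correlation = r)
```

The key point is that `TestingData` must still contain measured values for all predictors (including gene1 and gene2). The model fills in only the one column it was trained to predict.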

Doing this without formal statistical training is not easy. Anyone can build a simple linear regression model, but few people know how to test it properly, for example by checking its predictions on samples that were not used to fit it. You may need a course to learn this properly; it falls more under biostatistics than bioinformatics.
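
One common way to test such a model is k-fold cross-validation. You fit on k-1 folds, predict the held-out fold, and compare the error against a naive baseline, such as predicting the training mean for every sample. The sketch below uses base R and simulated data. The choice of k = 5, RMSE as the metric, and the column names are assumptions, not something prescribed in this thread.

```r
# Minimal k-fold cross-validation sketch for a linear model (base R only).
set.seed(42)
n <- 120
dat <- data.frame(gene1 = rnorm(n), gene2 = rnorm(n),
                  age = rnorm(n, mean = 60, sd = 10))
# Hypothetical truth: MyGene depends on gene1 and age, but not on gene2
dat$MyGene <- 1 + 0.7 * dat$gene1 + 0.02 * dat$age + rnorm(n, sd = 0.5)

k <- 5
# Randomly assign each sample to one of k folds of (roughly) equal size
folds <- sample(rep(seq_len(k), length.out = n))

cv_rmse <- baseline_rmse <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[folds != i, ]
  test  <- dat[folds == i, ]
  fit  <- lm(MyGene ~ gene1 + gene2 + age, data = train)
  pred <- predict(fit, newdata = test)
  cv_rmse[i] <- sqrt(mean((test$MyGene - pred)^2))
  # Naive baseline: predict every held-out sample with the training mean
  baseline_rmse[i] <- sqrt(mean((test$MyGene - mean(train$MyGene))^2))
}

c(model = mean(cv_rmse), baseline = mean(baseline_rmse))
```

If the cross-validated error is not clearly below the baseline, the model is not predicting anything useful, however good its in-sample fit looks.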

Lecture 3 in my R notes, which I devised with a statistician, may help you a bit: https://github.com/kevinblighe/Rtutorials

Reply by Kevin Blighe, 5.1 years ago

Hi Kevin, I went through your notes and the basics of regression analysis, and I have a few questions. Firstly, in the above example, you assumed that I have ClinicalParam1, ClinicalParam2, gene1, and gene2 data for many patients. That means the clinical data has already been integrated with gene expression data. But this is not the case in my problem. I have two separate datasets (clinical and gene expression) with entirely different samples, so I have no mapping between the two. How can I apply regression if I can't combine them somehow? In such cases, is there a way to find the missing link?
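
This sketch illustrates why the lack of a mapping matters. A regression needs rows in which the clinical and expression values belong to the same sample. An inner join on a sample identifier returns nothing when the two tables share no samples. The tables, values, and the `SampleID` column below are hypothetical.

```r
# Hypothetical clinical and expression tables, each keyed by a sample identifier
clinical <- data.frame(
  SampleID = c("P01", "P02", "P03"),
  ClinicalParam1 = c(54, 61, 70)
)
expression <- data.frame(
  SampleID = c("S10", "S11", "S12"),
  gene1 = c(5.2, 6.8, 4.9)
)

shared_ids <- intersect(clinical$SampleID, expression$SampleID)
length(shared_ids)  # 0: no sample appears in both tables

# An inner join keeps only samples present in both tables, so here it is empty
combined <- merge(clinical, expression, by = "SampleID")
nrow(combined)      # 0: no rows to fit a regression on
```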

Reply by Gene_MMP8, 5.1 years ago

lol, if you do not have matched samples between your clinical and expression data, then I cannot really help you any further. You could simply analyse the two datasets separately, report the best clinical variables and the best genes, and then state that a further study with matched samples is required.
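
Here is a minimal sketch of what analysing the two datasets separately could look like, using simulated data. The specific methods are assumptions chosen for illustration; the thread does not prescribe them:

- the expression data use a per-gene Welch t-test with Benjamini-Hochberg adjustment;
- the clinical data use one univariate logistic regression per clinical variable.

All variable names and effect sizes are hypothetical.

```r
set.seed(7)

## 1) Expression data alone: genes x samples matrix, cases vs controls
n_genes <- 200
group <- factor(rep(c("control", "MI"), each = 10))  # hypothetical labels
expr <- matrix(rnorm(n_genes * 20), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)), NULL))
# Shift 5 genes upward in the MI group so some genes truly differ
expr[1:5, group == "MI"] <- expr[1:5, group == "MI"] + 2

# Welch t-test per gene (t.test default), then Benjamini-Hochberg adjustment
p_gene <- apply(expr, 1, function(x)
  t.test(x[group == "MI"], x[group == "control"])$p.value)
top_genes <- head(sort(p.adjust(p_gene, method = "BH")), 5)

## 2) Clinical data alone: binary outcome vs each clinical variable
n <- 150
clin <- data.frame(Age = rnorm(n, 60, 10), BMI = rnorm(n, 27, 4),
                   Smoker = rbinom(n, 1, 0.3))
clin$Event <- rbinom(n, 1, plogis(-4 + 0.06 * clin$Age + 0.8 * clin$Smoker))

# One univariate logistic regression per clinical variable
vars <- c("Age", "BMI", "Smoker")
p_clin <- sapply(vars, function(v) {
  fit <- glm(reformulate(v, response = "Event"), data = clin, family = binomial)
  coef(summary(fit))[v, "Pr(>|z|)"]
})

top_genes
sort(p_clin)
```

Neither ranking says anything about how the genes relate to the clinical variables within the same patients. That question still requires matched samples.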

Reply by Kevin Blighe, 5.1 years ago