Question: Estimate gene expression data
4 weeks ago, banerjeeshayantan wrote:

I have a set of patients who have suffered heart attacks, and I already have their clinical information. Now I want to extract or estimate gene expression data for the same group of patients. Since I don't have expression data for those exact patients, is there any way I can average the expression data of other patients with the same disease and in the same age group, and use that as an estimate? What methods other than averaging can be used to estimate gene expression for unknown samples? I also came across simulation of gene expression data. Can someone elaborate on that as well?
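For illustration, the group-averaging idea could be sketched in R roughly as follows. This is only a sketch under stated assumptions: the data frames `reference` and `new_patients`, the age bands, and the simulated expression values are all hypothetical and stand in for a real reference cohort.

```r
# Hypothetical reference cohort: expression of one gene plus an age band.
# The values are simulated purely for illustration.
set.seed(1)
reference <- data.frame(
  age_group = rep(c("40-59", "60-79"), each = 50),
  MyGene    = c(rnorm(50, mean = 8, sd = 1), rnorm(50, mean = 9, sd = 1))
)

# Mean expression per age band in the reference cohort.
group_means <- aggregate(MyGene ~ age_group, data = reference, FUN = mean)

# Hypothetical new patients for whom only clinical data exist.
new_patients <- data.frame(
  patient_id = c("P1", "P2", "P3"),
  age_group  = c("40-59", "60-79", "60-79")
)

# Each new patient simply inherits the mean of their age band.
estimated <- merge(new_patients, group_means, by = "age_group")
print(estimated)
```

Every patient in the same age band receives the same value here, so an estimate of this kind carries no patient-level variation.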


I wonder how much background reading you have actually performed? You posted a virtually identical question yesterday: A: Integrating EHR data with genomic data

modified 4 weeks ago • written 4 weeks ago by Kevin Blighe

Hi Kevin. I am still going through recently published material on the topic (which is quite overwhelming, to say the least :P). I have looked at R packages for both estimating and simulating gene expression data, but I got lost along the way. I feel that there is a significant missing link in integrating these two types of data, and that it is very problem-specific. Hence I wondered whether averaging gene expression values (while making a lot of assumptions) makes sense at all. Would you be willing to point me to some research articles where I can get some intuition?

written 4 weeks ago by banerjeeshayantan

You would have to construct a regression model using some training data. With the model built, you could then predict gene expression levels in other datasets. I can describe this in one sentence, but coding it is laborious:

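# Fit a linear model for MyGene on the training data
MyModel <- lm(MyGene ~ gene1 + gene2 + ClinicalParam1 + ClinicalParam2, data = TrainingData)
# Predict MyGene for new samples that have the same predictor columns
predict(MyModel, newdata = TestingData)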

Here, the combined values of gene1, gene2, ClinicalParam1, and ClinicalParam2 are predictive of the expression of MyGene.
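To make the snippet above runnable end to end, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data frame `AllData`, the coefficients used to generate `MyGene`, the choice of age-like and binary clinical variables, and the 25% hold-out split. It also assumes matched samples, i.e. each row holds both expression and clinical values for the same patient.

```r
set.seed(42)
n <- 200

# Simulated matched data: one row per patient with BOTH expression and
# clinical values (all names and numbers are made up).
AllData <- data.frame(
  gene1          = rnorm(n, mean = 8, sd = 1),
  gene2          = rnorm(n, mean = 6, sd = 1),
  ClinicalParam1 = rnorm(n, mean = 60, sd = 10),  # e.g. an age-like variable
  ClinicalParam2 = rbinom(n, size = 1, prob = 0.5) # e.g. a binary status
)
AllData$MyGene <- 1 + 0.5 * AllData$gene1 - 0.3 * AllData$gene2 +
  0.02 * AllData$ClinicalParam1 + 0.4 * AllData$ClinicalParam2 +
  rnorm(n, sd = 0.5)

# Hold out 25% of the patients for testing.
test_idx     <- sample(n, size = n / 4)
TrainingData <- AllData[-test_idx, ]
TestingData  <- AllData[test_idx, ]

# Fit on the training patients, predict for the held-out patients.
MyModel <- lm(MyGene ~ gene1 + gene2 + ClinicalParam1 + ClinicalParam2,
              data = TrainingData)
predicted <- predict(MyModel, newdata = TestingData)

# Root mean squared error on the held-out patients.
rmse <- sqrt(mean((TestingData$MyGene - predicted)^2))
print(rmse)
```

The error is computed only on patients the model never saw during fitting, which is the minimum needed to judge whether the predictions generalise.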

Doing things like this without any training is not easy. Anyone can build a simple linear regression model, but not many people know how to test it properly. You may need a course to gain proper knowledge of this; it branches more into biostatistics than bioinformatics.
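One common way to test such a model is k-fold cross-validation. The following is a minimal base-R sketch, not a complete validation workflow; the simulated data frame `dat`, its coefficients, and the choice of 5 folds are assumptions made for illustration.

```r
set.seed(7)
n <- 200

# Simulated data with a known linear relationship (values are made up).
dat <- data.frame(gene1 = rnorm(n), gene2 = rnorm(n))
dat$MyGene <- 2 + 0.8 * dat$gene1 - 0.5 * dat$gene2 + rnorm(n, sd = 0.5)

# Randomly assign each sample to one of k folds.
k    <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, then score on the held-out fold.
cv_rmse <- sapply(seq_len(k), function(i) {
  train <- dat[fold != i, ]
  test  <- dat[fold == i, ]
  fit   <- lm(MyGene ~ gene1 + gene2, data = train)
  sqrt(mean((test$MyGene - predict(fit, newdata = test))^2))
})

# Average held-out error across the folds.
print(mean(cv_rmse))
```

A large gap between the error on the training data and the cross-validated error would suggest that the model is overfitting.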

Lecture 3 in my R notes, which I devised with a statistician, may help you a bit: https://github.com/kevinblighe/Rtutorials

modified 4 weeks ago • written 4 weeks ago by Kevin Blighe

Hi Kevin, I went through your notes and basics of regression analysis. I have a few questions. Firstly, in the above example, you assumed that I have ClinicalParam1, ClinicalParam2, gene1, gene2 data for many patients. That means the clinical data has already been integrated with gene expression data. But this is not the case in my problem. I have two separate datasets (clinical and gene expression) with different samples altogether. So I have no mapping between the two. How can I apply regression if I can't combine the two somehow? In such cases, is there a way to find the missing link?

written 29 days ago by banerjeeshayantan

lol - if you do not have matched samples between your clinical and expression data, then I cannot really help you any further. You may simply analyse the two datasets separately, report the best clinical variables and the best genes, and then state that a further study is required.

written 29 days ago by Kevin Blighe