Question

Statistical Test To Follow For Gene-Exp Analysis

0

10.9 years ago

ssakhare • 0

I have 9 small gene expression data sets of tumor and normal samples. Some of the tumor samples have a matched normal sample from the same patient. The other tumor samples come from different patients and have no matched normal, so they are independent samples. Thus some samples are dependent and some are independent. All the samples follow a normal distribution. The samples are divided into groups that I have to analyze separately. Since the sample sizes to be compared vary widely across groups, I was curious whether I can apply a t-test to this kind of data. (All data are normalized gene expression.) Sample data:

            No. of normal samples    No. of tumor samples
 group 1         12                        57
 group 2          2                        33
 group 3         11                       106
 ..
 group 9          2                        12

(In group 1, the 12 normal samples have 12 matched tumor samples coming from the same patients, whereas the remaining 45 tumor samples come from different patients and don't have any normal.)

I tried looking for a solution, but it is really confusing which statistical test or method to use for such analyses. I would like to know how I can analyze such data group-wise to find significant genes.

Thank you!

data analysis statistics • 2.8k views

ADD COMMENT • link updated 10.9 years ago by lkmklsmn ▴ 970 • written 10.9 years ago by ssakhare • 0

0

Hi. I am still a bit unclear about your samples. So you have 9 datasets, each with varying numbers of control and tumor samples. In each of the 9 datasets, only a subset of the tumor samples has a matching control and the rest do not?

Are all of these samples from different patients? Are there any biological replicates? What do you mean by significant genes? Differentially expressed genes?

ADD REPLY • link 10.9 years ago by Damian Kao 16k

0

Thank you for your reply Damian.

Yes, in each of the 9 datasets only a subset of the tumor samples has a matching control and the rest do not.

In each dataset, the tumor samples that have matched controls come from the same patients as those controls, while the remaining tumor samples, which have no controls, come from different patients.

There are no biological replicates.

By significant genes, I mean differentially expressed genes.

Sorry for not being clearer.

ADD REPLY • link 10.9 years ago by ssakhare • 0

score 0 · Answer 1 · 2013-06-26

Sounds like you could use a linear model and run a regression.

This way you could account for different effects such as sample or dataset.

The model could look like this:

Y ~ person + gene expression

where Y is a vector containing either 'tumor' or 'control', person is a vector containing some of the information you have about each sample, and gene expression is a vector containing the expression values across the samples.
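To make the three vectors concrete, here is a minimal Python sketch of how they might line up, with one entry per sample. The patient IDs and expression values are invented for illustration, and using the patient ID as the "person" vector is an assumption rather than something stated in the thread.

```python
# Three parallel vectors, one entry per sample (toy values, hypothetical IDs).
status = ["tumor", "control", "tumor", "control", "tumor"]
person = ["P1", "P1", "P2", "P2", "P3"]  # P3 has no matched control
expression = [7.2, 6.1, 8.0, 6.4, 7.5]   # one gene's normalized expression

# Collect which sample types each patient contributes.
by_person = {}
for s, p in zip(status, person):
    by_person.setdefault(p, set()).add(s)

# Patients with both a tumor and a control sample are the matched ones.
matched = sorted(p for p, kinds in by_person.items() if kinds == {"tumor", "control"})
unmatched = sorted(p for p in by_person if p not in matched)
```

Separating matched from unmatched patients this way makes it explicit which samples are dependent and which are independent before any model is fitted.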

This analysis is easily implemented in R using the aov() function.
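As a rough illustration for the matched pairs only: if expression is treated as the response (an assumption, since the formula above puts the tumor/control label on the left), a model with a per-patient term and a tumor/control term reduces to a paired t-test on the within-patient differences. The sketch below computes that statistic for one gene using only the Python standard library. The function name `paired_t` and the toy values are hypothetical, and no p-value is computed.

```python
import math
import statistics

def paired_t(tumor, normal):
    """Return the paired t statistic and degrees of freedom for matched samples."""
    if len(tumor) != len(normal):
        raise ValueError("matched samples must have equal length")
    # Within-patient differences remove the patient-specific baseline.
    diffs = [t - n for t, n in zip(tumor, normal)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1

# Toy values for one gene in four hypothetical matched patients (not real data).
tumor = [7.0, 8.0, 9.0, 8.0]
normal = [6.0, 6.0, 8.0, 6.0]
t_stat, df = paired_t(tumor, normal)
```

The unmatched tumor samples do not enter this calculation; including them would need a model that can handle patients with only one sample.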