Question: Statistical Test To Follow For Gene-Exp Analysis
7.0 years ago, ssakhare wrote:

I have 9 small gene expression data sets of tumor and normal samples. Some of the tumor samples have a matched normal sample from the same patient. Other tumor samples come from different patients and have no matched normal, so they are independent. The data are therefore a mix of dependent (paired) and independent samples. All the samples follow a normal distribution. The samples fall into different groups that I have to analyze separately. Since the sample sizes being compared vary widely across groups, I would like to know whether I can apply a t-test to this kind of data. (All data are normalized gene expression.) Sample data:

            No. of normal samples    No. of tumor samples
 group 1         12                        57

(Here the 12 normal samples have 12 matched tumor samples from the same patients, whereas the remaining 45 tumor samples
come from different patients and have no matched normal.)

 group 2          2                        33

 group 3         11                       106

 ...

 group 9          2                        12

I tried looking for a solution, but it is confusing which statistical test or method to use for such an analysis. How can I analyze these data group by group to find significant genes?

Thank you!


Hi. I am still a bit unclear about your samples. So you have 9 datasets, each with varying numbers of control and tumor samples. In each of the 9 datasets, only a subset of the tumor samples has a matching control and the rest do not?

Are all of these samples from different patients? Are there any biological replicates? What do you mean by significant genes? Differentially expressed genes?

written 7.0 years ago by Damian Kao

Thank you for your reply Damian.

Yes, in each of the 9 datasets only a subset of the tumor samples has a matching control and the rest do not.

In each dataset, every tumor sample with a matched control comes from the same patient as its control, while the tumor samples without controls come from different patients.

There are no biological replicates.

By significant genes, I mean differentially expressed genes.

Sorry for not being more clear.

written 7.0 years ago by ssakhare
7.0 years ago, lkmklsmn (United States) wrote:

Sounds like you could use a linear model and run a regression.

This way you could account for different effects such as sample or dataset.

The model could look like this:

Y ~ person + gene expression

where Y is a vector containing either 'tumor' or 'control', person is a vector containing some of the information you have about each sample, and gene expression is a vector containing the expression values across the samples.

This analysis is easily implemented in R using the aov() function.
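
To make the idea concrete, here is a minimal sketch in plain Python of what such a model boils down to for a single gene. It rests on assumptions not stated in the answer: it puts expression on the left-hand side (expression ~ person + status), since aov() fits a numeric response, and it relies on the fact that for matched tumor/normal pairs the test of the status effect with patient as a blocking factor is equivalent to a paired t-test. The unmatched tumor samples are handled separately with a Welch t statistic against the normals. The function names and the expression values are hypothetical, for illustration only.

```python
import math
import statistics


def paired_t(tumor, normal):
    """Paired t statistic and degrees of freedom for matched tumor/normal values.

    tumor[i] and normal[i] must come from the same patient.
    """
    if len(tumor) != len(normal):
        raise ValueError("matched samples must have equal length")
    diffs = [t - n for t, n in zip(tumor, normal)]
    n_pairs = len(diffs)
    if n_pairs < 2:
        raise ValueError("need at least two matched pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t_stat = mean_diff / (sd_diff / math.sqrt(n_pairs))
    return t_stat, n_pairs - 1


def welch_t(a, b):
    """Welch t statistic and Welch-Satterthwaite df for two independent groups."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_stat, df


# Made-up expression values for one gene (hypothetical, not from the question).
matched_tumor = [8.1, 7.9, 8.4, 8.0]
matched_normal = [7.1, 7.2, 7.0, 7.3]
t_stat, df = paired_t(matched_tumor, matched_normal)
print(f"paired t = {t_stat:.3f} on {df} df")
```

With only 2 normal samples in some groups, either statistic will have very few degrees of freedom, so this is a starting point for one gene rather than a full differential expression workflow; running it across many genes would also call for multiple-testing correction.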
