Question: Microarray Data Without Replicates. Any Hope To Get Something Out Of It?
Asked by Pfs (United States), 7.1 years ago:

Sorry if this is perhaps a dumb question.

I have some microarray data without biological replicates (just one data point for each condition). The data has been normalized to the control condition. Without any p-value, can anything be said about the difference in expression of a specific gene across the various conditions? Is the ratio sufficient? What cutoffs are normally used?

Thanks in advance!

Tags: replicates, microarray

I propose adding this question to a list of FAQs frequently asked on BioStar. We do not have such a section yet, but it might be worth considering.

Reply by Michael Dondrup, 5.5 years ago.
Answer by Istvan Albert (University Park, USA), 7.1 years ago:

You can always interpret the data you have: even if you measure a value only once, you still have some information.

The difficult part is that most statistical techniques you are likely to encounter will not be directly applicable, and even when the tools do run (sometimes they will), their predictive power and estimates will not be valid. You will need to rethink and reformulate what you can and cannot state based on your data.

For example, once you have the variability of gene expression levels across all spots, you can compute the likelihood that a systematic pattern (say, all genes in a group being at a given extreme) occurs by chance in a given time series.

And so on. You will need to substantially scale back what you claim and validate intermediate results against other knowledge. As long as you only ask questions that the data can support, you can use it.


Istvan, can you expand on your third paragraph? I don't see what questions one could ask, other than something like: in these two samples, was this gene expressed more highly in one than in the other?

Reply by brentp, 7.1 years ago.

If you have a population of values around a mean (the mean expression level) and you have a sample (a group of values, say a time series for a gene), you can compute the likelihood that the sample comes from that population. Of course, statistical power will be substantially lower than if you had replicates. But there are some questions you can answer: say gene group 1 was observed to be more highly expressed than gene group 2; we can assign a p-value to that.

Reply by Istvan Albert, 7.1 years ago.

The important thing is that we cannot ask the same types of questions we could if we had replicates, but it is still a measurement with some information content.

Reply by Istvan Albert, 7.1 years ago.

So you consider a time series to be distinct from replicates?

Reply by brentp, 7.1 years ago.

What I mean is that if there is a way to group measurements by some attribute (replicates, time series, ontology terms, etc.), then we have a way to assign a likelihood to an observation on that attribute.

Reply by Istvan Albert, 7.1 years ago.

Makes sense, thanks for the explanation.

Reply by brentp, 7.1 years ago.
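Istvan's idea of assigning a p-value to a group of genes (an ontology term, a pathway) can be sketched as a competitive permutation test: compare the group's mean log2 ratio with the means of random gene groups of the same size drawn from the whole array. This is a minimal sketch under stated assumptions, not code anyone in the thread posted; the function name `gene_set_permutation_p` and the dict-of-log-ratios input are hypothetical. Because it treats genes as exchangeable and ignores correlation between genes in the same pathway, its p-values will tend to be optimistic.

```python
import random
from statistics import mean

def gene_set_permutation_p(log_ratios, gene_set, n_perm=10000, seed=0):
    """One-sided permutation p-value that genes in gene_set have a higher
    mean log2 ratio than random gene groups of the same size.

    log_ratios: dict gene -> log2(condition / control) from a single array
    gene_set:   iterable of gene IDs (e.g. members of one GO term)
    """
    members = [g for g in gene_set if g in log_ratios]
    if not members:
        raise ValueError("no gene-set members found in the data")
    observed = mean(log_ratios[g] for g in members)
    all_values = list(log_ratios.values())
    rng = random.Random(seed)
    k = len(members)
    # Count random groups whose mean is at least as high as the observed one.
    hits = sum(
        1 for _ in range(n_perm)
        if mean(rng.sample(all_values, k)) >= observed
    )
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Swapping the mean for a median, or testing the lower tail, are straightforward variations; the important part is that the reference distribution comes from the other genes on the same array rather than from replicates.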
Answer by Chris Evelo (Maastricht, The Netherlands), 7.1 years ago:

Of course, you can list many reasons why this is not a good idea. But in reality it might not be so bad if you just want to get an impression of what happens.

You could probably get some idea of the biological variation in gene expression for the tissue you study, on the platform you use, from the gene expression repositories GEO and ArrayExpress. Your own technical variation might differ from the average (especially if this was also your first array), but it would still give you some impression.
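One way to use such an external variability estimate is to turn each single-array log2 ratio into a z-score. The sketch below assumes you have already derived, from public replicate arrays of the same tissue and platform, a per-gene standard deviation of log2 expression; the function name, the `min_sd` floor of 0.1 and the normal approximation are all illustrative assumptions. It also assumes independent arrays with equal variance in both conditions, so the log ratio of two arrays has standard deviation √2 times the per-array value.

```python
import math

def z_scores_from_external_sd(log_ratios, external_sd, min_sd=0.1):
    """Score single-array log2 ratios against per-gene variability estimated
    from other data (e.g. public replicate arrays of the same tissue).

    log_ratios:  dict gene -> observed log2(condition / control)
    external_sd: dict gene -> SD of log2 expression across replicate arrays
    Returns dict gene -> (z, two-sided p) under a normal approximation.
    """
    out = {}
    for gene, lr in log_ratios.items():
        sd = external_sd.get(gene)
        if sd is None:
            continue  # no external estimate for this gene: skip it
        sd = max(sd, min_sd)  # floor very small SDs so they cannot inflate z
        # Difference of two independent log2 measurements: variance 2 * sd**2.
        z = lr / (math.sqrt(2) * sd)
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
        out[gene] = (z, p)
    return out
```

As Chris notes, the external variation is someone else's, so these numbers are a rough impression rather than a test of your own experiment.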

Also, part of what you learn from an array study comes from which genes change together in groups belonging to the same pathway or biological process. You could still evaluate that with just one array.

Answer by Neilfws (Sydney, Australia), 7.1 years ago:

A biologist will tell you that all experiments contain useful information. A statistician will tell you that only experiments that are amenable to statistical testing contain useful information.

There are plenty of experiments in the GEO database that compare only two arrays; some of them have even been published in high-profile journals. That is not to say this is good practice.

I'm a "statistical bioinformatician", working in a division with professional statisticians and we would tell you that you can do very little (if any) useful analysis without replicates.

Answer by brentp (Salt Lake City, UT), 7.1 years ago:

Simon Anders has some good posts on this on SEQanswers, for example this one: http://seqanswers.com/forums/showpost.php?p=39704&postcount=2

Although it's targeted at RNA-Seq, I think his points about replicates extend to most studies. Since you don't know the natural variation, you cannot attribute the variation you are seeing to the "condition", as it may be variation that is normal for the system. The amount of variation may be different for microarrays, but the problem is the same.

That said, DESeq has a method for finding differentially expressed genes without replicates, using the variation among the samples provided. You could look at that for ideas on what to try.


DESeq only works for count data (reads per region) and does not apply to array intensity data.

Reply by Karl, 7.1 years ago.

True; I figured its methods for dealing with data without replicates might be informative.

Reply by brentp, 7.1 years ago.
Answer by Larry_Parnell (Boston, MA USA), 7.1 years ago:

Good question and good replies, so not much to add here. One thing I will offer is to consider looking at changes in gene sets, i.e. gene set enrichment analysis; GSEA is one example. As others have mentioned, you do have issues with signal variation and statistics. Just as a gene may have natural variation in expression, so too can a set of genes that share some function or mode of control. Nonetheless, it may be beneficial to look at something like GSEA.
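GSEA itself uses a weighted running-sum enrichment statistic. As a simpler stand-in (not GSEA, and named plainly as a different technique), the sketch below compares the log2 ratios of one gene set against all other genes on the array with a Wilcoxon rank-sum (Mann-Whitney) test, using the normal approximation without a tie correction. The function names and input format are hypothetical, and the same caveat applies as above: genes within a set are correlated, so the p-values are best read as a ranking of sets, not as exact error rates.

```python
import math

def average_ranks(values):
    """Rank values ascending (1-based), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_gene_set_test(log_ratios, gene_set):
    """Wilcoxon rank-sum comparison of a gene set's log2 ratios against all
    other genes. Returns (z, one-sided p) for 'the set is shifted upward'."""
    genes = list(log_ratios)
    in_set = set(gene_set)
    ranks = average_ranks([log_ratios[g] for g in genes])
    r_set = [r for g, r in zip(genes, ranks) if g in in_set]
    n1, n2 = len(r_set), len(genes) - len(r_set)
    if n1 == 0 or n2 == 0:
        raise ValueError("gene set must overlap the data but not cover all genes")
    u = sum(r_set) - n1 * (n1 + 1) / 2       # Mann-Whitney U for the set
    mu = n1 * n2 / 2                         # mean of U under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return z, p_upper
```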

Advice: don't try to drill down into too much detail. Provide an overview (GO terms, pathways, gene sets) and stay within the realm where your confidence in the results is high.

In the end, the results do allow you to formulate testable hypotheses and prioritize them for later work.

Answer by Michael Dondrup (Bergen, Norway), 7.1 years ago:

In addition to what has been said here already, remember the following:

Nobody said that all biological replicates must be performed at the same time. In fact, you can add the biological replicates now. If your lab invests just a little bit more in replication, repeating the experiment and adding two more biological replicates, your data will become much more valuable.

The added value is substantial in this case: it takes the data from almost unusable and no longer publishable to an analysis that allows significance to be assessed!

Think about it: instead of helping the experimenter salvage their crappy experimental design (and that's seriously what I would call it), resist the temptation and ask them to improve the experiment. You will avoid wasting a lot of time, both yours and the experimenter's (especially since, unfortunately, many don't understand the statistical implications).

Good luck!


And don't forget to correct for batch effects if you have samples run at different times!

Reply by Daniel Swan, 7.1 years ago.

By the way, I don't think there is any evidence that processing samples on different days increases the likelihood of batch effects, is there? Even if it did, such effects might reflect more of the biological truth. To put it differently, there are always batch effects, but as long as you stick to one batch (everything conducted by the same person, in the same lab, with the same batch of arrays) you don't see them.

Reply by Michael Dondrup, 5.5 years ago.

Always look for batch effects!

Reply by Michael Dondrup, 7.1 years ago.
Answer by Madelaine Gogol (Kansas City), 7.1 years ago:

Just to add, no one has mentioned a cutoff... How about 2-fold?

Of course, you might just rank your list of genes by log2(R/G) and take the top X genes, where X depends on how much follow-up you plan to do.
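The ranking Madelaine describes can be sketched for a single two-colour array as follows. It is a minimal sketch assuming background-corrected intensities in two dicts keyed by gene; `top_genes_by_log_ratio` is a hypothetical name, and ranking by the absolute log ratio (to catch both up- and down-regulated genes) is a choice made here rather than something stated in the post. The 2-fold flag is descriptive only; as the replies below argue, without replicates it says nothing about significance.

```python
import math

def top_genes_by_log_ratio(red, green, top_x, fold=2.0):
    """Rank genes by |log2(R/G)| from a single two-colour array.

    red, green: dict gene -> background-corrected intensity
    Returns the top_x genes as (gene, log2 ratio, passes fold cutoff).
    """
    cutoff = math.log2(fold)  # 2-fold -> |log2 ratio| >= 1
    scored = []
    for gene, r in red.items():
        g = green.get(gene)
        if g is None or r <= 0 or g <= 0:
            continue  # log ratio undefined for missing or non-positive values
        m = math.log2(r / g)
        scored.append((gene, m, abs(m) >= cutoff))
    # Largest absolute change first, regardless of direction.
    scored.sort(key=lambda t: abs(t[1]), reverse=True)
    return scored[:top_x]
```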


But if the choice is between throwing the data out completely and looking at it, you should still be able to learn something.

Maybe filter out genes whose probes have low intensities.

Reply by Madelaine Gogol, 7.1 years ago.
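A common basis for such a low-intensity filter is the average log intensity A = (log2 R + log2 G) / 2, the x-axis of an MA plot. The sketch below assumes the same dict-of-intensities input as above; the threshold `min_a=7.0` and the function name are placeholders, and in practice the cutoff would be chosen from your own intensity distribution.

```python
import math

def filter_low_intensity(red, green, min_a=7.0):
    """Keep genes whose average log2 intensity A = (log2 R + log2 G) / 2
    is at least min_a. Returns dict gene -> A for the genes kept.

    min_a is an arbitrary example threshold, not a recommended value.
    """
    kept = {}
    for gene, r in red.items():
        g = green.get(gene)
        if g is None or r <= 0 or g <= 0:
            continue  # cannot take logs of missing or non-positive values
        a = (math.log2(r) + math.log2(g)) / 2
        if a >= min_a:
            kept[gene] = a
    return kept
```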

No: a 2-fold or even 10-fold change in expression could arise from normal variance in the signal, especially for low-intensity signals. On the other hand, with good replicates a 1.1-fold change may be significant and reliable to report.

Reply by Larry_Parnell, 7.1 years ago.
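Larry's point that large ratios can arise purely from noise at low intensity can be illustrated with a small simulation. It assumes both channels measure the same true signal (no real change) plus additive Gaussian noise with a standard deviation of 30 intensity units, floored at 1; the noise model, the numbers and the function name `spurious_fold_rate` are illustrative assumptions, not properties of any particular platform.

```python
import random

def spurious_fold_rate(true_signal, noise_sd, n=2000, fold=2.0, seed=1):
    """Fraction of genes with NO real change whose observed ratio still
    reaches `fold` in either direction, given additive Gaussian noise."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        # Same true signal in both channels; noise is the only difference.
        r = max(true_signal + rng.gauss(0, noise_sd), 1.0)
        g = max(true_signal + rng.gauss(0, noise_sd), 1.0)
        if r / g >= fold or g / r >= fold:
            hits += 1
    return hits / n

low = spurious_fold_rate(true_signal=50, noise_sd=30)
high = spurious_fold_rate(true_signal=5000, noise_sd=30)
```

With these settings a sizeable fraction of unchanged low-intensity genes cross the 2-fold line, while unchanged high-intensity genes essentially never do, which is why a fixed fold cutoff cannot stand in for an estimate of variance.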

Completely agree with @Larry_Parnell: fold change alone is not enough. Otherwise this entire conversation, and statistics in general, would be useless; you would simply use a threshold on ratios and declare significance.

Reply by sargdavid, 11 months ago.
Answer by Houkto, 7.1 years ago:

Have you tried the rank product method in R (Breitling 2004)? It has a good review by Jeffery 2006, and it works very well for experiments with few replicates. Give it a go. Try using it in R; if that is hard, try oneChannelGUI (a graphical interface in R); and if that is too much, look at the interactive RankProd web tool (URL). PCA of the experiment can also tell you something about the data.
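The rank product idea behind RankProd (Breitling 2004) can be sketched in a few lines: rank genes within each comparison, take the geometric mean of each gene's ranks, and judge it against rank products computed from random, independent rankings. This is a simplified illustration (up-regulation only, ties broken arbitrarily), not the RankProd package's code or API; both function names are hypothetical.

```python
import bisect
import math
import random

def rank_products(log_ratio_arrays):
    """log_ratio_arrays: list of k dicts gene -> log2 ratio over the same genes.
    Returns dict gene -> geometric mean of its ranks (rank 1 = most up)."""
    genes = list(log_ratio_arrays[0])
    k = len(log_ratio_arrays)
    log_rank_sum = dict.fromkeys(genes, 0.0)
    for arr in log_ratio_arrays:
        # Rank 1 = largest log ratio in this comparison.
        ordered = sorted(genes, key=lambda g: arr[g], reverse=True)
        for rank, g in enumerate(ordered, start=1):
            log_rank_sum[g] += math.log(rank)
    # Geometric mean of ranks, computed in log space.
    return {g: math.exp(s / k) for g, s in log_rank_sum.items()}

def rank_product_evalues(rp, k, n_perm=200, seed=0):
    """Per gene, the expected number of genes with an RP at least this small
    when all k rankings are random and independent (estimated by permutation)."""
    rng = random.Random(seed)
    n = len(rp)
    null = []
    for _ in range(n_perm):
        log_sums = [0.0] * n
        for _ in range(k):
            perm = list(range(1, n + 1))
            rng.shuffle(perm)  # one random ranking of all n genes
            for i, r in enumerate(perm):
                log_sums[i] += math.log(r)
        null.extend(math.exp(s / k) for s in log_sums)
    null.sort()
    # Null RP values <= observed RP, averaged per permutation round.
    return {g: bisect.bisect_right(null, v) / n_perm for g, v in rp.items()}
```

Note that with a single comparison (k = 1, the situation in the question) the rank product is just the rank and the estimated E-value equals that rank, so the method only becomes informative once there are at least a few replicate comparisons.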

Powered by Biostar version 2.3.0