Microarray Data Without Replicates. Any Hope To Get Something Out Of It?
8
9
12.5 years ago
Pfs ▴ 580

Sorry if this is perhaps a dumb question.

I have some microarray data without biological replicates (just one data point for each condition). The data has been normalized to the control condition. Without any p-value, can anything be said about the difference in expression of a specific gene across the various conditions? Is the ratio sufficient? What cutoffs are normally used?

Thanks in advance!

microarray replicates • 10k views
ADD COMMENT
1

I vote to add this question to a list of FAQs for BioStar. We do not have such a section yet, but it might be worth considering.

ADD REPLY
5
12.5 years ago

You can always interpret data that you have - because even if you measure a value once you now have some information.

The difficult part is that most statistical techniques you are likely to encounter will not be directly applicable, nor will their predictive power or estimates be valid even if the tools run (sometimes they will). You will need to rethink and reformulate what you can and cannot state based on your data.

For example, once you have the variability of gene expression levels across all spots, you can compute the likelihood that a systematic pattern, say all genes being at a given extreme, occurs by chance in a given time series.

And so on. You will need to substantially scale back what you can state, and validate intermediate results against other knowledge. As long as you only ask questions that the data can support, you can use it.

ADD COMMENT
0

Istvan, can you expand more on your 3rd paragraph? I don't see what questions one could ask, other than something like: in these 2 samples, was this gene expressed more than the other?

ADD REPLY
0

If you have a population of values around a mean (the mean expression level) and you have a sample (a group of values, say a time series for a gene), you can compute the likelihood that the sample comes from the population. Of course, statistical power will be substantially lower than if one had replicates. But there are some questions one can answer: say gene group 1 was observed to be more highly expressed than gene group 2, and we can assign a p-value to that.
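
As a rough illustration of assigning a p-value to a gene group, here is a minimal Python sketch of a permutation test. It asks how often a random group of genes of the same size shows a mean log2 ratio at least as high as the group of interest. The function name, the toy values and the choice of a permutation test are assumptions made for illustration. It also treats genes as independent, which co-regulated genes are not, so the p-value is likely optimistic.

```python
import random
import statistics


def gene_set_permutation_p(log_ratios, gene_set, n_perm=10000, seed=0):
    """One-sided permutation p-value for a gene group on a single array.

    log_ratios: dict mapping gene name -> log2 ratio (condition vs control).
    gene_set:   list of gene names forming the group of interest.
    Returns the fraction of random same-size groups whose mean log2 ratio
    is at least as high as the observed group mean.
    """
    rng = random.Random(seed)
    genes = list(log_ratios)
    observed = statistics.fmean(log_ratios[g] for g in gene_set)
    k = len(gene_set)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, k)
        if statistics.fmean(log_ratios[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction so the p-value is never reported as exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: 100 genes with small log ratios, 3 set high by hand.
toy = {f"g{i}": ((i % 7) - 3) * 0.3 for i in range(100)}
for name in ("g0", "g1", "g2"):
    toy[name] = 2.5
print(gene_set_permutation_p(toy, ["g0", "g1", "g2"], n_perm=2000))
```

Note that the null distribution here comes from other genes on the same array, not from replicate measurements, so this says nothing about whether an individual gene is reproducibly changed.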

ADD REPLY
0

the important thing is that we cannot ask the same type of questions that we could if we had replicates - but it is still a measurement that has some information content

ADD REPLY
0

So you're considering having a time-series as distinct from having replicates?

ADD REPLY
0

what I mean is that if there is a way to group measurements by some attributes - replicates, time series, ontology terms etc - then we have a way to assign a likelihood to an observation on that attribute.

ADD REPLY
0

Makes sense, thanks for the explanation.

ADD REPLY
4
12.5 years ago

Of course you can tell lots of stories about why it is not a good idea. But in reality it might not be so bad if you just want to get an impression about what happens.

The thing is, you could probably get some idea of the biological variation in gene expression for the tissue you study, on the platform you use, from the gene expression repositories GEO and ArrayExpress. Of course your own technical variation might differ from the average (especially if this was also your first array), but it would still give you some impression.

Also, part of what you learn from an array study comes from which genes change in groups that belong to the same pathway or biological process. You could still evaluate that even for just one array.

ADD COMMENT
4
12.5 years ago
Neilfws 49k

A biologist will tell you that all experiments contain useful information. A statistician will tell you that only experiments that are amenable to statistical testing contain useful information.

There are plenty of experiments in the GEO database which compare only 2 arrays - some of them have even been published in high-profile journals. Which is not to say that this is good practice.

I'm a "statistical bioinformatician", working in a division with professional statisticians and we would tell you that you can do very little (if any) useful analysis without replicates.

ADD COMMENT
1
12.5 years ago
brentp 24k

Simon Anders has some good posts on this on seqanswers, namely this one: http://seqanswers.com/forums/showpost.php?p=39704&postcount=2

Although it's targeted at RNA-Seq, I think his points about replicates extend to most studies. Since you don't know the natural variation, you cannot attribute the variation you are seeing to the "condition", as it may be variation that is normal to the system. The variation may be different for microarrays, but the problem is the same.

That said, DESeq has a method to find differentially expressed genes without replicates using the variation in the samples provided... You could look at that to see what to try.

ADD COMMENT
0

DESeq only works for count-data, as in reads per region, and will not apply to array intensity data.

ADD REPLY
0

True, I figured that the methods for dealing with data without replicates may be informative.

ADD REPLY
1
12.5 years ago

Good question and good replies - so not much to add here. One thing I will offer is to consider looking at changes in gene sets, or gene set enrichment analysis. GSEA is an example. You do have issues, as mentioned by others, with regard to variation in signal and statistics. Just as a gene may have natural variation in expression, so too can a set of genes that share some function or mode of control. Nonetheless, it may be beneficial to look at something like GSEA.

Advice: Don't try to drill down to too much detail. Provide an overview - GO terms, pathways, gene sets - and stay within the realm where your confidence in the results is high.
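
To give a feel for the gene-set view, here is a minimal Python sketch of a simple over-representation test based on the hypergeometric distribution. This is not the GSEA algorithm itself, only a simpler relative of it, and the function name and all numbers in the example are hypothetical. It asks: if I pick my top genes from the array, how surprising is it that this many fall into one pathway or GO term?

```python
from math import comb


def hypergeom_enrichment_p(n_total, n_in_set, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(X >= n_overlap).

    n_total:    genes on the array
    n_in_set:   genes on the array annotated to the gene set
    n_selected: genes picked as "changed" (for example the top X by ratio)
    n_overlap:  picked genes that belong to the gene set
    """
    max_overlap = min(n_in_set, n_selected)
    total = comb(n_total, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_total - n_in_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 genes on the array, 100 in the pathway,
# 200 genes selected, 10 of which are in the pathway.
print(hypergeom_enrichment_p(20000, 100, 200, 10))
```

The selection of "changed" genes still rests on unreplicated ratios, so treat the result as a way to prioritize hypotheses. When many gene sets are tested, a multiple-testing correction would also be needed.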

In the end, the results do allow one to claim that testable hypotheses can be formulated and prioritized for later work.

ADD COMMENT
1
12.5 years ago
Michael 54k

In addition to what has been said here already, remember the following:

Nobody said that all biological replicates must be performed at the same time. In fact, you can add the biological replicates now. If your lab invests just a little bit more in replication, repeating the experiment and adding two more biological replicates, your data will become so much more valuable.

The added value is so relevant in this case: from almost unusable and not publishable (anymore) to an analysis that allows for assessing significance!

Think about it: instead of helping the experimenter to salvage their crappy experimental design (and that's what I would seriously call it), resist the temptation and ask them to improve their experiment instead. You will avoid wasting a lot of time, yours and the experimenters' (especially since, unfortunately, many don't understand the statistical implications).

Good luck!

ADD COMMENT
1

And don't forget to correct for batch effects if you have samples run at different times!

ADD REPLY
0

Btw, I don't think there is anything indicating that running samples on different days increases the likelihood of batch effects, is there? Even if it did, such effects might reflect more of the biological truth. Or, to put it differently: there are always batch effects, but as long as you stick with one batch (everything conducted by the same person, in the same lab, with the same batch of arrays) you don't see them.

ADD REPLY
0

Always look for batch effects!

ADD REPLY
0
12.5 years ago

Just to add, no one has mentioned a cutoff... How about 2-fold?

Of course, you might just rank your list of genes by log2(R/G) and take the top X genes where X is dependent upon how much follow up you plan to do.

ADD COMMENT
1

But if it's a choice between throwing the data out completely or looking at the data... You should still be able to learn something.

Maybe filter out genes whose probes have low intensities.
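
Combining the ranking by log2(R/G) with an intensity filter, a minimal Python sketch might look like this. The function name, the example spots and the intensity cutoff of 200 are hypothetical. A sensible cutoff depends on the platform and on the background level of the scan.

```python
import math


def top_genes_by_log_ratio(spots, top_x=3, min_intensity=200.0):
    """Rank genes by |log2(R/G)| after dropping low-intensity spots.

    spots: dict mapping gene name -> (red, green) channel intensities.
    Spots where either channel is below min_intensity are skipped, because
    ratios of two small numbers are dominated by noise.
    Returns the top_x (gene, log2 ratio) pairs, largest absolute change first.
    """
    ranked = []
    for gene, (red, green) in spots.items():
        if min(red, green) < min_intensity:
            continue
        ranked.append((gene, math.log2(red / green)))
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_x]


# Hypothetical spots: geneB has a 10-fold ratio but sits near background.
example = {
    "geneA": (800.0, 200.0),
    "geneB": (50.0, 5.0),
    "geneC": (400.0, 400.0),
    "geneD": (300.0, 1200.0),
}
print(top_genes_by_log_ratio(example, top_x=2))
```

The ranking only orders candidates for follow-up. It does not say that any of them is significantly changed.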

ADD REPLY
0

No, 2-fold or even 10-fold change in expression could arise from normal variance in the signal, especially for signals of low value/strength. On the other hand, with good replicates a 1.1-fold change may be accurate/significant and reliable to report.
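
One way to see this on a single array is an intensity-dependent z-score: compare each spot's log ratio with the spread of log ratios among spots of similar average intensity. The Python sketch below is only an illustration under stated assumptions: the function name and window size are hypothetical, and it assumes that most genes in each intensity window are unchanged. It measures within-array spread only, so it is no substitute for replicates.

```python
import math
import statistics


def intensity_dependent_z(spots, window=50):
    """Local z-score of M = log2(R/G) among spots of similar intensity.

    spots: list of (gene, red, green) tuples with positive intensities.
    For each spot, the mean and standard deviation of M are taken from the
    `window` spots closest in rank of A = 0.5 * log2(R * G).
    Returns a dict mapping gene name -> z-score.
    """
    # Sort spots by average log intensity A, keeping M and the gene name.
    ma = sorted(
        (0.5 * math.log2(r * g), math.log2(r / g), gene) for gene, r, g in spots
    )
    n = len(ma)
    half = window // 2
    z = {}
    for i, (_, m, gene) in enumerate(ma):
        # Clamp the window so it stays inside the sorted list.
        lo = max(0, min(i - half, n - window))
        hi = min(n, lo + window)
        local = [ma[j][1] for j in range(lo, hi)]
        mu = statistics.fmean(local)
        sd = statistics.stdev(local)
        z[gene] = (m - mu) / sd if sd > 0 else 0.0
    return z
```

With this kind of score, a 4-fold ratio among noisy low-intensity spots can score lower than a 2-fold ratio among tightly clustered high-intensity spots, which is the point made above about fold-change cutoffs.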

ADD REPLY
0

Completely agree with @Larry_Parnell, fold-change alone is not enough. Otherwise this entire conversation, and statistics in general, would be useless: you would simply use a threshold for ratios and declare significance.

ADD REPLY
0
12.5 years ago
Houkto ▴ 220

Did you try RankProduct (Breitling et al., 2004)? It gets a good review in Jeffery et al., 2006. It is very good for experiments with few replicates. Give it a go. Try it in R; if that is hard, try oneChannelGUI (a graphical interface in R); if that is too much, look at the interactive RankProduct web tool (RankProd URL). Also, a PCA of the experiment can tell you something about the data.

ADD COMMENT
