Question

A Bit Unusual Way Of Analyzing The Data



13.0 years ago

Assa ▴ 20

Hello everyone,

I have a question today that doesn't necessarily have anything to do directly with R, but I was hoping to get some answers here.

I was approached by one of our biologists with a somewhat unusual problem.

They are doing a microarray analysis of miRNA in Drosophila and have arrays for several body parts as well as for whole flies. They don't just want to see the differences between the body parts; they would also like to know what happens within each body part on its own. What she asked me was whether it is possible to see, within each array group (without comparing it to the other groups), which miRNAs are expressed.

My idea was to order the miRNAs by their expression values and try to find a cut-off that separates "real" expression from background noise.

To find this threshold, I would run a BLAST search with the remaining miRNAs on the array (those that are not from Drosophila) and look for the highest-ranked non-Drosophila miRNA in the list of expression values. Then I would look for the first one whose BLAST hit has at least 4 mismatches (this is an arbitrary choice and can be refined if needed). Everything below this one would be considered noise; the rest can be regarded as "really" expressed.
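The cut-off described above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: `intensity` holds one summarized log2 value per probe for a single array group, `mismatches` holds the mismatch count of each non-Drosophila probe's best BLAST hit (assumed here to be against the Drosophila miRNA sequences), and every probe without an entry in `mismatches` is treated as a Drosophila probe. The probe names and numbers are made up.

```python
def call_expressed(intensity, mismatches, min_mismatches=4):
    """Return (threshold, expressed Drosophila probes) for one array group.

    intensity:  probe name -> summarized log2 intensity (hypothetical input)
    mismatches: non-Drosophila probe name -> mismatches of its best BLAST hit
    """
    # Non-Drosophila probes that are dissimilar enough to serve as "noise" probes.
    noise_values = [
        intensity[probe]
        for probe, mm in mismatches.items()
        if mm >= min_mismatches and probe in intensity
    ]
    if not noise_values:
        raise ValueError("no non-Drosophila probe with enough mismatches")

    # Going down the ranked list, the first such probe is the brightest one.
    threshold = max(noise_values)

    # Drosophila probes strictly above the threshold, brightest first.
    expressed = sorted(
        (p for p, v in intensity.items() if p not in mismatches and v > threshold),
        key=lambda p: intensity[p],
        reverse=True,
    )
    return threshold, expressed


# Made-up example values, for illustration only.
intensity = {
    "fly_A": 12.1, "fly_B": 10.4, "fly_C": 6.2,
    "other_X": 11.0, "other_Y": 7.3, "other_Z": 6.8,
}
mismatches = {"other_X": 1, "other_Y": 6, "other_Z": 9}

threshold, expressed = call_expressed(intensity, mismatches)
print(threshold, expressed)
```

Here `other_X` is ignored because it has only one mismatch and could be cross-hybridizing with a real fly miRNA, so the cut-off comes from `other_Y`.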

So what I really want to know is whether this kind of approach is reasonable at all, and whether there is any other way of doing the same analysis.

Thanks for any suggestions and help.

Assa

statistics microarray r • 2.2k views

updated 10.3 years ago by Biostar 20 • written 13.0 years ago by Assa ▴ 20

score 4 · Answer 1 · 2011-04-20

A similar issue of showing which microRNAs are expressed (and which are too close to undetectable levels) was addressed in a recent paper on microRNAs in HDL particles. See Vickers, Remaley, et al. (2011) MicroRNAs are transported in plasma and delivered to recipient cells by high-density lipoproteins. Nat Cell Biol. 13:423. Look carefully to see how they made the distinction between expressed and not expressed.

score 2 · Answer 2 · 2011-04-20

Hello,

Overall the method seems fine, but there are several points you should be cautious about.

Finding your most highly expressed non-Drosophila miRNA does not mean that its expression value is an actual threshold for expressed vs. non-expressed miRNAs. One property of a noisy background is to be... noisy. Genes below this value might be expressed, while genes above this value might not be.

This problem is specific to microarrays and cannot be fully solved. If you take your list of miRNAs ranked by expression, you can consider it your list of expressed genes, each associated with a level of confidence. The closer the rank is to 1, the more certain you can be that the miRNA is actually expressed.
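One way to attach a number to that confidence is an empirical score against the background probes. The sketch below is not something proposed in this thread; it assumes the non-Drosophila probes can be used as negative controls, and all values are invented.

```python
def empirical_p(value, control_values):
    """Fraction of negative-control probes at least as bright as `value`.

    The +1 in numerator and denominator keeps the score above zero.
    """
    exceed = sum(1 for c in control_values if c >= value)
    return (exceed + 1) / (len(control_values) + 1)


def rank_with_confidence(fly_values, control_values):
    """Rank fly miRNAs by intensity; rank 1 is the brightest.

    Returns tuples of (name, rank, intensity, empirical score).
    A small score means few control probes reach that intensity.
    """
    ranked = sorted(fly_values.items(), key=lambda kv: kv[1], reverse=True)
    return [
        (name, rank, value, empirical_p(value, control_values))
        for rank, (name, value) in enumerate(ranked, start=1)
    ]


# Invented log2 intensities, for illustration only.
controls = [7.3, 6.8, 6.5, 7.0]
fly = {"fly_A": 12.1, "fly_B": 7.1, "fly_C": 6.0}

table = rank_with_confidence(fly, controls)
for row in table:
    print(row)
```

With only a handful of control probes the score is very coarse, so it is better read as a ranking aid than as a p-value.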

One way to gain more certainty would be to use cross-references. Check databases such as miRBase, or the literature, to see whether a given miRNA has been described as being expressed in a given tissue. This can validate an approach or a threshold.

But I think the main thing to keep in mind is that, in my opinion, it might not be possible to extract a truly correct list of expressed miRNAs based on microarrays alone. Either you place your threshold in a non-noisy region of your expression value distribution and are very stringent (many expressed miRNAs will be missing), or you place it in a noisy region and may end up with a lot of false positives.
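To see this trade-off for a given data set, one could scan a few candidate thresholds. The sketch assumes two things that are not given in the thread: a set of miRNAs already validated as expressed in the tissue (e.g. from the literature), and a set of probes believed to be pure background. The numbers are invented.

```python
def threshold_tradeoff(validated_values, control_values, thresholds):
    """For each candidate threshold, report:

    - the fraction of validated (known expressed) miRNAs above it
    - the fraction of background probes above it (likely false positives)
    """
    rows = []
    for t in thresholds:
        recovered = sum(v > t for v in validated_values) / len(validated_values)
        leaked = sum(c > t for c in control_values) / len(control_values)
        rows.append((t, recovered, leaked))
    return rows


# Invented log2 intensities, for illustration only.
validated = [12.1, 10.4, 7.0, 8.2]
background = [7.3, 6.8, 6.5, 6.0]

rows = threshold_tradeoff(validated, background, [6.6, 7.5, 9.0])
for t, recovered, leaked in rows:
    print(f"threshold={t}: validated recovered={recovered:.2f}, background passing={leaked:.2f}")
```

A low threshold recovers every validated miRNA but lets background probes through; a high one excludes the background but also drops validated miRNAs.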

score 1 · Answer 3 · 2011-04-20

I think this looks like a "mixture of distributions" problem. You have a set of expressed miRNAs and a set of non-expressed miRNAs that generate noise (there is also some "noise" in the expressed miRNAs, as their observed levels do not exactly reflect their true levels).

As I have limited knowledge of how these expression values are distributed right now, I was wondering whether a first plot of the distribution would be helpful.

If the distribution looks normal, you could come up with a probability of each miRNA being expressed or noise.

True, at the end of the day you will still end up with a probability threshold that you choose.
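As a rough sketch of the mixture idea, the code below fits two normal components to the log2 intensities of one array group with a plain EM loop and returns the posterior probability that a value belongs to the higher ("expressed") component. It assumes that both groups are roughly normal on the log scale, which should be checked on a plot first; the function names and data are made up.

```python
import math
from statistics import mean, pstdev


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(values, n_iter=200, min_sigma=0.05):
    """EM fit of a two-component 1-D Gaussian mixture.

    Returns (weights, means, sigmas), with component 0 being the lower
    ("noise") component and component 1 the higher ("expressed") one.
    """
    xs = sorted(values)
    half = len(xs) // 2
    # Crude start: lower half of the data is noise, upper half is expressed.
    mu = [mean(xs[:half]), mean(xs[half:])]
    sigma = [max(pstdev(xs[:half]), min_sigma), max(pstdev(xs[half:]), min_sigma)]
    weight = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in xs:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component is empty; keep its old parameters
            weight[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = max(math.sqrt(var), min_sigma)

    order = sorted((0, 1), key=lambda k: mu[k])
    return ([weight[k] for k in order],
            [mu[k] for k in order],
            [sigma[k] for k in order])


def prob_expressed(x, weight, mu, sigma):
    """Posterior probability that x comes from the higher component."""
    dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in (0, 1)]
    total = sum(dens)
    if total == 0.0:
        # Both densities underflowed; fall back to the nearer mean.
        return 1.0 if abs(x - mu[1]) < abs(x - mu[0]) else 0.0
    return dens[1] / total


# Invented log2 intensities, for illustration only.
values = [5.8, 6.0, 6.1, 6.2, 5.9, 6.3, 6.0, 5.7,
          10.5, 11.0, 11.4, 10.8, 11.2, 12.0]
weight, mu, sigma = fit_two_gaussians(values)
print(mu, prob_expressed(11.0, weight, mu, sigma), prob_expressed(6.0, weight, mu, sigma))
```

The cut-off on this probability (0.5, 0.9, 0.99, ...) is still a choice, as noted above; the fit only moves the decision from an intensity scale to a probability scale.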

If you have a list of validated miRNAs, as Philippe says, you would be able to infer the distribution of "truly" expressed miRNAs (true positives) and apply this with even more confidence.