Question

How to define low, medium or high expression of a microRNA?

2.5 years ago

js1234 ▴ 10

Hello everyone,

I am working with TCGA tumor samples; however, the source from which I retrieved the microRNA expression data for those samples is the OncoMir Cancer Database (absolute data view).

I now have the expression of my microRNA of interest in the selected tumor samples (reads per million miRNA mapped), and I want to establish what counts as low, medium or high expression among those tumor samples.

How do I establish the cut-off values that define low, medium or high expression of that microRNA in my tumor samples?

Thanks

microRNA statistics analysis TCGA expression • 1.1k views

ADD COMMENT • link updated 2.5 years ago by i.sudbery 19k • written 2.5 years ago by js1234 ▴ 10

I don't think there is a standard for this sort of thing. You will need to decide on those definitions. You could take a look at the distribution of values across samples and see if you can make an informed decision based on actual data.

ADD REPLY • link 2.5 years ago by GenoMax 141k

Thanks for the answer!

ADD REPLY • link 2.5 years ago by js1234 ▴ 10

Does anyone know other ways?

ADD REPLY • link 2.5 years ago by js1234 ▴ 10

Use tertiles, quartiles, or anything up to deciles.

ADD REPLY • link 2.5 years ago by Kevin Blighe 87k

score 1 · Answer 1 · 2021-11-07

Just to formalize what GenoMax and Kevin Blighe said in their comments:

Categories like low, medium and high are artificial thresholds that have no meaning in nature: nature deals in continua, not categories. That said, dividing things into categories can be useful. The key, though, is that if we are dividing things for convenience's sake, we should divide them into the categories that are most convenient for the particular task at hand.

The way in which the level of a miRNA affects the expression of its targets is a complex relationship, dependent not only on the level of the miRNA but also on the sequence of the target, the transcription rate of the target, and probably the RNA-binding-protein context of the target, in ways which we are only just beginning to understand. That is, for one miRNA-target pair, a miRNA at 10 CPM might have a very large effect on target expression levels, while for a different miRNA-target pair, the same miRNA expression level might have little or no effect.

One can imagine any number of schemes for dividing miRNAs into categories based on expression. Here are three you might like to consider:

Tertiles: if you have 1000 miRNAs, rank them, and then divide that ranking into three - the bottom 333, the middle 333 and the top 333.
Divide the range: find the expression level of the most highly expressed miRNA and base your thresholds on that - so if the most highly expressed miRNA has a CPM of 600, then you might divide into 0-200 CPM, 200-400 CPM and 400-600 CPM. Note that these categories are likely to be of very different sizes - doing this with log CPMs might be slightly better, but I'd still expect the bottom category to have far more miRNAs than the other ones.
Expressed vs. unexpressed: as you are likely to find that the counts for many miRNAs are close to zero, you might define an "unexpressed" category (0 or 1 read), a "top expressed" category (top 10% of miRNAs) and a third category that contains everything else.
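
As a rough illustration, the three schemes above could be sketched in plain Python as follows. This is only a sketch under assumptions: the function names, the example values, the `+ 1` pseudocount used for the log option, and the choice to apply the "0 or 1" cut-off directly to whatever unit you pass in are all made up for illustration and are not taken from OncoMir or TCGA. The input is just a list of expression values, which could be one value per miRNA, or one value per tumor sample for a single miRNA as in the question.

```python
import math

def tertile_labels(values):
    """Scheme 1: rank the values and split the ranking into three parts."""
    n = len(values)
    # Indices sorted by expression; ties are broken by original order.
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = ("low", "medium", "high")[rank * 3 // n]
    return labels

def range_labels(values, log=False):
    """Scheme 2: split the range from 0 to the maximum into three equal widths."""
    # Assumption: log2(x + 1) is used so that zeros stay defined.
    xs = [math.log2(v + 1) for v in values] if log else list(values)
    top = max(xs)
    cut1, cut2 = top / 3, 2 * top / 3
    return ["low" if x < cut1 else "medium" if x < cut2 else "high" for x in xs]

def expressed_labels(values, unexpressed_max=1.0, top_fraction=0.10):
    """Scheme 3: 'unexpressed', 'top' (highest-ranked fraction) and 'other'."""
    n_top = max(1, round(len(values) * top_fraction))
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    top_idx = set(ranked[:n_top])
    return [
        "unexpressed" if v <= unexpressed_max
        else "top" if i in top_idx
        else "other"
        for i, v in enumerate(values)
    ]

# Hypothetical expression values, purely for illustration.
values = [0, 0, 1, 3, 8, 20, 45, 120, 300, 600]
for scheme in (tertile_labels, range_labels, expressed_labels):
    print(scheme.__name__, scheme(values))
```

Running the schemes side by side on the same values makes the point in the text concrete: the rank-based split gives roughly equal-sized groups, while the range-based split puts most values in the bottom category.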

There is no telling ahead of time which of these is best, and it will probably depend on what point you are trying to make, or what hypothesis you are trying to test. I'd probably give all of them a go and see which set of results made the most sense.