Question

Mirna Differential Expression

6

11.0 years ago

Ashutosh Pandey 12k

Hello,

I have miRNA short-read data for a few samples. I want to do a differential analysis to find out whether the samples show differences in the expression level of any miRNA. I know there are a few tools available for this, but I still carried out most of the steps manually. This is what I did:

Selected reads between 18 and 32 bp, aligned them against the reference genome, and kept only the ones that aligned.
Filtered out reads that aligned to a database of small RNAs other than miRNA.
Aligned the remaining reads against miRBase with SHRiMP2 in miRNA mode, using the precursor miRNA database. I am not sure why I did this, but I read somewhere that it is advisable to align short reads against precursor miRNA rather than mature miRNA. Please feel free to comment on this step; I may not be right.
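
For readers who want to reproduce the first step, the 18-32 bp length filter can be sketched in a few lines of Python. This is only an illustration: the function name `length_filter` and the toy sequences are assumptions, and a real run would read the sequences from a FASTQ file instead of a list.

```python
def length_filter(reads, min_len=18, max_len=32):
    """Keep only reads whose length falls within [min_len, max_len]."""
    return [read for read in reads if min_len <= len(read) <= max_len]


# Toy sequences (made up for illustration) of length 16, 20, 36 and 22.
reads = ["ACGT" * 4, "ACGT" * 5, "ACGT" * 9, "TGAGGTAGTAGGTTGTATAGTT"]

kept = length_filter(reads)
print(len(kept), "of", len(reads), "reads kept")
```

The alignment and filtering steps themselves depend on the aligner and databases used, so they are not sketched here.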

Now I have the SAM files, which I can use to quantify the expression of different miRNAs. But as different samples have different numbers of starting reads, I need to normalize the counts. I can use a modified RPKM value where I ignore the length factor and only use the total aligned reads for a sample. My first question is: should I normalize the counts for a miRNA using a) the total number of reads aligned for a sample, including non-miRNA small RNAs as well as miRNAs, or b) the total number of reads aligned to the miRNA database?

Also, if you have a better idea for differential expression analysis, I would appreciate it.
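
To make the two options concrete, here is a minimal sketch of the length-free RPKM (that is, reads per million) computed with each denominator. All counts and the helper name `reads_per_million` are hypothetical.

```python
def reads_per_million(counts, total_reads):
    """Scale raw counts to reads per million of the chosen total."""
    return {name: c * 1_000_000 / total_reads for name, c in counts.items()}


# Hypothetical counts for one sample.
mirna_counts = {"miR-A": 500, "miR-B": 1500}
other_small_rna_reads = 8000  # reads aligned to non-miRNA small RNAs

total_mirna = sum(mirna_counts.values())
total_aligned = total_mirna + other_small_rna_reads

rpm_all = reads_per_million(mirna_counts, total_aligned)   # option (a)
rpm_mirna = reads_per_million(mirna_counts, total_mirna)   # option (b)

for name in mirna_counts:
    print(name, rpm_all[name], rpm_mirna[name])
```

Within a single sample the two results differ only by a constant factor, so the choice changes a between-sample comparison only when the fraction of non-miRNA reads differs between the samples.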

mirna differential • 9.3k views

ADD COMMENT • link updated 3.0 years ago by Ram 43k • written 11.0 years ago by Ashutosh Pandey 12k

Ram · Answer 1 · 2013-05-09

Hi Ashutoshmits,

I would be cautious about using the total reads as your normalization metric. This can be problematic if you have a miRNA in one condition that is a highly expressed outlier. That is, it can skew where you put your normalization line a little and make everything look a bit wrong. This is when people start reporting odd findings like 90% of DE miRNAs being downregulated. And then you don't know if it was real or caused by a sketchy normalization.
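
A toy example of the outlier problem, with made-up counts: the two samples below are identical except for one miRNA that is much higher in the second, yet dividing by total reads makes the unchanged miRNAs look strongly down.

```python
# Hypothetical counts: only miR-D differs between the samples.
sample_1 = {"miR-A": 100, "miR-B": 200, "miR-C": 300, "miR-D": 400}
sample_2 = {"miR-A": 100, "miR-B": 200, "miR-C": 300, "miR-D": 9400}


def fraction_of_total(counts):
    """Total-reads normalization: each count divided by the sample total."""
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()}


f1 = fraction_of_total(sample_1)
f2 = fraction_of_total(sample_2)

# Apparent fold change (sample 2 over sample 1) after normalization.
apparent_fold_change = {name: f2[name] / f1[name] for name in sample_1}

for name, fc in apparent_fold_change.items():
    print(name, round(fc, 3))
```

Here miR-A, miR-B and miR-C have identical raw counts in both samples but come out at roughly 0.1-fold after normalization, which is exactly the "most DE miRNAs are down" pattern described above.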

We looked at it a little in the supplement of our paper here, but other people have looked at it in more depth than us. We ended up comparing some different methods and used a scaling method based on dividing the median value of the two samples. DESeq uses the same method. It seems fairly robust, but I have seen this fail with real samples if the library complexity differs.
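
A minimal sketch of a median-based scaling factor, under one reading of the description above: take the ratio of counts for every miRNA detected in both samples and use the median of those ratios. This is a two-sample simplification, not the exact procedure from the paper; DESeq itself computes size factors as the median of ratios to a per-gene geometric mean across all samples. The function name and counts are hypothetical.

```python
from statistics import median


def median_ratio_scaling_factor(sample_a, sample_b):
    """Median of per-miRNA count ratios (b / a) over miRNAs seen in both."""
    ratios = [
        sample_b[name] / sample_a[name]
        for name in sample_a
        if name in sample_b and sample_a[name] > 0 and sample_b[name] > 0
    ]
    if not ratios:
        raise ValueError("no miRNA is detected in both samples")
    return median(ratios)


# Hypothetical counts: sample_b is sequenced twice as deep, and miR-D is
# additionally a highly expressed outlier in sample_b.
sample_a = {"miR-A": 100, "miR-B": 200, "miR-C": 300, "miR-D": 400}
sample_b = {"miR-A": 200, "miR-B": 400, "miR-C": 600, "miR-D": 9400}

factor = median_ratio_scaling_factor(sample_a, sample_b)
scaled_b = {name: c / factor for name, c in sample_b.items()}

print("scaling factor:", factor)
print(scaled_b)
```

Because the median ignores the size of the single extreme ratio, the outlier does not drag the factor away from the bulk of the miRNAs, which is the robustness being described.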

What I would do if you are using a scaling factor is plot the two samples you are normalizing on an X-Y scatter plot. Then draw a line through the middle of the points whose slope is your scaling factor. By eyeballing it you can tell whether the factor is reasonable or not.
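
As a rough numeric companion to the visual check (not a replacement for actually looking at the plot), one can count how many points fall above and below the line y = slope * x. If the line really runs through the middle of the cloud, the two fractions should be similar. The function name and counts are hypothetical.

```python
def fraction_above_below(x_counts, y_counts, slope):
    """Fractions of shared miRNAs lying above and below the line y = slope * x."""
    shared = [name for name in x_counts if name in y_counts]
    if not shared:
        raise ValueError("the two samples share no miRNA")
    above = sum(1 for n in shared if y_counts[n] > slope * x_counts[n])
    below = sum(1 for n in shared if y_counts[n] < slope * x_counts[n])
    return above / len(shared), below / len(shared)


# Hypothetical counts for the two samples being normalized.
x = {"miR-A": 100, "miR-B": 200, "miR-C": 300, "miR-D": 400, "miR-E": 500}
y = {"miR-A": 210, "miR-B": 390, "miR-C": 620, "miR-D": 780, "miR-E": 1050}

print(fraction_above_below(x, y, slope=2))   # line through the middle
print(fraction_above_below(x, y, slope=10))  # line far above the points
```

A strongly lopsided split, like the one for the second slope, is the numeric version of a line that visibly misses the points.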

I don't think this is a question that is completely solved. Some people like quantile normalization. I don't like it so much, but informed people can disagree. Simply Statistics had a good blog post on this.

It may be that different methods work better for different types of samples. But if you draw that line it should get you 80% of the way there. Sometimes total reads are just fine.

Ram · Answer 2 · 2013-05-07

1

11.0 years ago

camelbbs ▴ 710

Actually, I have a similar question about RefSeq genes: Using Total Reads In In Refseq Genes To Calculate The Rpkm

I think it is similar for miRNA. Since few miRNAs overlap, it doesn't matter whether you use the total reads of all noncoding RNA or the total reads of miRNA to calculate the RPKM. Just stay consistent throughout your analysis.

I also look forward to other people's suggestions.

ADD COMMENT • link updated 3.0 years ago by Ram 43k • written 11.0 years ago by camelbbs ▴ 710

0

Yup, I read your RefSeq question. Ultimately, I ended up using the total reads that mapped to miRNA.

ADD REPLY • link updated 3.0 years ago by Ram 43k • written 11.0 years ago by Ashutosh Pandey 12k