Question: miRNA Differential Expression
Ashutosh Pandey (Philadelphia), 6.1 years ago, wrote:

Hello,

I have miRNA short-read data for a few samples. I want to do a differential analysis to find out whether the samples differ in the expression level of any miRNA. I know there are a few tools available for this, but I still carried out most of the steps manually. This is what I did:

1) Selected reads 18-32 bp long, aligned them against the reference genome, and kept only the reads that aligned.

2) Filtered out reads that aligned to a database of small RNAs other than miRNAs.

3) Aligned the remaining reads against miRBase with SHRiMP2 in miRNA mode, using the precursor miRNA database. I am not sure why I did this, but I read somewhere that it is advisable to align short reads against precursor miRNAs rather than mature miRNAs. Please feel free to comment on this step; I may not be right.
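The length selection in step 1 is simple enough to do by hand. Below is a minimal sketch, assuming adapters have already been trimmed and the input is plain 4-line FASTQ (no multi-line sequences); the function names `parse_fastq` and `filter_by_length` are hypothetical, not from any particular tool.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one FASTQ record
Record = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from 4-line FASTQ input (assumes no multi-line sequences)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it)
            next(it)  # the '+' separator line is not needed
            qual = next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record starting at {header!r}") from None
        yield header, seq, qual


def filter_by_length(
    records: Iterable[Record], min_len: int = 18, max_len: int = 32
) -> Iterator[Record]:
    """Keep reads whose (already adapter-trimmed) length lies in [min_len, max_len]."""
    for header, seq, qual in records:
        if min_len <= len(seq) <= max_len:
            yield header, seq, qual
```

Usage, with a hypothetical file name: `with open("sample_trimmed.fastq") as fh: kept = list(filter_by_length(parse_fastq(fh)))`. Both bounds are inclusive, so 18-mers and 32-mers are kept.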

Now I have the SAM files, which I can use to quantify the expression of the different miRNAs. But because the samples have different numbers of starting reads, I need to normalize the counts. I can use a modified RPKM value that ignores the length factor and uses only the total aligned reads for a sample. My first question is: should I normalize the count for a miRNA by (a) the total number of reads aligned for a sample, including both non-miRNA small RNAs and miRNAs, or (b) the total number of reads aligned to the miRNA database?
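The length-free "modified RPKM" described above is just reads per million of whichever denominator you pick. A minimal sketch, assuming you already have per-miRNA counts from the SAM files and that each read is counted once (so option (b) is the sum of the miRNA counts); the counts and names below are made up for illustration.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, int], library_size: int) -> Dict[str, float]:
    """Length-free 'modified RPKM': scale each count to reads per million of library_size."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return {name: count * 1_000_000 / library_size for name, count in counts.items()}


# Hypothetical counts for one sample (names and numbers are invented).
mirna_counts = {"miR-A": 500, "miR-B": 1500}
total_aligned = 10_000                      # option (a): all aligned small-RNA reads
mirna_aligned = sum(mirna_counts.values())  # option (b): reads assigned to miRNAs

rpm_a = reads_per_million(mirna_counts, total_aligned)
rpm_b = reads_per_million(mirna_counts, mirna_aligned)
```

Within one sample the two options differ only by a constant factor. Between samples, they give the same fold changes only if the fraction of aligned reads that are miRNAs is about the same in every sample.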

Also, if you have a better idea for the differential expression analysis, I would appreciate it.

Michele Busby (United States), 6.0 years ago, wrote:

Hi Ashutoshmits,

I would be cautious about using total reads as your normalization metric. This can be problematic if one condition has a miRNA that is a highly expressed outlier: it can shift where you put your normalization line and make everything look a bit off. This is when people start reporting odd findings like 90% of DE miRNAs being downregulated, and then you don't know whether the result is real or caused by a sketchy normalization.

We looked at this a little in the supplement of our paper (http://www.biomedcentral.com/1471-2164/12/635), but other people have looked at it in more depth than we did. We compared a few different methods and ended up using a scaling method based on the median ratio between the two samples; DESeq uses the same approach. It seems fairly robust, but I have seen it fail with real samples when library complexity differs.
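To make the median-based scaling concrete, here is a minimal sketch of a DESeq-style median-of-ratios size factor. It is written from the general description of that method, not taken from DESeq's code. Each miRNA is compared to its geometric mean across samples, and a sample's size factor is the median of those ratios. miRNAs with a zero count in any sample are skipped. The data layout (a dict of count lists) is an assumption for illustration.

```python
import math
from statistics import median
from typing import Dict, List


def size_factors(count_matrix: Dict[str, List[int]]) -> List[float]:
    """Median-of-ratios scaling factors, one per sample (DESeq-style sketch).

    count_matrix maps miRNA name -> raw counts, one per sample, same sample order.
    """
    n_samples = len(next(iter(count_matrix.values())))
    log_ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for counts in count_matrix.values():
        if any(c == 0 for c in counts):
            continue  # geometric mean would be 0; skip this miRNA
        log_geo_mean = sum(math.log(c) for c in counts) / n_samples
        for j, c in enumerate(counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no miRNA has a nonzero count in every sample")
    # The median is taken over miRNAs, so a few extreme outliers barely move it.
    return [math.exp(median(r)) for r in log_ratios]


# Made-up counts: sample 2 is sequenced twice as deeply, plus one huge outlier.
counts = {
    "miR-1": [100, 200],
    "miR-2": [50, 100],
    "miR-3": [10, 20],
    "miR-4": [1000, 100000],  # highly expressed outlier in sample 2
}
sf = size_factors(counts)
normalized = {name: [c / s for c, s in zip(cs, sf)] for name, cs in counts.items()}
```

In this toy example the ratio of the two size factors stays at 2, whereas the ratio of total counts (100320 / 1160) is about 86. That is the outlier problem described above.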

If you are using a scaling factor, I would plot the two samples you are normalizing against each other on an X-Y scatter plot. Then draw a line with the slope of your scaling factor through the middle of the points. By eyeballing it, you can tell whether it is reasonable.
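The plot is the real check, but it can help to compute what you are looking at. Below is a minimal sketch (a hypothetical helper, not from the thread) that measures each point's log2 distance from the line y = slope * x. It reports the fraction of points above the line and the median distance. If most miRNAs are not differentially expressed, a reasonable slope should leave roughly half the points on each side and a median distance near 0. Pairs with a zero count are dropped because they cannot be placed on a log-scale plot.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def scaling_line_check(
    x_counts: Sequence[float], y_counts: Sequence[float], slope: float
) -> Tuple[float, float]:
    """Compare paired counts (sample X vs sample Y) with the line y = slope * x.

    Returns (fraction of points strictly above the line,
             median log2 distance of the points from the line).
    """
    if slope <= 0:
        raise ValueError("slope must be positive")
    residuals = [
        math.log2(y) - math.log2(slope * x)
        for x, y in zip(x_counts, y_counts)
        if x > 0 and y > 0  # zeros cannot be shown on a log-scale plot
    ]
    if not residuals:
        raise ValueError("no pairs with positive counts in both samples")
    above = sum(r > 0 for r in residuals) / len(residuals)
    return above, median(residuals)
```

For example, `slope` could be the ratio of the two samples' size factors from a median-of-ratios calculation. A median distance far from 0, or a strongly lopsided fraction, is the numeric counterpart of a line that visibly misses the middle of the cloud.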

I don't think this question is completely solved. Some people like quantile normalization. I don't like it so much, but informed people can disagree. Simply Statistics had a good blog post on this: http://simplystatistics.org/2013/04/26/mindlessly-normalizing-genomics-data-is-bad-but-ignoring-unwanted-variability-can-be-worse/
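For comparison, here is a minimal sketch of plain quantile normalization: sort each sample, average the sorted values rank by rank across samples, then give each miRNA the average for its rank. This is the textbook form. Ties are broken by original order, which is the simplest choice; real implementations differ on tie handling and are usually applied to log-transformed values.

```python
from typing import List, Sequence


def quantile_normalize(columns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Quantile-normalize samples (columns) so they all share one distribution.

    Each column holds the values for the same miRNAs in the same order.
    Ties are broken by original order (a simplifying assumption).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must contain the same number of miRNAs")
    # orders[j][k] is the index of the k-th smallest value in sample j
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    # Mean across samples of the k-th smallest value
    rank_means = [
        sum(col[order[k]] for col, order in zip(columns, orders)) / len(columns)
        for k in range(n)
    ]
    result = []
    for order in orders:
        out = [0.0] * n
        for k, i in enumerate(order):
            out[i] = rank_means[k]  # the miRNA at rank k gets the rank-k mean
        result.append(out)
    return result
```

After this step every sample has exactly the same set of values, only assigned to different miRNAs. That is a much stronger assumption than a single scaling factor, which is one reason people disagree about using it.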

It may be that different methods work better for different types of samples. But if you draw that line, it should get you 80% of the way there. Sometimes total reads work just fine.

camelbbs (China), 6.1 years ago, wrote:

Actually, I had a similar question about RefSeq genes: whether to use the total reads in RefSeq genes to calculate the RPKM.

I think it is similar for miRNA. Since few miRNAs overlap, it doesn't matter whether you use the total reads of all noncoding RNAs or the total reads of miRNAs to calculate the RPKM. Just keep it consistent throughout your analysis.

I also look forward to other people's suggestions.

Ashutosh Pandey replied, 6.1 years ago:

Yes, I read your RefSeq question. In the end, I used the total number of reads that mapped to miRNAs.