Question

Misleading mirDeep2 output? recall not tallying


7.5 years ago

Michael 54k

We are still trying to understand the output of the miRDeep2 pipeline. We may need more help with it, but we have arrived at the following hypothesis: the built-in survey in miRDeep2 over-reports the recall rate. To illustrate what we mean, I have put the complete miRDeep2 output online here: http://www.ii.uib.no/~mdo041/mirDeep2_output_untrimmed/

Observation of possible discrepancy in the survey output

Have a look at the following excerpt from the score table:

10 12 8 ± 3 4 ± 3 (34 ± 22%) 438 292 205 (70%) 13.7 3

...

0 26 21 ± 5 6 ± 4 (22 ± 17%) 438 292 227 (78%) 6 3

The numbers of interest (first, second, and seventh columns) are: score cutoff, number of novel miRNAs, and number of known miRBase miRNAs detected by miRDeep2. The excerpt is taken from mirdeep_runs/run_31_10_2016_t_16_31_45/survey.csv; alternatively, see the top table of result_31_10_2016_t_16_31_45.html.

Indeed, we get 26 novel miRNAs at a score cutoff of 0. The trouble starts when we try to find the promised 200+ known miRNAs in the output. Instead of 227, we find only 146! We have looked through several files, and the count is consistently lower than even the lowest number reported by the survey (205): we never find more than 146.
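The size of the gap can be checked with a little arithmetic. The sketch below assumes that the percentage in the survey column is simply the number of known miRNAs detected divided by the number of known miRNAs in the data (292); that assumption reproduces the 70% and 78% in the excerpt, but it is our reading of the table, not something taken from the miRDeep2 documentation. The function name is our own.

```python
def recall_percent(detected: int, known_in_data: int) -> float:
    """Recall in percent: detected known miRNAs / known miRNAs present in the data."""
    return 100.0 * detected / known_in_data


KNOWN_IN_DATA = 292  # sixth column of the survey excerpt

survey_cutoff_10 = recall_percent(205, KNOWN_IN_DATA)   # survey row, score cutoff 10
survey_cutoff_0 = recall_percent(227, KNOWN_IN_DATA)    # survey row, score cutoff 0
from_output_files = recall_percent(146, KNOWN_IN_DATA)  # what we count in the output files

print(f"survey, cutoff 10: {survey_cutoff_10:.0f}%")
print(f"survey, cutoff 0:  {survey_cutoff_0:.0f}%")
print(f"output files:      {from_output_files:.0f}%")
```

Under this reading, the survey claims 70% to 78% recall, while the files we can actually inspect correspond to 50%.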

So, where are the missing detected known miRNAs? If miRDeep2 is able to detect them, why does it not write them to any file? Or have we overlooked them among the large number of output files?

    $ grep -e'known:' result_31_10_2016_t_16_31_45.bed | wc -l
    146
    $ grep -e'novel:' result_31_10_2016_t_16_31_45.bed | wc -l
    26    ## this seems to be correct at score cutoff 0

    mirna_results_31_10_2016_t_16_31_45 $ grep -ce'>' *.fa
    known_mature_31_10_2016_t_16_31_45_score-50_to_na.fa:146
    known_pres_31_10_2016_t_16_31_45_score-50_to_na.fa:146
    known_star_31_10_2016_t_16_31_45_score-50_to_na.fa:146
    not_mature_31_10_2016_t_16_31_45_score-50_to_na.fa:0
    not_pres_31_10_2016_t_16_31_45_score-50_to_na.fa:0
    not_star_31_10_2016_t_16_31_45_score-50_to_na.fa:0
    novel_mature_31_10_2016_t_16_31_45_score-50_to_na.fa:26
    novel_pres_31_10_2016_t_16_31_45_score-50_to_na.fa:26
    novel_star_31_10_2016_t_16_31_45_score-50_to_na.fa:26
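For anyone who wants to repeat the count outside the shell, here is a minimal Python sketch of the same check. `count_labels` mimics the `grep ... | wc -l` calls above by counting lines that contain the label anywhere. `names_with_label` additionally assumes that the `known:`/`novel:` label sits in the fourth (name) column of the BED file, which we have not verified for every miRDeep2 version. Both function names and the three demo lines are made up for illustration and are not real miRDeep2 output.

```python
from collections import Counter
from typing import Iterable, List


def count_labels(lines: Iterable[str], labels=("known:", "novel:")) -> Counter:
    """Count lines containing each label, like `grep -e LABEL file | wc -l`."""
    counts = Counter()
    for line in lines:
        for label in labels:
            if label in line:
                counts[label] += 1
    return counts


def names_with_label(lines: Iterable[str], label: str) -> List[str]:
    """Collect the BED name field (column 4) of every line whose name contains the label."""
    names = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4 and label in fields[3]:
            names.append(fields[3])
    return names


# Hypothetical demo lines, NOT taken from the real result file.
demo_bed = [
    "chrDemo\t100\t160\tknown:demo-mir-1\t5.1\t+",
    "chrDemo\t900\t960\tknown:demo-mir-1\t5.1\t+",
    "chrDemo\t2000\t2060\tnovel:demo_candidate_1\t1.3\t-",
]

counts = count_labels(demo_bed)
known_names = names_with_label(demo_bed, "known:")
print("known lines:", counts["known:"], "novel lines:", counts["novel:"])
print("known names:", len(known_names), "distinct known names:", len(set(known_names)))
```

For the real data, pass an open file handle instead of `demo_bed`, for example `open("result_31_10_2016_t_16_31_45.bed")`. Comparing the number of names with the number of distinct names shows whether the same miRNA is listed at more than one locus.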

This is a follow-up to "mirDeep2 using bowtie vs. bwa - why do more aligned reads yield less miRNA", where we were interested in precision and recall in order to investigate which recall rate we could expect in a new organism.

Updates:

We are now running the identical pipeline without providing information about known miRNAs or the organism. We would expect this to yield 227 + 26 novel miRNAs at score cutoff 0.
We assume it has to do with duplicate sequences among the mature miRNAs. If a miRNA exists in several copies in miRBase, there may be multiple identical entries; when predicting de novo, however, such a miRNA is reported only once. When miRBase entries are aligned against miRDeep2 predictions, and not vice versa, the entries with copy number > 1 are each counted as well and contribute to the observed difference.
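The duplicate hypothesis can be illustrated with a toy example. The sketch below counts matches in both directions: how many miRBase entries hit a prediction, versus how many predictions hit a miRBase entry. It uses exact sequence identity as a stand-in for the alignment step, which is a simplification and not how miRDeep2 itself matches sequences. All identifiers and sequences are invented for illustration.

```python
from typing import Dict, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {identifier: sequence}, upper-casing the sequences."""
    records: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def count_both_directions(mirbase: Dict[str, str], predicted: Dict[str, str]) -> Tuple[int, int]:
    """Return (miRBase entries matching a prediction, predictions matching a miRBase entry)."""
    predicted_seqs = set(predicted.values())
    mirbase_seqs = set(mirbase.values())
    entries_hit = sum(1 for seq in mirbase.values() if seq in predicted_seqs)
    predictions_hit = sum(1 for seq in predicted.values() if seq in mirbase_seqs)
    return entries_hit, predictions_hit


# Hypothetical data: demo-mir-1a-1 and demo-mir-1a-2 share one mature sequence.
demo_mirbase = parse_fasta(""">demo-mir-1a-1
ACGUACGUACGUACGUACGUAC
>demo-mir-1a-2
ACGUACGUACGUACGUACGUAC
>demo-mir-2
UUGGCCAAUUGGCCAAUUGGCC
""")
demo_predicted = parse_fasta(""">pred_1
ACGUACGUACGUACGUACGUAC
>pred_2
UUGGCCAAUUGGCCAAUUGGCC
""")

entries_hit, predictions_hit = count_both_directions(demo_mirbase, demo_predicted)
print("miRBase entries matched:", entries_hit)
print("predictions matched:", predictions_hit)
```

In this toy case three miRBase entries are matched but only two predictions are reported, which is the kind of gap we suspect lies behind 227 versus 146.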




Related post: Mirdeep2 Known Mirna possibly an over 2 year old problem...

ADD REPLY • link 7.5 years ago by Michael 54k


Indeed, there is very little documentation for the miRDeep2 output files. In the miRDeep2 paper (NAR, 2012), the authors say that "For each score cut-off the sensitivity and number of true positive novel miRNAs is estimated". According to this sentence, the number of true positive novel miRNAs is an estimate, so it need not equal the number actually counted in the results. This is just my speculation; I am waiting for an explanation from the authors.

ADD REPLY • link 6.1 years ago by pengchy ▴ 450