Several questions about miRDeep2
Hi all, I am currently using miRDeep2 to identify miRNAs in a non-model organism (a butterfly). After mapping the small RNA reads to the genome, I predicted novel miRNAs for the species. Here are my questions:

  1. I have 8 samples from 2 conditions (4 replicates each). Should I run the prediction on each sample individually and keep only the miRNAs shared among samples as the novel miRNAs?
  2. There are no previously reported reference miRNAs or hairpins in miRBase for my species. When I set them to 'none' and include the mature sequences of related species (all Lepidoptera + Drosophila), the outputs are all classified as 'novel'. In that case, however, I cannot tell how many of these miRNAs are conserved (or share homology) in other species. I then tried supplying either all metazoan miRNAs and hairpins or all Lepidoptera miRNAs and hairpins as the 'reference miRNAs and hairpins of my species' (which of course they are not). In each case I got different numbers of predicted and known miRNAs (mapped to either Metazoa or Lepidoptera), so I am unsure which setup is appropriate.
  3. What are the criteria for selecting true-positive miRNAs from all predicted miRNAs? As I understand it, I should choose those with a significant randfold p-value (labeled 'yes') and a high miRDeep2 score. The miRDeep2 score is said to range from -10 to 10, but I got many extremely high scores, up to 1.8e+6. Why is that? Also, some miRNAs have very high miRDeep2 scores but non-significant randfold p-values. Should I consider them true positives as well?
  4. How should I deal with precursors showing substantial sequence redundancy? Many identical miRNA loci come from different chromosomal locations. Since I will do differential expression analysis on mature sequences, I need to exclude the extra loci from the downstream analysis. Which of those loci should I choose as the representative? Do I have to find the redundant loci manually and edit them?
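
To make questions 3 and 4 concrete, here is a minimal Python sketch of the kind of filtering and de-duplication being asked about. It is only an illustration under stated assumptions: the field names (`locus`, `score`, `randfold`, `mature_seq`), the example records, and the score cutoff of 4.0 are all hypothetical and are not taken from miRDeep2's output format or documentation. Keeping the highest-scoring locus per mature sequence is one possible choice of representative, not an established rule.

```python
from typing import Dict, List

Candidate = Dict[str, object]


def filter_candidates(rows: List[Candidate],
                      min_score: float = 4.0,
                      require_randfold: bool = True) -> List[Candidate]:
    """Keep candidates at or above a score cutoff and, optionally, only
    those whose randfold column is 'yes'. The cutoff is an assumption."""
    kept = []
    for row in rows:
        if float(row["score"]) < min_score:
            continue
        if require_randfold and row["randfold"] != "yes":
            continue
        kept.append(row)
    return kept


def collapse_by_mature(rows: List[Candidate]) -> List[Candidate]:
    """Collapse loci that share an identical mature sequence, keeping the
    highest-scoring locus as the representative (an arbitrary choice)."""
    best: Dict[str, Candidate] = {}
    for row in rows:
        # Normalise case and DNA/RNA alphabet so identical sequences match.
        seq = str(row["mature_seq"]).lower().replace("t", "u")
        current = best.get(seq)
        if current is None or float(row["score"]) > float(current["score"]):
            best[seq] = row
    return list(best.values())


# Hypothetical records standing in for rows of a prediction table.
candidates = [
    {"locus": "chr1_101", "score": 1800000.0, "randfold": "yes",
     "mature_seq": "ugagguaguagguuguauaguu"},
    {"locus": "chr7_55", "score": 5.2, "randfold": "yes",
     "mature_seq": "ugagguaguagguuguauaguu"},
    {"locus": "chr2_9", "score": 6.0, "randfold": "no",
     "mature_seq": "uaaggcacgcggugaaugcc"},
    {"locus": "chr3_12", "score": 0.5, "randfold": "yes",
     "mature_seq": "uggaauguaaagaaguauggag"},
]

confident = filter_candidates(candidates)
representatives = collapse_by_mature(confident)
for row in representatives:
    print(row["locus"], row["mature_seq"], row["score"])
```

Setting `require_randfold=False` would retain the high-scoring candidates with a non-significant randfold p-value, which is exactly the case question 3 asks about.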
