Hi all, I am using miRDeep2 to identify miRNAs in a non-model organism (a butterfly). After mapping the small RNA reads to the genome with mapper.pl, I ran miRDeep2.pl to predict novel miRNAs for the species. Here are my questions:
- I have 8 samples from 2 conditions (4 replicates each). Should I run miRDeep2.pl on each sample individually and keep only the predictions shared among samples as the novel miRNAs?
- There are no previously reported reference miRNAs or hairpins in miRBase for my species. When I set both to 'none' and supply the mature sequences of related species (all Lepidoptera + Drosophila), every output is classified as 'novel'. In that case I cannot tell how many of these miRNAs are conserved in (or share homology with) other species. I then tried supplying all metazoan miRNAs and hairpins, or all Lepidoptera miRNAs and hairpins, as the 'reference miRNAs and hairpins of my species' (which of course they are not). Each setup gave me different numbers of predicted miRNAs and of known miRNAs (matched to either the metazoan or the Lepidoptera set). Which setup is appropriate here?
- What are the criteria for selecting true-positive miRNAs from all predicted miRNAs? As I understand it, I should keep those with a significant randfold p-value (labeled 'yes') and a high miRDeep2 score. I read that the miRDeep2 score ranges from -10 to 10, but I got many extremely high scores, up to 1.8e+6. Why is that? Also, some miRNAs have very high miRDeep2 scores but non-significant randfold p-values. Should I consider them true positives as well?
- How should I deal with precursors that show substantial sequence redundancy? Many identical miRNA loci are predicted at different chromosomal locations. Since I will run the differential expression analysis on mature sequences, I need to exclude the extra loci downstream. Which of those loci should I choose as the representative? Do I have to find the redundant loci and edit them manually?
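
To make the last question concrete, here is a minimal sketch of what I am considering for collapsing the redundant loci. It assumes I have already parsed the predictions into records myself; the field names (`locus_id`, `mature_seq`, `score`) and the example values are hypothetical and are not miRDeep2's own column names. Keeping the highest-scoring locus per mature sequence is only my guess at a sensible rule, which is exactly what I would like feedback on.

```python
from collections import defaultdict


def collapse_by_mature(loci):
    """Group predicted loci by mature sequence and pick one representative.

    Assumption: the representative is the locus with the highest score.
    """
    groups = defaultdict(list)
    for locus in loci:
        # Normalise case and T/U so identical sequences compare equal.
        key = locus["mature_seq"].upper().replace("T", "U")
        groups[key].append(locus)

    representatives = {}
    for seq, members in groups.items():
        best = max(members, key=lambda m: m["score"])
        representatives[seq] = {
            "representative": best["locus_id"],
            "all_loci": sorted(m["locus_id"] for m in members),
        }
    return representatives


# Illustrative records only: locus IDs and scores are made up.
example_loci = [
    {"locus_id": "chr1_101", "mature_seq": "ugagguaguagguuguauaguu", "score": 5.2},
    {"locus_id": "chr7_332", "mature_seq": "UGAGGUAGUAGGUUGUAUAGUU", "score": 4.1},
    {"locus_id": "chr2_045", "mature_seq": "uggaauguaaagaaguauggag", "score": 3.0},
]

collapsed = collapse_by_mature(example_loci)
for seq, info in collapsed.items():
    print(seq, info["representative"], info["all_loci"])
```

With this approach, read counts would be attributed to one row per unique mature sequence, while the list of all loci is kept for reference.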