In our bulk mRNA-Seq data, roughly 600 of our ~21,000 detected genes were miRNAs. All of them fall within the expression range of the non-miRNA genes, and about 20 fall in the upper half of gene expression in the dataset.
I was surprised by this because I expected most miRNAs to be removed by polyA selection. Also, we did NOT use a kit for small RNA capture and sequencing.
I'm wondering if these reads are aligning to pre-miRNAs that are longer than 75 base pairs. To sanity check this theory, I'd like to take my list of miRNAs in R and join it with a database that records the lengths of both the precursor (pre-processing) and mature (post-processing) miRNAs.
Does such an miRNA database exist? I'm trying to use mirbase.db, but I'm confused about how to use it and whether it has the information I'm looking for: http://bioconductor.org/packages/release/data/annotation/html/mirbase.db.html
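
To make the question concrete, here is a rough base-R sketch of the join I have in mind. The helper `flag_long_precursors` and the toy IDs and sequences are made up for illustration (they are NOT real miRBase entries), and the 75 cutoff is just the number from my theory above. I'm assuming the precursor sequences would arrive as a named character vector; where that vector would come from in mirbase.db is exactly the part I'm unsure about.

```r
# Hypothetical helper: join my miRNA IDs to precursor lengths and flag
# precursors longer than a cutoff (75 here, per the theory above).
# `precursor_seqs` is assumed to be a named character vector whose names
# are miRNA hairpin IDs and whose values are precursor sequences.
flag_long_precursors <- function(my_mirnas, precursor_seqs, cutoff = 75) {
  len_table <- data.frame(
    mirna_id = names(precursor_seqs),
    precursor_length = nchar(precursor_seqs),
    stringsAsFactors = FALSE
  )
  # Left join: keep every miRNA from my list, NA where no sequence was found
  merged <- merge(
    data.frame(mirna_id = my_mirnas, stringsAsFactors = FALSE),
    len_table,
    by = "mirna_id",
    all.x = TRUE
  )
  merged$longer_than_cutoff <- merged$precursor_length > cutoff
  merged
}

# Toy input with made-up sequences, only to show the shape of the data
toy_seqs <- c("toy-mir-1" = strrep("A", 70), "toy-mir-2" = strrep("A", 110))
res <- flag_long_precursors(c("toy-mir-1", "toy-mir-2", "toy-mir-3"), toy_seqs)
print(res)

# ASSUMPTION (unverified on my side): in mirbase.db, something like
#   unlist(mget(ids, mirbaseSEQUENCE, ifnotfound = NA))
# might supply `precursor_seqs`, and mirbaseMATURE might hold the
# mature-sequence coordinates. I have not confirmed either.
```

My gene identifiers would presumably also need to be mapped to miRBase-style IDs before a join like this would match anything.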