Hi all –
I am really hoping someone can provide simple, step-by-step instructions for identifying and filtering RNAs (degradation products, non-coding, ribosomal, etc.) in a series of small RNA sequencing files. What I’d like is the estimated percentage of reads belonging to each of these categories in each of my files. I am very much a beginner: small RNA-Seq and bioinformatics are well outside my experience and that of my colleagues.
A couple points about my situation:
- There are no genomes publicly available for these organisms (they are insects, so maybe Drosophila could be useful)
- The sequencing files I’d like to screen against RFAM are FASTA files, and have already been aligned to an miRNA database using bowtie; these files contain the ‘unaligned’ portion.
- Unfortunately, the prediction/filtering has to be done locally (i.e.
no web tools) due to confidentiality concerns surrounding the
In my simplistic view, I’d like to get a FASTA file of the RFAM database and use bowtie to align the reads in my sequencing files to those RNAs. I understand that some number of mismatches will need to be allowed because there is limited or no public sequence information for the organisms I’m working with. My question is: is this a viable approach, and if so, how do I do it? If it is not viable, is there a simple alternative that would accomplish what I am after, and what exactly would I need to do?
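To make the last step concrete, here is a rough Python sketch of how I imagine the per-category percentages could be tallied once bowtie has produced alignments against an RFAM FASTA. It is only an illustration under several assumptions: that the alignments are in bowtie's default tab-separated output (read name in column 1, reference name in column 3), that each read is counted once using its first reported hit, and that `ref_to_category` is a lookup table I would have to build myself from the RFAM annotations. The function names and category labels are hypothetical, not part of bowtie or RFAM.

```python
from collections import Counter


def count_fasta_records(fasta_lines):
    """Count reads in a FASTA file by counting header lines."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def category_percentages(alignment_lines, ref_to_category, total_reads):
    """Estimate the percentage of reads falling into each RNA category.

    alignment_lines : iterable of tab-separated alignment records
                      (ASSUMED layout: column 1 = read name,
                      column 3 = reference sequence name).
    ref_to_category : dict mapping reference name -> category label
                      (HYPOTHETICAL table, to be built by the user).
    total_reads     : number of reads in the input FASTA file.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")

    read_category = {}
    for line in alignment_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        read_name, ref_name = fields[0], fields[2]
        # Count each read once, using its first reported hit.
        category = ref_to_category.get(ref_name, "other_rfam")
        read_category.setdefault(read_name, category)

    counts = Counter(read_category.values())
    # Reads with no alignment at all are reported separately.
    counts["unassigned"] = total_reads - len(read_category)
    return {cat: 100.0 * n / total_reads for cat, n in counts.items()}


if __name__ == "__main__":
    # Tiny made-up example; real input would come from bowtie's output file.
    demo_alignments = [
        "read1\t+\trefA\t0\tACGT\tIIII\t0\t",
        "read2\t-\trefB\t5\tACGT\tIIII\t0\t",
    ]
    demo_lookup = {"refA": "rRNA", "refB": "tRNA"}
    for cat, pct in category_percentages(demo_alignments, demo_lookup, 4).items():
        print(f"{cat}\t{pct:.1f}%")
```

As far as I can tell from the bowtie manual, the alignments themselves would come from building an index of the RFAM FASTA with `bowtie-build` and then running `bowtie` with `-f` (FASTA input) and `-v` (number of mismatches allowed, up to 3), but I have not tested this and would welcome corrections.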
There have been several posts on this forum with similar questions (see below), but I feel none of them are really close enough to my situation to be helpful (or maybe I’m just not understanding).
I would be incredibly grateful for assistance with this question.
Mapping microRNA reads to Genome and non-coding RNAs (need exactly this type of pie graph, minus the length distribution)