From a wet-lab experiment, an unknown small RNA was detected in a sample. The sample was later filtered for the RNA's approximate size and sequenced to try to find out what it is. I'm now working on this sequencing data.
I wanted to ask what you would suggest as the best reference to use when aligning and annotating these reads.
I was thinking of a couple of options:
- align to the whole-genome reference and then annotate the regions with the most aligned reads.
- align to known ncRNA sequences and see if I get lucky and one of them is the unknown RNA.
- use BioMart to get all the unspliced gene sequences, with flanks as long as my reads, and align to those.
- use BioMart to get all the unspliced transcript sequences, align to them, and then see which transcript is most abundant.
Any thoughts or suggestions?
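
For the first option, here is a rough sketch of what I mean by "regions with the most aligned reads". It is plain Python and assumes the aligner writes a standard SAM file; the function name `top_regions`, the 100 bp bin size, and the toy records are placeholders I made up for illustration, not part of any real pipeline or my real data.

```python
from collections import Counter

def top_regions(sam_lines, bin_size=100, top_n=5):
    """Count primary mapped reads per (chromosome, bin start) and return the busiest bins."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):          # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # a SAM record has 11 mandatory fields
            continue
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])              # 1-based leftmost mapped position
        if flag & 0x4 or chrom == "*":    # read is unmapped
            continue
        if flag & 0x900:                  # secondary (0x100) or supplementary (0x800)
            continue
        # 1-based start coordinate of the fixed-width bin containing this read
        bin_start = (pos - 1) // bin_size * bin_size + 1
        counts[(chrom, bin_start)] += 1
    return counts.most_common(top_n)

# Toy records (hypothetical), just to show the input shape
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t1050\t60\t22M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t1060\t60\t22M\t*\t0\t0\t*\t*",
    "r3\t0\tchr2\t5\t60\t22M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(top_regions(toy_sam))
```

On real data I would read the lines from the SAM file instead of a list, and then look up the top bins in a genome annotation to see what overlaps them. Fixed-width bins are only a simple stand-in for proper peak or cluster calling.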