Question

Assigning small RNA reads

0

3.0 years ago

kb_93 ▴ 10

Hello there!

I am doing small RNA-seq analysis and have been using featureCounts to assign reads. A high percentage of reads were unassigned due to ambiguity, and when I looked into these further I could see that many of the reads flagged as ambiguous could be annotated as either tRNA and piRNA, or snoRNA and piRNA (using the -O flag in featureCounts to allow overlap).

I have looked at a few of the available small RNA annotation tools, and they usually annotate reads in a certain order (e.g. miRNA - rRNA - tRNA - snoRNA - piRNA). I haven't been able to find out from the tools' documentation why they annotate reads in that order. Would anyone be able to explain this, or suggest anything online I could use to understand it?

Any help will be greatly appreciated!

Katie

featureCount smallRNA rna rna-seq sequencing • 958 views

ADD COMMENT • link updated 3.0 years ago by Carlo Yague 8.7k • written 3.0 years ago by kb_93 ▴ 10

0

they usually annotate reads in a certain order

What exactly do you mean by that? That in the case of ambiguous mapping, they assign reads preferentially to miRNA, then rRNA, etc.?

If so, it is probably a matter of probability: in an sRNA-seq experiment, a read that can be assigned to both a miRNA sequence and something else is more likely to come from the miRNA (because of size selection). rRNAs then come second because they are extremely abundant, and so on.
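To make the prioritization idea concrete, here is a minimal Python sketch of how a read that overlaps several biotypes could be resolved with a fixed priority list. The order is simply the example from the question, and the function name, read IDs and input layout are hypothetical; this is not the implementation of any particular annotation tool.

```python
# Hypothetical priority order, taken from the example in the question.
PRIORITY = ["miRNA", "rRNA", "tRNA", "snoRNA", "piRNA"]


def assign_by_priority(overlapping_biotypes, priority=PRIORITY):
    """Return the highest-priority biotype among those a read overlaps.

    Returns None if the read overlaps none of the listed biotypes.
    """
    rank = {biotype: i for i, biotype in enumerate(priority)}
    candidates = [b for b in overlapping_biotypes if b in rank]
    if not candidates:
        return None
    # The lowest rank index corresponds to the highest priority.
    return min(candidates, key=rank.__getitem__)


# Made-up reads, each with the set of biotypes it overlaps.
reads = {
    "read1": {"tRNA", "piRNA"},
    "read2": {"snoRNA", "piRNA"},
    "read3": {"piRNA"},
}
assigned = {read_id: assign_by_priority(b) for read_id, b in reads.items()}
```

With this order, a read overlapping both tRNA and piRNA is always counted as tRNA, so the result depends entirely on how well the chosen order matches the data.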

ADD REPLY • link 3.0 years ago by Carlo Yague 8.7k

0

Hi Carlo,

Thank you for your reply. Perhaps I worded it badly, but yes, it is the order of priority I'm unsure about. I'm not sure which of tRNA, snoRNA or piRNA would come first.

ADD REPLY • link 3.0 years ago by kb_93 ▴ 10

1

I think you need to take a step back here. Your main issue is ambiguous mapping, which is caused by the short read length and the repetitive nature of the "sRNA-ome". Prioritization and iterative mapping are only one way to solve this issue. In fact, your question highlights one of the reasons why the prioritization method is suboptimal: the assumptions you can make about your data are limited and not always transferable (is an sRNA read more likely to come from a piRNA locus or a snoRNA gene? I have no idea).

For a start, I suggest you read this recent paper, which summarizes the issue of multimapping in sRNA-seq well and proposes a different solution than prioritization (rescue): Manatee: detection and quantification of small non-coding RNAs from next-generation sequencing data
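As a rough illustration of what a rescue-style approach can look like, the Python sketch below splits each ambiguous read across its candidate features in proportion to the reads that were assigned unambiguously. This is a generic, simplified scheme and is not claimed to be the algorithm used in the Manatee paper; the function name and the counts are made up for the example.

```python
from collections import Counter


def rescue_counts(unique_counts, ambiguous_reads):
    """Distribute ambiguous reads in proportion to unambiguous counts.

    unique_counts: dict mapping feature -> number of unambiguously assigned reads.
    ambiguous_reads: iterable of candidate-feature collections, one per read.
    Returns a dict mapping feature -> (possibly fractional) count.
    """
    counts = Counter({feature: float(n) for feature, n in unique_counts.items()})
    for candidates in ambiguous_reads:
        candidates = sorted(set(candidates))
        total = sum(unique_counts.get(c, 0) for c in candidates)
        for c in candidates:
            if total > 0:
                # Weight by the share of unambiguous evidence for this feature.
                weight = unique_counts.get(c, 0) / total
            else:
                # No unambiguous evidence for any candidate: split evenly.
                weight = 1 / len(candidates)
            counts[c] += weight
    return dict(counts)


# Made-up numbers: 30 unambiguous tRNA reads, 10 unambiguous piRNA reads,
# and 4 reads that overlap both annotations.
unique = {"tRNA": 30, "piRNA": 10}
ambiguous = [("tRNA", "piRNA")] * 4
result = rescue_counts(unique, ambiguous)
```

Here each ambiguous read contributes 0.75 to tRNA and 0.25 to piRNA, so the split is driven by the data rather than by a fixed biotype order.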

ADD REPLY • link 3.0 years ago by Carlo Yague 8.7k