Hi everyone, I’m analyzing prokaryotic RNA-seq data and I’d like some advice on handling non-coding RNAs (ncRNAs), e.g. sRNAs, tRNAs and residual rRNA.
Should ncRNAs be treated separately from coding genes?
When using featureCounts, should ncRNAs be included in the annotation file or excluded?
For differential expression analysis, is it better to run two separate analyses (coding vs ncRNA) or can I include everything together?
Thanks in advance for any suggestions.
Thank you so much, Dr. Smyth!
Could I also ask: if the data had been sequenced without rRNA depletion, what would be the best way to clean them? So far I have mapped the reads and used the rRNA coordinates from my organism’s GFF3 file to identify and remove rRNA reads (or, alternatively, used SortMeRNA). Is that approach correct? Are there better methods?
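For reference, the coordinate-based approach described above starts by pulling the rRNA features out of the GFF3. The Python sketch that follows assumes rRNA records carry the type `rRNA` in column 3 (common in NCBI prokaryotic annotations, but check your own file); `RRNA_TYPES`, `Interval` and the example records are hypothetical. It only collects the intervals and tests whether a 1-based read span overlaps one of them; it is not a substitute for a dedicated rRNA-filtering tool.

```python
"""Sketch: collect rRNA intervals from GFF3 lines and test read overlap.

Assumption: rRNA features have "rRNA" in column 3 (feature type).
Attribute values are not URL-decoded here.
"""
from typing import Iterable, List, NamedTuple

RRNA_TYPES = {"rRNA"}  # hypothetical setting; adjust to your annotation


class Interval(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    feature_id: str


def parse_attributes(field: str) -> dict:
    """Split a GFF3 column-9 string like 'ID=x;Name=y' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def rrna_intervals(lines: Iterable[str]) -> List[Interval]:
    """Return one Interval per rRNA feature found in the GFF3 lines."""
    out = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no more features follow
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in RRNA_TYPES:
            continue
        attrs = parse_attributes(cols[8])
        out.append(Interval(cols[0], int(cols[3]), int(cols[4]),
                            cols[6], attrs.get("ID", ".")))
    return out


def overlaps_rrna(seqid: str, start: int, end: int,
                  intervals: List[Interval]) -> bool:
    """True if the 1-based inclusive span [start, end] hits any rRNA interval."""
    return any(iv.seqid == seqid and start <= iv.end and end >= iv.start
               for iv in intervals)


# Invented example records (not from a real genome).
example_gff = [
    "##gff-version 3\n",
    "chr\tRefSeq\tgene\t100\t1600\t.\t+\t.\tID=gene-rrn1;gene_biotype=rRNA\n",
    "chr\tRefSeq\trRNA\t100\t1600\t.\t+\t.\tID=rna-rrn1;Parent=gene-rrn1\n",
    "chr\tRefSeq\tCDS\t2000\t2900\t.\t-\t0\tID=cds-1\n",
]
ivs = rrna_intervals(example_gff)
print(ivs)
print(overlaps_rrna("chr", 1550, 1650, ivs))  # read spans the rRNA end -> True
print(overlaps_rrna("chr", 2100, 2200, ivs))  # read lies in the CDS only -> False
```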
There are many ways to remove rRNA sequences; I like this one:
https://github.com/hzi-bifo/RiboDetector
It doesn't require mapping.
My experience is entirely with eukaryotic organisms, mainly mouse and human, so things might be different in the prokaryotic world.
With mouse or human, it would be extremely unusual to receive RNA-seq data without rRNA depletion or poly(A) pull-down, because the library would then be mostly rRNA and barely usable.
If you had standard RNA-seq with a bit of rRNA, then there's no need to do anything special. You just align the reads and run featureCounts with the GFF in the usual way, attach standard annotation to the genes, and then filter out what you don't want at the analysis stage. Personally, I would never change or manipulate the FASTQ files that I get from the sequencing unit.
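To make the "count everything, filter at the analysis stage" workflow concrete, here is a minimal Python sketch (the tools discussed in this thread are featureCounts and R, so this only illustrates the idea). It assumes a count table in the featureCounts layout and a hypothetical gene_id-to-biotype mapping taken from the GFF3 attributes; the gene names and counts are invented. It reports the share of counted reads on rRNA genes per sample, which shows whether there is only "a bit of rRNA", and then drops unwanted biotypes from the count matrix without touching the FASTQ files.

```python
"""Sketch: count all features, then drop unwanted biotypes from the matrix.

Assumptions (hypothetical): featureCounts-style table (comment line, then a
header Geneid, Chr, Start, End, Strand, Length, <one column per sample>) and
a gene_id -> biotype dict already built from the annotation.
"""
import csv
import io
from typing import Dict, List, Set, Tuple

Counts = Dict[str, List[int]]


def read_featurecounts(text: str) -> Tuple[List[str], Counts]:
    """Parse a featureCounts-style table into (sample names, {gene_id: counts})."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t")
            if row and not row[0].startswith("#")]
    header, body = rows[0], rows[1:]
    samples = header[6:]  # everything after Geneid, Chr, Start, End, Strand, Length
    counts = {row[0]: [int(x) for x in row[6:]] for row in body}
    return samples, counts


def rrna_fraction(counts: Counts, biotype: Dict[str, str],
                  samples: List[str]) -> Dict[str, float]:
    """Fraction of each sample's counted reads assigned to rRNA genes."""
    fractions = {}
    for j, sample in enumerate(samples):
        total = sum(c[j] for c in counts.values())
        rrna = sum(c[j] for g, c in counts.items() if biotype.get(g) == "rRNA")
        fractions[sample] = rrna / total if total else 0.0
    return fractions


def drop_biotypes(counts: Counts, biotype: Dict[str, str],
                  unwanted: Set[str]) -> Counts:
    """Keep genes whose biotype is not in `unwanted`; genes without a biotype are kept."""
    return {g: c for g, c in counts.items() if biotype.get(g) not in unwanted}


# Invented example data in featureCounts layout.
fc_text = (
    "# Program:featureCounts (hypothetical example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tA.bam\tB.bam\n"
    "rrn1\tchr\t100\t1600\t+\t1501\t300\t100\n"
    "cds1\tchr\t1700\t2500\t+\t801\t50\t60\n"
    "srna1\tchr\t2600\t2700\t-\t101\t20\t40\n"
    "cds2\tchr\t3000\t3900\t-\t901\t630\t800\n"
)
# Hypothetical gene_id -> biotype mapping, e.g. from GFF3 gene_biotype attributes.
biotype = {"rrn1": "rRNA", "cds1": "protein_coding",
           "srna1": "ncRNA", "cds2": "protein_coding"}

samples, counts = read_featurecounts(fc_text)
print(rrna_fraction(counts, biotype, samples))  # {'A.bam': 0.3, 'B.bam': 0.1}
kept = drop_biotypes(counts, biotype, {"rRNA", "tRNA"})
print(sorted(kept))  # ['cds1', 'cds2', 'srna1']
```

Keeping `srna1` reflects the advice above to count ncRNAs together with coding genes and decide what to exclude afterwards; in an R/Bioconductor workflow the same filtering would typically be applied to the count matrix and gene annotation before the edgeR or limma analysis.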
Thank you very much for your answers!