Question

Prokaryotic RNA-seq: how to handle ncRNAs in featureCounts and DE analysis?

6 days ago

MaxMin ▴ 10

Hi everyone — I’m analyzing prokaryotic RNA-seq data and I’d like some advice about handling non-coding RNAs (ncRNAs) (e.g. sRNA, tRNA, residual rRNA, etc.).

Should ncRNAs be treated separately from coding genes?

When using featureCounts, should ncRNAs be included in the annotation file or excluded?

For differential expression analysis, is it better to run two separate analyses (coding vs ncRNA) or can I include everything together?

Thanks in advance for any suggestions

ncRNA lcRNA tRNA tmRNA • 526 views

ADD COMMENT • link 4 days ago by MaxMin ▴ 10

score 2 · Answer 1 · 2025-11-02

6 days ago

Gordon Smyth ★ 8.5k

You can analyse everything together, and it is much better to do so.

However, if you are using a standard messenger RNA RNA-seq protocol with polyA pulldown, then some of these RNA species should not be present. Long non-coding RNAs are fine, but short ncRNAs and ribosomal RNA should not be present. In my own analyses, I do filter out species that should not be present but, in most cases, that will make little difference to the analysis.

Coding vs non-coding is not an issue in itself for an RNA-seq analysis, because RNA-seq is analysing RNA expression, not protein expression.

ADD COMMENT • link 6 days ago by Gordon Smyth ★ 8.5k

Thank you so much, Dr. Smyth!

Could I ask you: if I had received data sequenced without rRNA depletion, what would be the best way to clean them? So far I have identified and removed rRNA reads by mapping the reads against the rRNA coordinates from my organism’s GFF3 file (or by using SortMeRNA). Is that correct? Are there better methods?
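For reference, the coordinate-based step described above could be sketched as follows. This is only an illustration in plain Python: it assumes the annotation marks ribosomal RNA with the feature type `rRNA` in column 3 of the GFF3, and the function name and toy records are hypothetical. It only extracts the intervals; removing the overlapping reads would be a separate step with whatever tool you prefer.

```python
def rrna_intervals_from_gff3(gff3_lines, feature_type="rRNA"):
    """Yield BED-style (seqid, start0, end, strand, name) tuples for rRNA features."""
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # optional embedded sequence section; no features follow it
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue  # malformed line, or not the feature type we want
        seqid, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        attrs = dict(kv.split("=", 1) for kv in fields[8].split(";") if "=" in kv)
        name = attrs.get("ID", f"{seqid}:{start}-{end}")
        # GFF3 coordinates are 1-based inclusive; BED is 0-based half-open.
        yield seqid, start - 1, end, strand, name


# Hypothetical toy annotation, for illustration only.
toy_gff3 = [
    "##gff-version 3\n",
    "chr1\ttoy\tgene\t100\t400\t.\t+\t.\tID=gene1\n",
    "chr1\ttoy\tCDS\t100\t400\t.\t+\t0\tID=cds1;Parent=gene1\n",
    "chr1\ttoy\trRNA\t1000\t2500\t.\t-\t.\tID=rrna1;product=16S ribosomal RNA\n",
]

for record in rrna_intervals_from_gff3(toy_gff3):
    print("\t".join(map(str, record)))
```

Pass an open file handle instead of the toy list to run it on a real annotation. If your GFF3 labels rRNA differently (for example only through an attribute on the gene feature), the filter condition would need to change.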

ADD REPLY • link 5 days ago by MaxMin ▴ 10

There are many ways to remove rRNA sequences. I like this one:

https://github.com/hzi-bifo/RiboDetector

It doesn't require mapping.

ADD REPLY • link 5 days ago by Mensur Dlakic ★ 30k

My experience is entirely with eukaryotic organisms, mainly mouse and human, so things might be different in the prokaryotic world.

With mouse or human, it would be extremely unusual to receive RNA-seq data without rRNA depletion or polyA pulldown, because it would then be mostly rRNA and barely usable.

If you had standard RNA-seq with a bit of rRNA, then there's no need to do anything special. You just align and run featureCounts with the GFF in the usual way, then attach standard annotation to the genes, then filter out what you don't want at the analysis stage, in the usual way. I would personally never change or manipulate the FASTQ file that I get from the sequencing unit.
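As a rough illustration of filtering at the analysis stage, here is a minimal sketch in plain Python. It assumes you already have a gene-by-sample count table and a gene-to-type annotation. The dictionaries, gene names, and the choice to drop only `rRNA` are hypothetical stand-ins, not featureCounts output or a recommended list. The FASTQ and BAM files are left untouched; only rows of the count table are dropped.

```python
def split_by_feature_type(counts, feature_type, unwanted=("rRNA",)):
    """Split a count table into (kept, removed) by each gene's annotated type."""
    kept, removed = {}, {}
    for gene, row in counts.items():
        target = removed if feature_type.get(gene) in unwanted else kept
        target[gene] = row
    return kept, removed


def fraction_removed(counts, removed):
    """Per-sample fraction of counted reads that fall in the removed genes."""
    n_samples = len(next(iter(counts.values())))
    totals = [sum(row[i] for row in counts.values()) for i in range(n_samples)]
    dropped = [sum(row[i] for row in removed.values()) for i in range(n_samples)]
    return [d / t if t else 0.0 for d, t in zip(dropped, totals)]


# Hypothetical toy data: two samples, three genes.
counts = {"geneA": [120, 95], "geneB": [30, 42], "rrn16S": [850, 863]}
feature_type = {"geneA": "protein_coding", "geneB": "sRNA", "rrn16S": "rRNA"}

kept, removed = split_by_feature_type(counts, feature_type)
print(sorted(kept))                       # genes carried into the DE analysis
print(fraction_removed(counts, removed))  # how much of each library was rRNA
```

Coding genes and the remaining ncRNAs stay in one table, so they go through a single differential expression analysis together. Reporting the removed fraction per sample is a quick check on how much rRNA each library carried.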