Prokaryotic RNA-seq: how to handle ncRNAs in featureCounts and DE analysis?
6 days ago
MaxMin ▴ 10

Hi everyone, I’m analyzing prokaryotic RNA-seq data and I’d like some advice about handling non-coding RNAs (ncRNAs) such as sRNA, tRNA and residual rRNA.

Should ncRNAs be treated separately from coding genes?

When using featureCounts, should ncRNAs be included in the annotation file or excluded?

For differential expression analysis, is it better to run two separate analyses (coding vs ncRNA) or can I include everything together?

Thanks in advance for any suggestions.

ncRNA lcRNA tRNA tmRNA • 526 views
6 days ago
Gordon Smyth ★ 8.5k

You can analyse everything together, and it is much better to do so.

However, if you are using a standard mRNA-seq protocol with polyA pulldown, then some of these RNA species should not be present. Long non-coding RNAs are fine, but short ncRNAs and ribosomal RNA should not be present. In my own analyses, I do filter out species that should not be present but, in most cases, that makes little difference to the analysis.

Coding vs non-coding is not an issue in itself for an RNA-seq analysis, because RNA-seq is analysing RNA expression, not protein expression.
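To see how much a given RNA species actually contributes before deciding whether filtering matters, one option is to tally the share of counts per biotype. The following is only a minimal Python sketch: the function name `biotype_shares`, the biotype labels and the toy numbers are all made up for illustration and are not part of featureCounts or any other package.

```python
from collections import defaultdict


def biotype_shares(counts, biotype_of):
    """Return the fraction of all counts that falls into each biotype.

    counts:     dict gene_id -> read count (e.g. summed across samples)
    biotype_of: dict gene_id -> biotype label (labels here are hypothetical)
    Genes missing from biotype_of are grouped under "unknown".
    """
    totals = defaultdict(int)
    for gene_id, n in counts.items():
        totals[biotype_of.get(gene_id, "unknown")] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {biotype: n / grand_total for biotype, n in totals.items()}


# Toy numbers, invented purely for illustration.
counts = {"geneA": 900, "geneB": 50, "rrsA": 40, "tRNA-Ala": 10}
biotype_of = {
    "geneA": "protein_coding",
    "geneB": "ncRNA",
    "rrsA": "rRNA",
    "tRNA-Ala": "tRNA",
}
shares = biotype_shares(counts, biotype_of)
```

If the unwanted species account for only a small share of the counts, removing them is unlikely to change the downstream analysis much.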


Thank you so much, Dr. Smyth!

Could I ask a follow-up question: if I had received data sequenced without rRNA depletion, what would be the best way to clean it? So far I have identified and removed rRNA reads by mapping against the rRNA coordinates from my organism’s GFF3 file, or with SortMeRNA. Is that correct? Are there better methods?


There are many ways to remove rRNA sequences. I like this one:

https://github.com/hzi-bifo/RiboDetector

It doesn't require mapping.


My experience is entirely with eukaryotic organisms, mainly mouse and human, so things might be different in the prokaryotic world.

With mouse or human, it would be extremely unusual to receive RNA-seq data without rRNA depletion or polyA pulldown, because it would then be mostly rRNA and barely usable.

If you had standard RNA-seq with a bit of rRNA, then there's no need to do anything special. You just align and run featureCounts with the GFF in the usual way, attach standard annotation to the genes, and then filter out what you don't want at the analysis stage. I would personally never change or manipulate the FASTQ file that I get from the sequencing unit.
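The "count everything, annotate, then filter at the analysis stage" idea can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the featureCounts API: it assumes the GFF3 `gene` rows carry `ID=` and `gene_biotype=` attributes (check your own annotation file), and the helper names `parse_gene_biotypes` and `drop_biotypes`, the gene IDs and the counts are all hypothetical.

```python
def parse_gene_biotypes(gff3_lines):
    """Map gene ID -> biotype using the 'gene' rows of a GFF3 file.

    Assumption: column 9 holds ID=... and gene_biotype=... attributes.
    Genes without a gene_biotype attribute are labelled "unknown".
    """
    biotypes = {}
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "gene":
            continue
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        if "ID" in attrs:
            biotypes[attrs["ID"]] = attrs.get("gene_biotype", "unknown")
    return biotypes


def drop_biotypes(count_rows, biotypes, unwanted=frozenset({"rRNA", "tRNA"})):
    """Keep (gene_id, counts) rows whose biotype is not in `unwanted`.

    The default `unwanted` set is only an example; choose it per experiment.
    """
    return [
        (gene_id, counts)
        for gene_id, counts in count_rows
        if biotypes.get(gene_id, "unknown") not in unwanted
    ]


# Toy annotation and count table, invented for illustration.
gff3 = [
    "##gff-version 3",
    "chr\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene-A;gene_biotype=protein_coding",
    "chr\tRefSeq\tgene\t1000\t2500\t.\t+\t.\tID=gene-rrsA;gene_biotype=rRNA",
    "chr\tRefSeq\tgene\t3000\t3100\t.\t-\t.\tID=gene-sR1;gene_biotype=ncRNA",
]
count_rows = [
    ("gene-A", [120, 98]),
    ("gene-rrsA", [5000, 4700]),
    ("gene-sR1", [14, 30]),
]
biotypes = parse_gene_biotypes(gff3)
kept = drop_biotypes(count_rows, biotypes)
```

The filtering happens on the count table after counting, so the FASTQ files and the alignments stay untouched, and the sRNA row is kept alongside the coding gene.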


Thank you very much for your answers!
