Question: Is removing rRNA a necessary step in RNA-seq?
2822462298 wrote:

I used SortMeRNA to remove rRNA sequences from my raw RNA-seq data. I kept ~95% of the reads as clean data for 7 of 8 samples. For the remaining sample I kept only ~75%: around 20% of its reads mapped to the eukaryotic 18S and 28S rRNA sequences. Later, in the differential expression analysis, this odd sample appeared as an outlier in the PCA plot and did not cluster with its replicates.

Therefore, I may have to discard this sample from my DE analysis. Alternatively, I could skip the rRNA removal step so that it does not cause this problem. What should I do?

(question written 10 months ago by 2822462298; modified 10 months ago by Devon Ryan)
Devon Ryan (Freiburg, Germany) wrote:

There's no reason to bother removing rRNA if (like most people) you're not quantifying it later. Usually one just looks at the percentage of reads assigned to features (e.g., with MultiQC on the featureCounts output) and excludes outlier samples. That won't tell you that a sample was an outlier because of rRNA contamination, but that's rarely actionable information in and of itself (you'd still want to see it as an outlier in the PCA).

(written 10 months ago by Devon Ryan)
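A minimal Python sketch of the "percentage of reads assigned to features" check, assuming the standard featureCounts `*.summary` layout (a `Status` header row followed by one column per BAM file, then one row per assignment status). The cut-off of 10 percentage points below the median (`max_drop`) is a hypothetical threshold, not something from this thread; in practice you would judge outliers from the MultiQC plot. The counts in the example are toy numbers.

```python
import statistics


def assigned_fractions(summary_text):
    """Return {sample: fraction of reads whose status is 'Assigned'}."""
    rows = [line.split("\t") for line in summary_text.splitlines() if line.strip()]
    samples = rows[0][1:]  # header row: "Status", then one column per BAM file
    totals = [0] * len(samples)
    assigned = [0] * len(samples)
    for status, *counts in rows[1:]:
        for i, c in enumerate(counts):
            totals[i] += int(c)
            if status == "Assigned":
                assigned[i] += int(c)
    return {s: (a / t if t else 0.0) for s, a, t in zip(samples, assigned, totals)}


def flag_outliers(fractions, max_drop=0.10):
    """Samples whose assigned fraction is more than max_drop below the median.

    max_drop=0.10 is a hypothetical cut-off chosen for illustration only.
    """
    med = statistics.median(fractions.values())
    return sorted(s for s, f in fractions.items() if med - f > max_drop)


# Toy summary: four samples, each with 1000 reads in total.
example = "\n".join([
    "Status\ts1.bam\ts2.bam\ts3.bam\ts4.bam",
    "Assigned\t900\t880\t910\t650",
    "Unassigned_NoFeatures\t80\t100\t70\t330",
    "Unassigned_Ambiguity\t20\t20\t20\t20",
])
fractions = assigned_fractions(example)
print(flag_outliers(fractions))  # ['s4.bam']
```

As Devon notes, this flags the sample without saying *why* it has fewer assigned reads; the PCA is still what tells you whether it actually behaves as an outlier.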

Also, since you generally have some residual rRNA "contamination" even after poly-A selection, removing it could throw off normalization factors that take your library size into account.

(written 10 months ago by benformatics)

So does this mean one should not remove rRNA genes even after counting? E.g., HISAT2 -> featureCounts -> DESeq2

(written 4 months ago by DataFanatic)

Yep, exactly. You can filter them out of the DE genes at the end if they are not of interest to you.

(written 4 months ago by benformatics)

No, if you don't care about them, then you should remove their counts from the matrix. Otherwise you're needlessly inflating the number of tests you're doing and reducing your power. The normalization should be robust to their presence, but if there's a LOT of rRNA contamination in one sample, that tends to cause issues with the normalization factors.

(written 4 months ago by Devon Ryan)
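A toy illustration of why a ratio-based normalization is largely robust to a few rRNA rows, while plain library-size (total count) scaling is not. `median_of_ratios` is a simplified re-implementation of the median-of-ratios idea behind DESeq2's size factors, not DESeq2 itself. The matrix is made up: sample C has the same total read count as A and B, but about a fifth of its reads are rRNA, so its mRNA counts are 0.8 times theirs.

```python
import math
import statistics


def median_of_ratios(counts):
    """Size factors for a genes x samples matrix (list of lists).

    Genes with a zero in any sample are skipped because their geometric
    mean is zero; each sample's factor is exp(median(log ratio to the
    per-gene geometric mean)).
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Toy matrix: rows are genes, columns are samples A, B, C.
counts = [
    [100, 100, 80],
    [200, 200, 160],
    [50, 50, 40],
    [400, 400, 320],
    [250, 250, 200],
    [10, 10, 210],  # rRNA: heavy contamination in sample C
]

library_sizes = [sum(col) for col in zip(*counts)]
size_factors = median_of_ratios(counts)
print(library_sizes)                        # [1010, 1010, 1010]
print([round(s, 3) for s in size_factors])  # [1.077, 1.077, 0.862]
```

The single rRNA row barely moves each sample's median ratio, so C's size factor comes out at 0.8 times that of A and B, matching its real mRNA depth, whereas total-count scaling would treat all three libraries as equal. That robustness has limits when rRNA makes up many rows or a very large share of one sample, which is the caveat Devon raises.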

Yeah, I was thinking they should be kept in for the size-factor calculation and removed afterwards. But looking through the DESeq2 manual, it wasn't obvious how to do that. In edgeR it is a little more straightforward...

I think dropping them right off the bat would only be OK if you had checked that they were similar across samples.

(written 4 months ago by benformatics; modified 4 months ago)
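A minimal Python sketch of the check benformatics suggests (is the rRNA share similar across samples?) followed by dropping the rRNA rows from the count matrix. The gene IDs and counts are hypothetical toy values. If you follow the keep-for-size-factors-then-drop approach, estimate the size factors on the full matrix first and carry them over to the filtered one.

```python
def rrna_fraction(counts, gene_ids, rrna_ids):
    """Per-sample share of counts that fall on the genes in rrna_ids."""
    rrna = set(rrna_ids)
    n = len(counts[0])
    total = [0] * n
    on_rrna = [0] * n
    for gid, row in zip(gene_ids, counts):
        for j, c in enumerate(row):
            total[j] += c
            if gid in rrna:
                on_rrna[j] += c
    return [r / t if t else 0.0 for r, t in zip(on_rrna, total)]


def drop_genes(counts, gene_ids, drop_ids):
    """Return (counts, gene_ids) with the rows listed in drop_ids removed."""
    drop = set(drop_ids)
    kept = [(gid, row) for gid, row in zip(gene_ids, counts) if gid not in drop]
    return [row for _, row in kept], [gid for gid, _ in kept]


gene_ids = ["geneA", "geneB", "rRNA_1"]  # hypothetical IDs
counts = [
    [100, 100, 80],
    [200, 200, 160],
    [10, 10, 210],
]
shares = rrna_fraction(counts, gene_ids, ["rRNA_1"])
print([round(f, 3) for f in shares])  # [0.032, 0.032, 0.467]
filtered, kept_ids = drop_genes(counts, gene_ids, ["rRNA_1"])
```

Here the third sample's rRNA share is far from the others, so by benformatics's criterion it would not be safe to drop the rRNA rows before normalizing. In DESeq2 terms, the keep-then-drop approach roughly corresponds to calling `estimateSizeFactors()` on the full object, subsetting its rows, and then running `DESeq()`, which reuses pre-existing size factors; check this against the DESeq2 vignette before relying on it.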

Thank you, Devon. Yes, this is one of the reasons I would prefer to remove them; I just have not been able to find the gene IDs. Any idea where I can get the list of Drosophila melanogaster rDNA gene Ensembl IDs?

(written 4 months ago by DataFanatic)

So you did not count them in the first place? Here is the Ensembl Drosophila rDNA scaffold. The same scaffold is available at FlyBase.

(written 4 months ago by GenoMax; modified 4 months ago)

Thanks, GenoMax. You filter them out of the raw counts before DESeq2. The scaffold works for alignment, but at this point I already have the count matrix, so how can I obtain just the gene IDs?

(written 4 months ago by DataFanatic)
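One way to get the IDs without re-counting, sketched in Python: scan the same GTF that was given to featureCounts (so the IDs match the rows of the count matrix) and collect the `gene_id` of every record on the rDNA scaffold or carrying an rRNA biotype. Both the sequence name `rDNA` and the `gene_biotype` attribute are assumptions: check the first column of your GTF for the actual scaffold name, and note that not every GTF carries a `gene_biotype` attribute. The GTF lines below are toy records with hypothetical IDs.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
BIOTYPE = re.compile(r'gene_biotype "([^"]+)"')


def rrna_gene_ids(gtf_lines, scaffold="rDNA", biotype="rRNA"):
    """gene_ids on the given scaffold or with the given biotype (assumed names)."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a valid 9-column GTF record
        seqname, attrs = fields[0], fields[8]
        gid = GENE_ID.search(attrs)
        if not gid:
            continue
        bt = BIOTYPE.search(attrs)
        if seqname == scaffold or (bt and bt.group(1) == biotype):
            ids.add(gid.group(1))
    return sorted(ids)


toy_gtf = [
    "#!genome-build example",
    'rDNA\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "geneR1"; gene_biotype "rRNA";',
    '2L\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "geneX"; gene_biotype "protein_coding";',
    '3R\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "geneR2"; gene_biotype "rRNA";',
]
print(rrna_gene_ids(toy_gtf))  # ['geneR1', 'geneR2']
```

On a real annotation you would pass an open file handle instead, e.g. `with open("annotation.gtf") as fh: ids = rrna_gene_ids(fh)` (the filename is hypothetical), and then drop those IDs from the count matrix.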