Question: Would you remove rRNA reads in silico before testing for differential expression?
9 months ago, mschmid wrote:


Do some of you remove reads originating from rRNA in silico before testing for differential expression?

Why or why not? Is it worth it, or does it depend on the analysis pipeline, in your opinion?

My data is based on poly-A enrichment of mRNA, and about 2-5% of the reads (Illumina SE, 100 bp) come from rRNA operons.
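If you do decide to remove rRNA reads in silico, the core step is just a filter on alignment coordinates. Below is a minimal sketch, assuming single-end alignments have already been reduced to hypothetical `(read_id, chrom, start, end)` tuples and that you have a list of rRNA intervals (made-up coordinates here; in practice they would come from your annotation). A real pipeline would operate on BAM or FASTQ files instead, and the function names are illustrative only.

```python
from typing import Iterable, List, Tuple

# Half-open genomic interval: (chromosome, start, end)
Interval = Tuple[str, int, int]
# Simplified single-end alignment record: (read_id, chromosome, start, end)
Alignment = Tuple[str, str, int, int]


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def split_rrna_reads(
    alignments: Iterable[Alignment], rrna: List[Interval]
) -> Tuple[List[Alignment], List[Alignment]]:
    """Partition alignments into (kept, removed); removed reads overlap an rRNA interval."""
    kept: List[Alignment] = []
    removed: List[Alignment] = []
    for aln in alignments:
        _, chrom, start, end = aln
        hit = any(
            chrom == r_chrom and overlaps(start, end, r_start, r_end)
            for r_chrom, r_start, r_end in rrna
        )
        (removed if hit else kept).append(aln)
    return kept, removed


if __name__ == "__main__":
    # Hypothetical rRNA operon coordinates and toy alignments, for illustration only
    rrna_intervals = [("chr1", 1000, 6000)]
    reads = [
        ("r1", "chr1", 1500, 1600),  # inside the rRNA operon
        ("r2", "chr1", 9000, 9100),  # elsewhere on chr1
        ("r3", "chr2", 1500, 1600),  # same coordinates, different chromosome
        ("r4", "chr1", 5950, 6050),  # partially overlaps the operon end
    ]
    kept, removed = split_rrna_reads(reads, rrna_intervals)
    frac = len(removed) / len(reads)
    print(f"removed {len(removed)}/{len(reads)} reads ({frac:.1%}) as rRNA")
```

Here any overlap, even a single base, counts as rRNA; a stricter rule (e.g. a minimum overlap fraction) would be a design choice to make explicit in a real analysis.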

I plan to use the HISAT2 > StringTie > Ballgown workflow as a first strategy to test for DE. I might try other methods later, depending on the Ballgown results.


As one typically quantifies reads against a transcriptome or a GTF file, and neither should include rRNA, one does not explicitly remove them. Since they are not represented in the references, they are not counted anyway. I realize that the workflow you mention is prominent because it was published in a high-profile journal by reputable people, but I still do not see why one should use it. StringTie assembles transcriptomes, so unless you really need that, I would avoid it. Ballgown also seems bulky to me. My preferred pipeline, which is well maintained, is quantification of reads against a transcriptome with salmon, aggregation of the transcript abundance estimates to the gene level with tximport, and differential analysis with edgeR (the last step can also be done with DESeq2). These tools have excellent tutorials, and their developers are responsive to issues on the Bioconductor support site. You might give them a try.
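The gene-level aggregation step described above can be illustrated with a small sketch. This is not tximport's API; it only mimics the simplest idea (summing estimated transcript counts per gene through a transcript-to-gene table), and the transcript and gene IDs are made up. It also reflects the point about rRNA: a read that matches nothing in the transcriptome never produces a transcript-level estimate, so it cannot appear in the gene-level table.

```python
from collections import defaultdict
from typing import Dict, Mapping


def summarize_to_gene(
    tx_counts: Mapping[str, float], tx2gene: Mapping[str, str]
) -> Dict[str, float]:
    """Sum estimated transcript counts into gene-level counts.

    In this sketch, transcripts missing from tx2gene are skipped rather than guessed.
    """
    gene_counts: Dict[str, float] = defaultdict(float)
    for tx, count in tx_counts.items():
        gene = tx2gene.get(tx)
        if gene is None:
            continue  # no gene mapping: leave it out of the gene-level table
        gene_counts[gene] += count
    return dict(gene_counts)


if __name__ == "__main__":
    # Hypothetical salmon-like estimated counts per transcript
    tx_counts = {"TX1": 120.5, "TX2": 30.0, "TX3": 75.25}
    tx2gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}
    print(summarize_to_gene(tx_counts, tx2gene))
```

tximport itself also offers other modes (for example, counts scaled from abundances) and carries transcript lengths along for the downstream offset; see its documentation before relying on this simplified picture.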

Written 9 months ago by ATpoint

RNA-seq projects are only side projects for me, so I consider myself rather an amateur. However, I use pipelines similar to ATpoint's, with STAR as the aligner and featureCounts for quantification.
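For the alignment-based route, the counting logic behind a tool like featureCounts can be sketched as follows. This is a heavily simplified model under stated assumptions: single-end reads reduced to one `(chrom, start, end)` span, gene-level exon intervals with made-up coordinates, and a rule (modelled on featureCounts' default) that reads overlapping more than one gene are left unassigned as ambiguous. The status labels are named after featureCounts' summary categories, but strandedness, splicing, multi-mapping, and minimum-overlap options are all ignored. It also illustrates ATpoint's point: a read from an rRNA locus that is not in the annotation ends up as "no feature" and never enters the count table.

```python
from collections import Counter
from typing import Dict, List, Tuple

Exon = Tuple[str, int, int]  # (chromosome, start, end), half-open
Read = Tuple[str, int, int]  # (chromosome, start, end), half-open


def assign_read(read: Read, genes: Dict[str, List[Exon]]) -> str:
    """Return the single gene a read overlaps, or an 'unassigned' status label."""
    chrom, start, end = read
    hits = {
        gene
        for gene, exons in genes.items()
        for e_chrom, e_start, e_end in exons
        if chrom == e_chrom and start < e_end and e_start < end
    }
    if not hits:
        return "Unassigned_NoFeatures"
    if len(hits) > 1:
        return "Unassigned_Ambiguity"
    return hits.pop()


def count_reads(reads: List[Read], genes: Dict[str, List[Exon]]) -> Counter:
    """Tally gene assignments and unassigned categories over all reads."""
    return Counter(assign_read(r, genes) for r in reads)


if __name__ == "__main__":
    # Hypothetical annotation: two genes, no rRNA features
    genes = {
        "GENE_A": [("chr1", 100, 200), ("chr1", 300, 400)],
        "GENE_B": [("chr1", 350, 500)],
    }
    reads = [
        ("chr1", 120, 170),  # GENE_A only
        ("chr1", 360, 390),  # overlaps exons of both GENE_A and GENE_B
        ("chr1", 450, 480),  # GENE_B only
        ("chr2", 10, 60),    # e.g. an rRNA read in an unannotated locus
    ]
    print(count_reads(reads, genes))
```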

My point is to showcase additional options, not to compare them with ATpoint's suggestion. That said, I do share ATpoint's opinion of the HISAT2 pipeline: I had to implement it for a customer, and it feels clunkier than necessary...

Written 9 months ago by Carambakaracho (modified 9 months ago)

FYI, 2-5% of reads coming from rRNA is really high for poly-A enriched data. You should expect <<1% instead.

Written 9 months ago by Devon Ryan

Hmm... I just saw that it is "only" about 1.5-2%. But that is still not <<1%.

Written 9 months ago by mschmid

I think ~2% is what you would expect when rRNA depletion works well, not with poly-A enrichment.

Written 9 months ago by Devon Ryan

