Question: Would you remove rRNA reads in silico before testing for differential expression?
mschmid wrote:

Hello

Do some of you remove reads originating from rRNA in silico before testing for differential expression?

Why or why not? Is it worth it? Or does it depend on the analysis pipeline, in your opinion?

My data is based on poly-A enrichment of mRNA and about 2-5% of the reads (Illumina SE 100bp) are from rRNA operons.

I plan to use the HISAT2 > StringTie > Ballgown workflow as a first strategy to test for DE. I might try different methods later, depending on the Ballgown results.

written 5 weeks ago by mschmid

One typically quantifies reads against a transcriptome or a GTF file, and neither should include rRNA, so there is no need to remove rRNA reads explicitly: because they are not represented in the reference, they are not counted anyway.

I realize that the workflow you mention is prominent because it was published prominently by reputable people, but I still do not see why one should use it. StringTie assembles transcriptomes, so unless you really need that, I would avoid it. Ballgown also seems bulky to me.

My preferred pipeline, which is well maintained, is quantification of reads against a transcriptome with salmon, aggregation of the transcript abundance estimates to the gene level with tximport, and differential analysis with edgeR (the latter can also be done with DESeq2). These tools have awesome tutorials, and the developers are responsive to issues at BioC. You might give them a try.

written 5 weeks ago by ATpoint
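
To illustrate what "aggregation to the gene level" means in the comment above, here is a minimal Python sketch that sums transcript-level estimated counts per gene. It is not tximport (an R/Bioconductor package that does considerably more than a plain sum); the function name `aggregate_to_genes`, the `tx2gene` mapping, and the toy numbers are all hypothetical and only meant to show the idea.

```python
from collections import defaultdict


def aggregate_to_genes(transcript_counts, tx2gene):
    """Sum transcript-level estimated counts per gene.

    transcript_counts: dict of transcript ID -> estimated read count
    tx2gene: dict of transcript ID -> gene ID (hypothetical mapping)

    Assumption of this sketch: transcripts without an entry in tx2gene
    are silently skipped.
    """
    gene_counts = defaultdict(float)
    for tx_id, count in transcript_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no gene assignment, so nothing to add
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical toy data: two transcripts of geneA, one of geneB.
transcript_counts = {"tx1": 10.5, "tx2": 4.5, "tx3": 7.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

gene_counts = aggregate_to_genes(transcript_counts, tx2gene)
print(gene_counts)
```

The resulting gene-level table is the kind of input that count-based differential expression tools work on; in a real analysis the aggregation would be done by the dedicated tool and not by hand.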

RNA-seq analyses are only side projects for me, so I consider myself rather an amateur. However, I use pipelines similar to ATpoint's, with STAR as the alignment tool and featureCounts for quantification.

My point is to showcase additional options, not to compare them with ATpoint's suggestion. That said, I do second ATpoint's opinion on the HISAT2 pipeline: I had to implement it for a customer, and it feels more clunky than necessary...

modified 5 weeks ago • written 5 weeks ago by Carambakaracho

FYI, 2-5% of reads coming from rRNA is really high for poly-A enriched data. You should instead expect <<1%.

written 5 weeks ago by Devon Ryan

Hmm... I just saw that it is "only" about 1.5-2%. But that is still not <<1%.

written 5 weeks ago by mschmid

I think 2% is what you're supposed to get if rRNA depletion works well, rather than due to poly-A enrichment.

written 5 weeks ago by Devon Ryan
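
For anyone wanting to check their own libraries against the percentages discussed in this thread, the arithmetic is simple. The sketch below assumes you already have, per sample, the number of reads assigned to rRNA and the total number of reads (how you obtain them depends on your pipeline). The function name `rrna_percentage` and the sample numbers are hypothetical.

```python
def rrna_percentage(rrna_reads, total_reads):
    """Return the share of reads assigned to rRNA, in percent."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * rrna_reads / total_reads


# Hypothetical per-sample counts: (reads assigned to rRNA, total reads).
samples = {
    "sample1": (15_000, 1_000_000),
    "sample2": (20_000, 1_000_000),
}

for name, (rrna, total) in samples.items():
    pct = rrna_percentage(rrna, total)
    print(f"{name}: {pct:.2f}% rRNA")
```

With these made-up counts the two samples come out at 1.5% and 2%, i.e. the range mentioned above.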