Question: High BCV in edgeR, any ideas?
2.6 years ago, Biogeek wrote:

I have a difficult de novo RNA-seq dataset. Despite removing contamination from my samples and running the experiment as carefully as I could, I still get a BCV of 0.6. I've been told that this result is 'bad' and that I can't publish with such a high BCV. Can someone comment on this? The species has a genome that is about three-quarters complete; if I align my reads to that instead, the BCV is 0.2 with no prior filtering.
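For readers comparing the two figures: in edgeR the BCV is the square root of the negative binomial dispersion, so a BCV of 0.6 corresponds to a dispersion of 0.36 and a BCV of 0.2 to 0.04. The sketch below is only an illustration of that relationship; the function names and the mean count of 100 are assumptions chosen for the example, not values from the thread.

```python
import math

def bcv_to_dispersion(bcv: float) -> float:
    """Negative binomial dispersion implied by a BCV (BCV = sqrt(dispersion))."""
    return bcv ** 2

def total_cv(mean_count: float, bcv: float) -> float:
    """Total CV of a count under the NB model, where
    variance = mu + dispersion * mu^2, so CV^2 = 1/mu + dispersion."""
    return math.sqrt(1.0 / mean_count + bcv_to_dispersion(bcv))

# The two BCV values reported in the question; mean count 100 is an assumption.
for bcv in (0.2, 0.6):
    print(f"BCV {bcv}: dispersion {bcv_to_dispersion(bcv):.2f}, "
          f"total CV at mean count 100 = {total_cv(100, bcv):.3f}")
```

At moderate to high counts the 1/mu term is small, so the total variation between replicates is dominated by the BCV itself.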

It seems that in the de novo analysis a sizeable proportion of my genes vary across replicates, and the samples are quite heterogeneous. We even took physiological measurements before the experiment to ensure they were all at a suitable level of acclimation. All other parameters were tightly controlled to keep the experiment stringent and fair.

RNA was extracted with a uniform method, and all at the same time, to prevent batch effects. I applied TMM normalization in edgeR because some library sizes were double those of others, and kept a gene for analysis only if it had at least 1 CPM in at least 3 samples.
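The filter described here can be sketched as follows. In edgeR itself it is commonly written as `keep <- rowSums(cpm(y) > 1) >= 3`; the Python below is a stdlib-only illustration with made-up toy counts. It uses plain library sizes, not TMM-scaled effective library sizes, which is a simplification.

```python
def cpm(counts, lib_sizes):
    """Counts per million for a genes x samples table (list of rows)."""
    return [[1e6 * c / n for c, n in zip(row, lib_sizes)] for row in counts]

def keep_expressed(counts, lib_sizes, min_cpm=1.0, min_samples=3):
    """True for each gene with CPM >= min_cpm in at least min_samples samples."""
    return [sum(v >= min_cpm for v in row) >= min_samples
            for row in cpm(counts, lib_sizes)]

# Toy data (made up): 3 genes x 4 samples, two libraries twice as deep.
counts = [
    [0, 1, 0, 2],      # barely detected anywhere
    [15, 40, 12, 35],  # above 1 CPM in every sample
    [3, 0, 0, 9],      # below 1 CPM in every sample
]
lib_sizes = [10_000_000, 20_000_000, 10_000_000, 20_000_000]

keep = keep_expressed(counts, lib_sizes)
print(keep)
```

Only the second toy gene passes. With libraries that differ twofold in depth, a fixed CPM cut-off corresponds to different raw counts per sample, which is worth bearing in mind when many contigs sit near the threshold.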

I tried looking at the variable genes with low prior.df values; however, they seem to be random genes with no obvious pattern emerging.
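One rough way to list the most variable contigs independently of edgeR's shrinkage is a method-of-moments dispersion per gene. This is not edgeR's estimator, which uses empirical Bayes moderation governed by prior.df. It is a crude stand-in that assumes the counts are already normalized for library size and that all values come from replicates of a single group. The contig names and counts are hypothetical.

```python
import statistics

def moment_dispersion(values):
    """Method-of-moments NB dispersion: (variance - mean) / mean^2, floored at 0."""
    m = statistics.mean(values)
    if m == 0:
        return 0.0
    v = statistics.variance(values)  # sample variance; needs at least 2 values
    return max(0.0, (v - m) / m ** 2)

def rank_by_bcv(norm_counts):
    """Return (gene, BCV) pairs sorted from most to least variable."""
    bcv = {g: moment_dispersion(v) ** 0.5 for g, v in norm_counts.items()}
    return sorted(bcv.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical normalized counts for replicates of one treatment group.
norm_counts = {
    "contig_A": [100, 110, 90, 105],  # roughly Poisson-level variation
    "contig_B": [20, 200, 5, 90],     # highly variable across replicates
}

ranked = rank_by_bcv(norm_counts)
for gene, bcv in ranked:
    print(f"{gene}\tBCV ~ {bcv:.2f}")
```

Inspecting the top of such a list alongside contig length, mapping ambiguity, and assembly of origin may reveal patterns that gene annotation alone does not.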

Any ideas why the de novo analysis has such a high BCV while the genome-aligned version has a nice low value? The de novo assembly was built by merging several assemblies, including the genome gene models, and clustering them into non-redundant transcripts.

Thanks.

high dispersion edger glm bcv • 1.0k views

Are you quantifying genes with one method and transcripts with the other? I imagine that quantifying transcripts with non-optimal methods will lead to higher BCVs.

written 2.6 years ago by Devon Ryan
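One way to test the gene-versus-transcript idea raised in this comment is to sum contig-level counts up to a locus and see whether the variability drops. The sketch below shows one possible mechanism, not something confirmed in the thread: reads shifting between near-identical contigs from sample to sample. The contig names, the `tx2gene` mapping, and the counts are all hypothetical.

```python
def sum_to_gene(tx_counts, tx2gene):
    """Sum per-sample counts of all contigs that map to the same gene/locus.

    Contigs missing from tx2gene are kept as their own 'gene'.
    """
    gene_counts = {}
    for tx, counts in tx_counts.items():
        gene = tx2gene.get(tx, tx)
        if gene not in gene_counts:
            gene_counts[gene] = list(counts)
        else:
            gene_counts[gene] = [a + b for a, b in zip(gene_counts[gene], counts)]
    return gene_counts

# Hypothetical counts for three replicates. contig_1a and contig_1b are
# near-duplicates of one locus, so reads land on one or the other.
tx_counts = {
    "contig_1a": [50, 5, 48],
    "contig_1b": [4, 52, 6],
    "contig_2": [30, 28, 33],
}
tx2gene = {"contig_1a": "locus_1", "contig_1b": "locus_1"}

gene_counts = sum_to_gene(tx_counts, tx2gene)
print(gene_counts)
```

In this toy case each redundant contig looks wildly variable on its own, while the summed locus is stable across replicates. If the real data behave similarly after aggregation, residual redundancy in the merged assembly would be a plausible contributor to the high BCV.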

Predicted gene models in the genome-guided analysis; evidential genes assembled de novo in the second. Any tips?

written 2.6 years ago by Biogeek

My suspicion is that this is some quirk of how the alignments and counts are working with your assembly. I wouldn't know what's going funky there, but that's where you should be looking.

written 2.6 years ago by Devon Ryan
2.6 years ago, Biogeek wrote:

Any knowledge out there? :-) What could be causing the huge difference between the counts based on the genome gene models and those from the de novo assembly?
