Question: Adjust for GC content bias in RNA-seq DE analysis.
statfa450 wrote:

When I was reading this paper:

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-480

I realized that although it was once believed that "for a given gene, the GC-content effect was the same across samples and hence would cancel out when considering DE statistics such as count ratios", this belief is now disputed, and the authors say that biases due to GC content should be normalized before DE analysis.

Now, I have a table of raw read counts. I analyzed the data without controlling for GC content. I know that such an effect can be absorbed into the sample-specific sequencing depth if only a single sample is sequenced in each lane. My data come from an experiment in which two samples were sequenced in each lane. How can I normalize the data if all I have is the table of raw read counts? Is it OK if I don't adjust for the effect of GC content and normalize my data only for sequencing depth?
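The cancellation argument quoted from the paper can be checked with a toy calculation. The sketch below assumes a made-up multiplicative model (count = depth * expression * exp(slope * (GC - 0.5))); the function names, slopes, and depths are illustrative assumptions, not taken from the paper or from any DE package. If both samples share the same GC slope, the log fold change of an unchanged gene is zero at every GC content. If the slopes differ between samples, a spurious fold change appears that depends on GC content and is not removed by depth normalization alone.

```python
import math

def expected_count(depth, expression, gc, gc_slope):
    """Toy model (an assumption): reads = depth * expression * exp(gc_slope * (gc - 0.5))."""
    return depth * expression * math.exp(gc_slope * (gc - 0.5))

def log2_fold_change(gc, slope_a, slope_b, expression=100.0):
    """log2 ratio (B over A) of depth-normalized expected counts for one gene.

    The gene has the same true expression in both samples, so any
    non-zero value is an artifact of the GC effect.
    """
    depth_a, depth_b = 1.0, 2.0  # arbitrary, different sequencing depths
    a = expected_count(depth_a, expression, gc, slope_a) / depth_a
    b = expected_count(depth_b, expression, gc, slope_b) / depth_b
    return math.log2(b / a)

gc_values = (0.35, 0.50, 0.65)

# Same GC effect in both samples: the effect cancels for every gene.
same_effect = [log2_fold_change(gc, 3.0, 3.0) for gc in gc_values]

# Sample-specific GC effect: the fold change now varies with GC content.
different_effect = [log2_fold_change(gc, 3.0, 1.0) for gc in gc_values]

for gc, s, d in zip(gc_values, same_effect, different_effect):
    print(f"GC={gc:.2f}  same slope: {s:+.3f}  different slopes: {d:+.3f}")
```

In this toy model the question is therefore not whether a GC effect exists, but whether it differs between the samples being compared.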

gc content normalization • 1.3k views
modified 2.2 years ago by Devon Ryan91k • written 2.2 years ago by statfa450

DESeq2 accounts for this. I assume other packages may as well.

written 2.2 years ago by genomax71k

I always assumed they don't because you're only comparing genes against each other, with the same GC content... I don't think those tools take GC content into account by default. They're also agnostic about those features, they only have counts as input...

written 2.2 years ago by WouterDeCoster40k

Yeah, I thought the same as you did but when I read that paper, I realized that it's essential to adjust for the GC content bias. Could anyone show me some papers where they suggest it's not essential to account for GC content bias please?

written 2.2 years ago by statfa450

There haven't been GC-bias issues for the last ~5 years. You're not going to find a paper about that, no one would bother writing it.

written 2.2 years ago by Devon Ryan91k

Thank you. Can that package control the GC content bias when you only have the table of read counts?

modified 2.2 years ago • written 2.2 years ago by statfa450

Devon Ryan91k wrote:

It is rarely necessary to account for GC bias, since it's rare these days for there to be a GC bias between samples. If you're worried about that, you can use the cqn package (from Bioconductor) with DESeq2.

written 2.2 years ago by Devon Ryan91k

Yeah, I remember you once told me that. But when I read these two papers, it seemed that within-lane normalization for GC content is essential.

https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-480

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4917940/

The problem is that now that I have finished my analyses and am about to present my results, I've realized that GC content bias should be normalized. I don't have enough time to normalize the data for GC content (if that is even possible) and repeat the analysis. That's why I asked this question: I'm looking for references I can cite to justify not having accounted for GC content bias.

modified 2.2 years ago • written 2.2 years ago by statfa450

You haven't realized that GC content should be normalized, you simply think it needs to be. If you can't show a GC bias between samples then you have no indication that it should be normalized (there'd be nothing to normalize).

written 2.2 years ago by Devon Ryan91k

What's the threshold for considering a GC bias? If I have a control group of samples at 44% (+/- 2) GC compared to my treated group that is 41% (+/- 3), is that bias or within normal? Having a hard time finding an answer.

written 22 months ago by annen30

A single number doesn't exist. As I mentioned in your original thread, use the cqn package to make a diagnostic plot. If the distributions in that are quite different then you need to correct for it. Otherwise, you might end up correcting out a difference in expression of only a couple transcripts.
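As a rough illustration of what such a diagnostic compares, here is a minimal sketch that bins genes by GC content and looks at how each sample's median log count changes across the bins. This is not the cqn method or its plotting function, only a simpler stand-in using binned medians. The data are simulated, the helper names `gc_trend` and `max_trend_gap` are hypothetical, and per-gene GC content is assumed to come from the gene annotation, since a count table alone does not contain it.

```python
import math
import random
from statistics import median

def gc_trend(counts, gc, n_bins=5):
    """Median log2(count + 1) per GC bin, centered on the sample's overall median.

    Centering removes most of the sequencing-depth difference, so only
    the shape of the GC dependence is left to compare between samples.
    """
    logs = [math.log2(c + 1) for c in counts]
    overall = median(logs)
    order = sorted(range(len(gc)), key=lambda i: gc[i])  # genes ordered by GC
    size = len(order) // n_bins
    trend = []
    for b in range(n_bins):
        start = b * size
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        trend.append(median(logs[i] for i in order[start:end]) - overall)
    return trend

def max_trend_gap(trend_a, trend_b):
    """Largest per-bin difference between two samples' GC trends."""
    return max(abs(a - b) for a, b in zip(trend_a, trend_b))

# Simulated data (assumption): 2000 genes with identical true expression
# in all three samples, so any difference in trend is technical.
random.seed(0)
n_genes = 2000
gc = [random.uniform(0.3, 0.7) for _ in range(n_genes)]
base = [math.exp(random.gauss(5.0, 1.0)) for _ in range(n_genes)]

def simulate(depth, gc_slope):
    return [round(depth * e * math.exp(gc_slope * (g - 0.5)))
            for e, g in zip(base, gc)]

sample_a = simulate(depth=1.0, gc_slope=3.0)
sample_b = simulate(depth=2.0, gc_slope=3.0)  # same GC effect, deeper
sample_c = simulate(depth=1.0, gc_slope=0.0)  # no GC effect

gap_ab = max_trend_gap(gc_trend(sample_a, gc), gc_trend(sample_b, gc))
gap_ac = max_trend_gap(gc_trend(sample_a, gc), gc_trend(sample_c, gc))
print(f"A vs B (same GC effect):      {gap_ab:.3f}")
print(f"A vs C (different GC effect): {gap_ac:.3f}")
```

Samples whose trends overlap share the same GC dependence, which is the case where nothing needs correcting. Clearly diverging trends are the situation the diagnostic plot is meant to reveal.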

written 22 months ago by Devon Ryan91k

Okay, I finally managed to do that and it looks like I do in fact need to correct for GC bias?

https://ibb.co/gkBxyG https://ibb.co/g0RBsb

modified 22 months ago • written 22 months ago by annen30

Yeah, it looks like it'll benefit you.

written 22 months ago by Devon Ryan91k