Question: GC content and length bias in RNA-seq data
2.6 years ago by parvin.shariaty wrote:

Hi, I have seven RNA-seq samples in two groups (control, treated) that were produced by two companies (controls = BGI, treated = Novogene). I used NOISeq for quality control of the count data, and the results recommended normalization for GC content and length. However, I have read some papers reporting that GC-content and length normalization can introduce bias when comparing gene expression between two groups. So I would like to use normalization factors instead of size factors in DESeq2 (as described in "Differential analysis of count data - the DESeq2 package"):

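# Centre each gene's factors so that their geometric mean across samples is 1
normFactors <- normFactors / exp(rowMeans(log(normFactors)))
# Supply the gene-by-sample matrix to DESeq2 in place of size factors
normalizationFactors(dds) <- normFactors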
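
To make the centring step in the snippet above concrete, here is a minimal base-R sketch on a toy matrix. The values and the gene/sample names are made up for illustration, and the DESeq2 assignment itself is not run here; the sketch only checks what dividing by the row-wise geometric mean does.

```r
# Toy gene-by-sample matrix of normalization factors (hypothetical values).
normFactors <- matrix(
  c(1.2, 0.8, 1.5,
    0.5, 2.0, 1.0,
    3.0, 3.0, 3.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("sample", 1:3))
)

# Divide each row by its geometric mean, as in the snippet above.
# A vector of length nrow() recycles down the columns, so element [i, j]
# is divided by the geometric mean of row i.
rowGeoMeans <- exp(rowMeans(log(normFactors)))
centred <- normFactors / rowGeoMeans

# After centring, each row has geometric mean 1.
exp(rowMeans(log(centred)))

# The ratios between samples within a gene are unchanged by the centring.
centred["gene1", "sample1"] / centred["gene1", "sample2"]
normFactors["gene1", "sample1"] / normFactors["gene1", "sample2"]
```

The centring only rescales each gene's factors; it does not remove any difference between samples, so whatever the factors encode about the two companies is carried into the analysis unchanged.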

Is this a correct way to normalize and to supply the normalization factors to DESeq2? Thank you in advance for your help. Best regards, Parvin

rna-seq • 1.0k views
modified 2.6 years ago by Devon Ryan • written 2.6 years ago by parvin.shariaty

It is not correct to compare groups for which the data were generated differently, by a different company in your case. Your groups are confounded: technical differences between the two companies are mixed in with the biological effect and cannot be separated from it.

written 2.6 years ago by WouterDeCoster

In addition to what Wouter has said, it looks like you are making your situation overly complex. You have 2 different datasets from 2 different platforms. You can analyse these together and include batch/platform as a covariate, but you must exercise caution. I have done this in the past but we eventually decided not to proceed for fear of criticism.
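
The confounding mentioned in this thread can be illustrated with a small base-R sketch of the design matrix. The 3 + 4 split of the seven samples is an assumption (the post does not give it), and the column names `condition` and `batch` are hypothetical.

```r
# Hypothetical sample sheet: the post says seven samples in two groups,
# but not the split, so 3 controls + 4 treated is an assumption.
samples <- data.frame(
  condition = factor(c(rep("control", 3), rep("treated", 4))),
  batch     = factor(c(rep("BGI", 3), rep("Novogene", 4)))
)

# Every control comes from one company and every treated sample from the other.
table(samples$batch, samples$condition)

# Design with batch as a covariate: the batch and condition columns are identical.
design <- model.matrix(~ batch + condition, data = samples)
ncol(design)      # number of coefficients requested
qr(design)$rank   # number of coefficients that can actually be estimated
```

When the rank is lower than the number of columns, the batch and condition effects cannot be estimated separately. A batch covariate only helps when each company has sequenced at least some samples from both groups.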

written 2.6 years ago by Kevin Blighe