Question: Questions about gene length and GC content in CQN normalization
18 months ago by
boymin2020 wrote:


I have several questions after reading the manual for the CQN normalization method. I have checked several related posts on Biostar, but I am still confused.

  1. How do I get gene lengths? A simple approach is to take the gene boundaries directly from the Ensembl website (end bp - start bp + 1?), but summing the exonic bases of each gene seems more appropriate.

  2. How do I get GC % content? Unlike gene length, the Ensembl website reports GC % directly. But if gene length should not be calculated from the gene boundaries, those GC values cannot be used either.

  3. If the data show no GC bias or gene length bias, what effect does applying the CQN normalization method have?

  4. Are the values returned by CQN log2-scaled RPM by default?

In sum, I want to know how to obtain the most accurate gene length and GC % content.


16 months ago by
US, Chicago
rrbutleriii wrote:

See this post for questions one and two. Specifically, if all you need is gene length and GC content, and you don't want to learn to access biomaRt directly, this will work (but it takes a little time, depending on the size of your matrix).

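library(EDASeq)
# Ensembl gene IDs to look up
ensembl_list <- c("ENSG00000000003","ENSG00000000419","ENSG00000000457","ENSG00000000460")
# "hsa" selects human; the lookup queries an online database, so it needs network access
getGeneLengthAndGCContent(ensembl_list, "hsa")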
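For anyone who wants to see what "summing the exonic bases" from question one amounts to, here is a minimal base R sketch. It is only an illustration under stated assumptions: the `exons` table, the gene names, and the helper functions `union_length` and `gc_fraction` are hypothetical and are not part of EDASeq or cqn. Overlapping exons from different transcripts are merged first, so shared bases are counted once.

```r
# Hypothetical exon table: one row per exon, 1-based inclusive coordinates.
# Exons from different transcripts of the same gene may overlap.
exons <- data.frame(
  gene  = c("geneA", "geneA", "geneA", "geneB"),
  start = c(100, 150, 400, 10),
  end   = c(200, 250, 500, 59)
)

# Length of the union of closed intervals [start, end].
union_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or directly adjacent exon: extend the merged block.
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap before this exon: close the current block and start a new one.
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Exonic (union) length per gene.
exonic_length <- sapply(
  split(exons, exons$gene),
  function(g) union_length(g$start, g$end)
)

# GC fraction of a sequence string, ignoring anything that is not A/C/G/T.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / sum(bases %in% c("A", "C", "G", "T"))
}

print(exonic_length)
print(gc_fraction("ATGCGCNNAT"))
```

In practice the GC fraction would be computed over the same merged exonic sequence that defines the length, so that the two covariates describe the same bases.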

Question 3: It will still quantile-normalize the data.

Question 4: Yes, see the example on page 4 of the vignette.
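As a rough sketch of that vignette example, assuming the cqn package is installed from Bioconductor and using the example objects it ships with (`montgomery.subset`, `sizeFactors.subset`, `uCovar`), the normalized values are obtained by adding the `y` and `offset` components of the fit. The variable names `fit` and `norm_log2` are my own. As far as I recall, the vignette describes the result as log2-scale values, so it is worth checking the vignette for the exact unit.

```r
library(cqn)

# Example data shipped with the cqn package
data(montgomery.subset)    # matrix of read counts, genes x samples
data(sizeFactors.subset)   # per-sample sequencing depth
data(uCovar)               # per-gene covariates: length and gccontent

# Fit CQN with GC content as the covariate x and gene length as lengths
fit <- cqn(montgomery.subset,
           lengths = uCovar$length,
           x = uCovar$gccontent,
           sizeFactors = sizeFactors.subset,
           verbose = FALSE)

# Normalized expression values (log2 scale): fitted values plus offset
norm_log2 <- fit$y + fit$offset
head(norm_log2)
```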
