Questions about gene length and GC content in CQN normalization
3.0 years ago
boymin2020 ▴ 30

Hi,

I have several questions after reading the manual for the CQN normalization method. I have checked several related posts on Biostars, but I am still quite confused.

1. How do I get the gene length? It looks easy to calculate from the gene coordinates given on the Ensembl website (end bp - start bp + 1?). But it seems more correct to sum the lengths of all exonic regions of each gene.

2. How do I get the GC % content? Unlike gene length, the Ensembl website gives the GC % content directly. But if gene length should not be calculated from the gene coordinates as I assumed, then those GC values cannot be used either.

3. If the data have no GC bias or gene length bias, what effect does applying CQN normalization have?

4. Are the values returned by CQN log2-scaled RPM by default?

In sum, I want to know how to get the most accurate gene length and GC % content.

Thanks,

CQN RNA-Seq gene length GC content • 1.5k views
2.8 years ago
rrbutleriii ▴ 110

See this post for questions 1 and 2. Specifically, if all you need is gene length and GC content, and you don't want to learn to query biomaRt directly, this will work (though it takes a while, depending on the size of your matrix):

library(EDASeq)
# Ensembl gene IDs to look up (e.g. the row names of your count matrix)
ensembl_list <- c("ENSG00000000003","ENSG00000000419","ENSG00000000457","ENSG00000000460")
# "hsa" selects human; see ?getGeneLengthAndGCContent for other organisms
getGeneLengthAndGCContent(ensembl_list, "hsa")
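
To make the difference between the two length definitions in question 1 concrete, here is a minimal base-R sketch. The exon coordinates and the helper names `union_length` and `gc_fraction` are made up for illustration; this is not how EDASeq or Ensembl compute their values, just one way to sum non-overlapping exonic bases and to compute GC content over the same sequence.

```r
# Hypothetical exon coordinates for one gene (1-based, inclusive).
# The first two exons overlap, as exons from different transcripts often do.
exons <- data.frame(start = c(100, 150, 400), end = c(200, 250, 450))

# Length of the union of intervals: overlapping bases are counted once.
union_length <- function(start, end) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  total <- 0
  cur_start <- start[1]
  cur_end <- end[1]
  for (i in seq_along(start)[-1]) {
    if (start[i] <= cur_end + 1) {
      # Overlapping or adjacent: extend the current merged interval
      cur_end <- max(cur_end, end[i])
    } else {
      # Gap (intron): close the current interval and start a new one
      total <- total + (cur_end - cur_start + 1)
      cur_start <- start[i]
      cur_end <- end[i]
    }
  }
  total + (cur_end - cur_start + 1)
}

# Fraction of G/C among unambiguous bases in a sequence string.
gc_fraction <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  sum(bases %in% c("G", "C")) / sum(bases %in% c("A", "C", "G", "T"))
}

span_length <- max(exons$end) - min(exons$start) + 1  # end - start + 1, introns included
exonic_length <- union_length(exons$start, exons$end) # merged exons only
```

For this toy gene, `span_length` also counts the intronic bases while `exonic_length` does not. Whichever definition you use for length, the GC content should be computed over the same set of bases, e.g. `gc_fraction()` applied to the concatenated exonic sequence.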


Question 3: It will still quantile-normalize the data. With no GC or length bias, the fitted covariate effects should simply be close to flat.

Question 4: Yes, see the example on page 4 of the vignette.
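
For reference, a minimal sketch of that workflow, assuming the cqn package is installed and using the example data sets that ship with it (`montgomery.subset`, `sizeFactors.subset`, `uCovar`). The variable names `fit` and `normalized` are my own.

```r
library(cqn)

data(montgomery.subset)   # count matrix, genes x samples
data(sizeFactors.subset)  # per-sample sequencing depth
data(uCovar)              # data frame with 'length' and 'gccontent' columns

# Rows of the covariate table must line up with rows of the count matrix
stopifnot(all(rownames(montgomery.subset) == rownames(uCovar)))

fit <- cqn(montgomery.subset,
           x = uCovar$gccontent,
           lengths = uCovar$length,
           sizeFactors = sizeFactors.subset)

# Normalized expression values, on the log2 scale
normalized <- fit$y + fit$offset
```

With your own data, `x` and `lengths` would be the GC content and length columns obtained for your genes, in the same row order as the count matrix.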
