Computing GC1, GC2 and GC3 on any CDS
7.2 years ago

Hello,

I was wondering how to compute GC1, GC2 and GC3 (the GC content at the first, second and third codon positions).

I want to compare the GC content of predicted CDSs. At the moment, I compute the overall GC content with this formula:

GC_content = (G+C)/(A+T+G+C)
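For reference, the overall formula above maps directly onto a small Python function. This is a minimal sketch, assuming each sequence is a plain string; the name `gc_content` is hypothetical, and characters other than A, C, G and T (such as N) are left out of both the numerator and the denominator.

```python
def gc_content(seq: str) -> float:
    """Overall GC content: (G + C) / (A + T + G + C).

    Non-ACGT characters (e.g. N) are ignored in both numerator and
    denominator; returns NaN if no A, C, G or T is present.
    """
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    a, t = seq.count("A"), seq.count("T")
    total = a + t + g + c
    return (g + c) / total if total else float("nan")
```

For example, `gc_content("ATGGCCTAA")` gives 4/9, since four of the nine bases are G or C.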


For GC1, GC2 and GC3, I tried:

GC1 = (G1+C1)/(A1+T1+G1+C1)
GC2 = (G2+C2)/(A2+T2+G2+C2)
GC3 = (G3+C3)/(A3+T3+G3+C3)
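The per-position formulas above can be implemented by counting bases at codon positions 1, 2 and 3 separately. This is a minimal sketch, assuming each sequence is an in-frame CDS that starts at its first codon; the names `gc_by_codon_position` and `read_fasta` are hypothetical, an incomplete final codon is dropped, and non-ACGT characters are ignored. It illustrates these formulas and is not an established tool. For comparison, Biopython's `Bio.SeqUtils.GC123` returns the overall, first-, second- and third-position GC content of a sequence (as percentages rather than fractions).

```python
from typing import Iterable, Iterator, Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return (GC1, GC2, GC3) for an in-frame coding sequence.

    GCk = (Gk + Ck) / (Ak + Tk + Gk + Ck), where Gk is the number of G
    at codon position k. Assumes the CDS starts at its first codon.
    """
    counts = [{"A": 0, "C": 0, "G": 0, "T": 0} for _ in range(3)]
    usable = len(cds) - len(cds) % 3  # drop an incomplete final codon
    for i, base in enumerate(cds[:usable].upper()):
        if base in counts[i % 3]:  # skip N and other ambiguity codes
            counts[i % 3][base] += 1
    result = []
    for pos in counts:
        total = sum(pos.values())
        result.append((pos["G"] + pos["C"]) / total if total else float("nan"))
    return result[0], result[1], result[2]


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, `gc_by_codon_position("ATGGCCTAA")` returns (1/3, 1/3, 2/3): the codons ATG, GCC and TAA have G or C at one of three first positions, one of three second positions and two of three third positions. To process a file (the file name here is hypothetical), open it with `with open("cds.fasta") as fh:` and loop over `read_fasta(fh)`, calling `gc_by_codon_position` on each sequence.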


But I'm not confident about this way of computing it. I know the formula takes the mutation rate into account. I haven't found any software (such as a small Python script) so far, and it wouldn't be difficult to write my own Python script if I had the right formula. Still, I would prefer an existing script, since it will involve a lot of mathematics and probability.

In addition, I'm looking for an in-depth review of GC content in prokaryotes, to properly understand how to draw conclusions from GC content. I have been reading several articles, but I have not found a review that synthesizes this subject.

Thanks in advance for any help.

7.1 years ago
Chirag Parsania ★ 2.0k

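Here is an R function to calculate GC, GC1, GC2 or GC3. It takes a FASTA file as input and the name of the seqinr function to apply (GC/GC1/GC2/GC3), and returns a vector with the GC content of each sequence.

library("seqinr")     # provides s2c(), GC(), GC1(), GC2(), GC3()
library("Biostrings") # provides readDNAStringSet()
getGC <- function(fastaFile, choice = "GC") {
  # Read all sequences from the FASTA file
  forGC <- readDNAStringSet(fastaFile)
  # Split each sequence into a vector of single characters, as seqinr expects
  seqASchar <- lapply(forGC, function(elem) {
    return(s2c(as.character(elem)))
  })
  # Apply the chosen seqinr function (looked up by name) to every sequence
  gc_cont <- sapply(seqASchar, choice)
  return(gc_cont)
}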


R may be somewhat slow for large-scale analyses, so I suggest not using it if you have a lot of data to analyze.

7.2 years ago
Cacau ▴ 520