Question: CpG islands density calculation
2.0 years ago
Lila M
wrote:

Hi everybody, I downloaded the promoter sequences for two gene lists from UCSC, so I have all the FASTA records stored in two text files (file_a and file_b). I would like to know whether CpG content differs between the two files. To do that, I wrote a short R script:

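library(Biostrings)  # provides readDNAStringSet() and vcountPattern()

# Total number of "CG" dinucleotides across all promoters in file_a
fastaFile_a = readDNAStringSet("file_a")
#seq_name_a = names(fastaFile_a)
#sequence_a = paste(fastaFile_a)
CG_file_a = sum(vcountPattern("CG", fastaFile_a))

# Same total for file_b
fastaFile_b = readDNAStringSet("file_b")
CG_file_b = sum(vcountPattern("CG", fastaFile_b))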

I'm not very confident in it, because I'm not sure this measures CpG density accurately. Any ideas or suggestions?


cpg rna-seq promoters
written 2.0 years ago by Lila M

Two quick notes:

You should probably normalise for lengths.
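A minimal sketch of what length normalisation could look like, in base R so it runs without Bioconductor. The helper names `count_cpg` and `cpg_density` and the toy sequences are made up for illustration, and "density" is assumed here to mean CpG count divided by sequence length; other definitions (for example an observed/expected ratio) are possible.

```r
# Hypothetical helper: number of "CG" dinucleotides in each sequence.
count_cpg <- function(seqs) {
  hits <- gregexpr("CG", toupper(seqs), fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions only
  unname(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# Hypothetical helper: CpG count per base, so sequences of
# different lengths become comparable.
cpg_density <- function(seqs) {
  count_cpg(seqs) / unname(nchar(seqs))
}

# Toy sequences of different lengths (not real promoters)
toy <- c("ACGTCGCG", "ATATATATATATATCG")
print(count_cpg(toy))
print(cpg_density(toy))
```

With Biostrings objects, the same functions should accept `as.character(fastaFile_a)`, or the equivalent can be written as `vcountPattern("CG", fastaFile_a) / width(fastaFile_a)`.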

Are these sequences directional? Should you also include the inverse pattern "GC", given that DNA is double-stranded?
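One way to look into the strand question is to count the motif on a sequence and on its reverse complement. The sketch below uses base R only; `reverse_complement` and `count_motif` are hypothetical helpers and the sequence is a toy example. Because "CG" is its own reverse complement, the CpG count should come out the same on either strand, whereas "GC" is a different dinucleotide (GpC) rather than CpG read from the other strand.

```r
# Hypothetical helper: reverse complement of a single DNA string (A/C/G/T only).
reverse_complement <- function(seq) {
  bases <- rev(strsplit(toupper(seq), "")[[1]])
  paste(chartr("ACGT", "TGCA", bases), collapse = "")
}

# Hypothetical helper: occurrences of a literal motif in a single string.
count_motif <- function(seq, motif) {
  hits <- gregexpr(motif, toupper(seq), fixed = TRUE)[[1]]
  sum(hits > 0)  # -1 means "no match"
}

toy <- "ACGTACGTAA"                 # toy sequence, not a real promoter
toy_rc <- reverse_complement(toy)

# CpG count on the given strand and on the opposite strand
print(c(forward = count_motif(toy, "CG"), reverse = count_motif(toy_rc, "CG")))
# GpC count, for comparison
print(count_motif(toy, "GC"))
```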

written 2.0 years ago by jotan

Hi, as all the sequences have the same length (1,000 nt), I don't have to normalize for length. I downloaded the sequences from UCSC; how can I tell whether they are directional? Thanks for the tips!
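If the equal-length assumption holds, raw counts are directly comparable, but it may be worth checking it in code and comparing per-sequence counts rather than a single total per file. A rough sketch in base R follows; `count_cpg` and `same_length` are hypothetical helpers, `set_a` and `set_b` are toy stand-ins for the two promoter sets, and the Wilcoxon rank-sum test is just one possible choice of comparison, not something prescribed in this thread.

```r
# Hypothetical helper: number of "CG" dinucleotides in each sequence.
count_cpg <- function(seqs) {
  hits <- gregexpr("CG", toupper(seqs), fixed = TRUE)
  unname(vapply(hits, function(h) sum(h > 0), integer(1)))
}

# Hypothetical helper: TRUE if every sequence has the same length.
same_length <- function(seqs) length(unique(nchar(seqs))) == 1L

# Toy stand-ins for the two promoter sets (10 nt each, not real data)
set_a <- c("ACGCGTCGAT", "CGCGCGATAT", "TTACGCGCGA", "CGATCGATCG")
set_b <- c("ATATTTAACG", "TTTTAAAACC", "ACGTATATAT", "GGGATTTAAA")

# Stop early if the equal-length assumption does not hold
stopifnot(same_length(c(set_a, set_b)))

# Per-sequence counts keep the spread within each set visible
cpg_a <- count_cpg(set_a)
cpg_b <- count_cpg(set_b)
print(summary(cpg_a))
print(summary(cpg_b))

# Non-parametric comparison of the two distributions;
# exact = FALSE because tied counts are expected
result <- wilcox.test(cpg_a, cpg_b, exact = FALSE)
print(result$p.value)
```

For the real files, `as.character(fastaFile_a)` and `as.character(fastaFile_b)` could be passed in place of the toy vectors.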

written 2.0 years ago by Lila M
