CpG islands density calculation
4.2 years ago
Lila M

Hi everybody, I downloaded the promoter sequences for two gene lists from UCSC, so I have the FASTA sequences stored in two text files (file_a and file_b). I would like to know whether the CpG content differs between the two files. To do that, I wrote a little R code:

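library(Biostrings)

# Read the promoter sequences for each gene list (FASTA format)
fastaFile_a = readDNAStringSet("file_a")
fastaFile_b = readDNAStringSet("file_b")

# vcountPattern() returns one CpG count per sequence; sum() gives the total per file
CG_file_a = sum(vcountPattern("CG", fastaFile_a))
CG_file_b = sum(vcountPattern("CG", fastaFile_b))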
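Summing the output of `vcountPattern()` collapses each file to a single number, which hides the spread across promoters. Without the `sum()`, `vcountPattern("CG", fastaFile_a)` returns one count per sequence, so the two sets can be compared as distributions. The sketch below does the same thing in base R on plain character strings so it runs without Bioconductor. The helper `count_cpg()` and the toy sequences are hypothetical, and the Wilcoxon rank-sum test is just one reasonable choice for comparing the two sets, assumed here rather than prescribed.

```r
# Hypothetical helper: number of "CG" dinucleotides in each sequence.
# "CG" cannot overlap itself, so gregexpr's non-overlapping matches count every CpG.
count_cpg <- function(seqs) {
  vapply(toupper(seqs), function(s) {
    hits <- gregexpr("CG", s, fixed = TRUE)[[1]]
    if (hits[1] == -1) 0L else length(hits)  # gregexpr returns -1 when there is no match
  }, integer(1), USE.NAMES = FALSE)
}

# Toy stand-ins for the two promoter sets (real data would come from readDNAStringSet)
seqs_a <- c("ACGTCGAA", "CGCGCGTT", "AATTAATT")
seqs_b <- c("ATATATAT", "ACGTTTTT", "GGCCAATT")

# Per-sequence CpG density (CpGs per bp): one value per promoter
dens_a <- count_cpg(seqs_a) / nchar(seqs_a)
dens_b <- count_cpg(seqs_b) / nchar(seqs_b)

# Compare the two distributions; exact = FALSE uses the normal approximation,
# which avoids the "cannot compute exact p-value with ties" warning
res <- wilcox.test(dens_a, dens_b, exact = FALSE)
print(res$p.value)
```

With the real Biostrings objects, the per-sequence densities would be `vcountPattern("CG", fastaFile_a) / width(fastaFile_a)`. Three toy sequences per set give the test almost no power; they only illustrate the shape of the analysis.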


I don't feel very confident about it, because I'm not sure this approach measures CpG density accurately. Any ideas or suggestions?

Thanks!


Two quick notes:

You should probably normalise for sequence length.

Are these sequences directional? Should you also count the inverse pattern "GC", given that DNA is double-stranded?
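On the strand question: "CG" is its own reverse complement, so every CpG on one strand pairs with a CpG on the other, and counting "CG" on one strand already covers both. "GC" (a GpC) is a different dinucleotide, not the same site read backwards. A small base-R check, using a hypothetical `revcomp()` helper and a toy sequence:

```r
# Hypothetical helper: reverse complement of a DNA string
revcomp <- function(s) {
  comp <- chartr("ACGT", "TGCA", toupper(s))
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

# Count non-overlapping occurrences of a fixed pattern
count_pattern <- function(s, p) {
  hits <- gregexpr(p, s, fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

s <- "ACGTCGAAGC"
rc <- revcomp(s)                                            # "GCTTCGACGT"
cat(revcomp("CG"), "\n")                                    # CG: the site reads the same on both strands
cat(count_pattern(s, "CG"), count_pattern(rc, "CG"), "\n")  # 2 2: same CpG count on either strand
cat(count_pattern(s, "GC"), "\n")                           # 1: GpC sites are a separate count
```

Because the CpG count is identical on a sequence and its reverse complement, the orientation of the downloaded sequences does not change CpG counts.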


Hi, since all the sequences have the same length (1,000 nt), I don't need to normalize for length. I downloaded the sequences from UCSC; how can I tell whether they are directional? Thanks for the tips!
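Equal lengths do make raw counts directly comparable (density is just count / 1000). CpG frequency also tracks overall C+G content, though, so a common extra normalisation is the observed/expected CpG ratio used in the Gardiner-Garden and Frommer CpG-island criteria: (number of CG × length) / (number of C × number of G). A minimal base-R sketch under that assumption; `cpg_stats()` and the toy sequence are hypothetical:

```r
# Hypothetical helper: CpG summary statistics for one DNA sequence
cpg_stats <- function(s) {
  s <- toupper(s)
  n <- nchar(s)
  # Non-overlapping count of a fixed pattern (fine for "CG", "C" and "G")
  count_of <- function(p) {
    hits <- gregexpr(p, s, fixed = TRUE)[[1]]
    if (hits[1] == -1) 0 else length(hits)
  }
  n_cg <- count_of("CG")
  n_c <- count_of("C")
  n_g <- count_of("G")
  c(
    cpg = n_cg,
    density = n_cg / n,               # CpGs per bp
    gc_content = (n_c + n_g) / n,     # fraction of C + G
    # observed/expected CpG ratio; undefined if the sequence lacks C or G
    obs_exp = if (n_c > 0 && n_g > 0) n_cg * n / (n_c * n_g) else NA_real_
  )
}

stats <- cpg_stats("ACGTCGAA")
print(stats)
```

For a whole set, `t(sapply(seqs, cpg_stats))` gives one row per sequence, which can then be compared between the two gene lists.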