Hi everybody, I downloaded the promoter sequences for two gene lists using UCSC, so I have the FASTA sequences stored in two text files (file_a and file_b). I would like to know whether CpG content differs between the two files. To do that, I wrote a short R script:
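library(Biostrings)

# Read the promoter sequences for the first gene list and count all CG dinucleotides
fastaFile_a = readDNAStringSet("file_a")
#seq_name_a = names(fastaFile_a)
#sequence_a = paste(fastaFile_a)
CG_file_a = sum(vcountPattern("CG", fastaFile_a))

# Same for the second gene list
fastaFile_b = readDNAStringSet("file_b")
CG_file_b = sum(vcountPattern("CG", fastaFile_b))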
I'm not very confident in it, because I'm not sure this is an accurate way to measure CpG density. Any ideas or suggestions?
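One possible refinement, offered as a rough sketch rather than a definitive method: the summed counts depend on how many sequences and bases each file contains, so a per-sequence measure may be easier to compare. The sketch below computes CpG density (CG count divided by sequence length) and the commonly used observed/expected CpG ratio for each sequence. `cpg_stats` is a hypothetical helper name, not a Biostrings function, and the toy sequences are made up and only stand in for the real promoter sets.

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): per-sequence CpG statistics
# for a DNAStringSet.
cpg_stats <- function(seqs) {
  len  <- width(seqs)                                    # sequence lengths
  n_cg <- vcountPattern("CG", seqs)                      # CG dinucleotides per sequence
  base <- letterFrequency(seqs, letters = c("C", "G"))   # C and G counts per sequence
  data.frame(
    density = n_cg / len,                                # CpGs per base
    # Observed/expected ratio; not finite when a sequence has no C or no G
    obs_exp = (n_cg * len) / (base[, "C"] * base[, "G"])
  )
}

# Made-up toy sequences standing in for the two promoter sets
toy_a <- DNAStringSet(c("ACGCGTCGAA", "TTCGCGCGAT"))
toy_b <- DNAStringSet(c("ATATATCGAT", "TTTTAACCGG"))

stats_a <- cpg_stats(toy_a)
stats_b <- cpg_stats(toy_b)
```

With the real data, `fastaFile_a` and `fastaFile_b` would be passed in place of the toy sets, and the two `density` (or `obs_exp`) vectors could then be compared with something like `wilcox.test(stats_a$density, stats_b$density)`. Whether that test is appropriate depends on how the gene lists were chosen.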