Hello, I'm fairly new to GenomicRanges and am just discovering it. Very exciting!

I need to calculate statistics over a full chromosome (say chr22): how much CG, how many occurrences of a pattern per window, and so on. For that I need both the positions and the sequence. Is there already a function, which I may have missed, that builds this from the structure of a chromosome (length, strands, and maybe the bases), or do I need to build it myself by creating a GRanges object and filling in the data?

Thank you for your time

Short answer: yes, you would need to write many of these functions yourself. However, there are other packages that should make this easier for you.

You should also check out the Biostrings package (library(Biostrings)), which lets you compute nucleotide composition and also match motifs with matchPattern/vmatchPattern.
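As a minimal sketch of the pattern matching mentioned above, the snippet below runs matchPattern and countPattern on a short made-up sequence. The sequence and the "CG" pattern are illustrative assumptions, not real chromosome data; on a real genome you would pass the chromosome's DNAString instead.

```r
library(Biostrings)

# Toy sequence standing in for a chromosome (hypothetical data)
chr <- DNAString("ACGTCGCGTTACGCGCGATATCGCG")

# All exact matches of the pattern "CG" on the given strand
hits <- matchPattern("CG", chr)
cg_starts <- start(hits)   # start position of each match

# Total number of matches
n_cg <- countPattern("CG", chr)
```

The match positions in cg_starts can then be binned into windows, for example with cut() or findInterval(), to get a count per window.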

Here is a GC content calculator script ripped from some old bioC material:

gcContent <- function(x) {
  # per-letter frequencies as proportions of the whole sequence
  alf <- alphabetFrequency(x, as.prob=TRUE)
  sum(alf[c("G", "C")])
}

You can use it as such:

## get your genome in Biostrings format, e.g. by loading a BSgenome data package that provides Hsapiens
> gcContent(Hsapiens[["chr1"]])
[1] 0.4174393
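Since the question asks about statistics per window, here is a rough sketch of the same idea applied to non-overlapping windows, using Views and letterFrequency from Biostrings. The toy sequence and the window size of 10 are assumptions made for illustration only.

```r
library(Biostrings)

# Toy sequence standing in for a chromosome (hypothetical data)
chr <- DNAString("GGCCGGCCGGATATATATATGCGCATATAT")
win <- 10L  # hypothetical window size

# Non-overlapping windows; the last one is truncated at the sequence end
starts <- seq(1L, length(chr), by = win)
ends <- pmin(starts + win - 1L, length(chr))
windows <- Views(chr, start = starts, end = ends)

# Fraction of G or C in each window (a one-column matrix)
gc <- letterFrequency(windows, letters = "GC", as.prob = TRUE)
gc_by_window <- data.frame(start = starts, end = ends, gc = gc[, 1])
```

For overlapping (sliding) windows, letterFrequencyInSlidingView in Biostrings may be a better fit than building the views by hand.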

And here is probably a great resource for becoming familiar with the DNAStringSet objects:


