Genomic Ranges of data from a chromosome
11 weeks ago

Hello, I'm fairly new to GenomicRanges and am just discovering it. Very exciting!

I need to calculate statistics on a full chromosome (let's say chr22): how much CG, how many patterns in windows, etc. For that I'll need both the positions and the sequence. I was wondering whether there is already a function (which I haven't found so far) that creates this from the structure of a chromosome (length, strands, and maybe bases), or whether I need to build it myself by creating a GRanges object and filling in the data?

Thank you for your time

chromosome DNA GenomicRanges • 126 views
11 weeks ago
benformatics ★ 2.3k

Short answer: Yes, you would need to write a lot of these functions yourself; however, there are other packages that should make this easier for you.

You should also check out library(Biostrings), which lets you compute nucleotide composition and also match motifs with vmatchPattern/matchPattern.
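
To give an idea of what the pattern matching looks like, here is a minimal sketch. The sequence `chr` is a made-up toy string standing in for a real chromosome, and "CG" is just an example pattern:

```r
library(Biostrings)

# Toy sequence standing in for a chromosome (hypothetical data)
chr <- DNAString("ACGTCGCGATATCGGGCCTACG")

# All exact occurrences of the pattern, returned as views on the sequence
hits <- matchPattern("CG", chr)

# Start position of each occurrence
cg_starts <- start(hits)

# If only the number of occurrences is needed
cg_count <- countPattern("CG", chr)

cg_starts
cg_count
```

On a real genome you would pass a chromosome-sized DNAString instead of the toy string.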

Here is a GC content calculator script ripped from some old bioC material:

library(Biostrings)
# Fraction of G and C among all letters of a DNAString
gcContent <- function(x) {
    alf <- alphabetFrequency(x, as.prob=TRUE)
    sum(alf[c("G", "C")])
}

You can use it as such:

## get your genome in Biostrings format (here Hsapiens, which I assume was loaded from a BSgenome.Hsapiens.UCSC.* package)
> gcContent(Hsapiens[["chr1"]])
[1] 0.4174393
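
For the "in windows" part of the question, one possible approach (a sketch of my own, not from the bioC material) is to tile the chromosome into a GRanges object and compute the GC fraction per tile. The toy sequence, the name `toy`, and the 5-nt window width are assumptions for illustration only:

```r
library(GenomicRanges)
library(Biostrings)

# Toy 20-nt sequence standing in for a chromosome (hypothetical data)
chr <- DNAString("GGGGGCCCCCAAAAATTTTT")

# Non-overlapping 5-nt windows covering the whole sequence, as a GRanges
tiles <- tileGenome(c(toy = length(chr)), tilewidth = 5,
                    cut.last.tile.in.chrom = TRUE)

# One view per window on the sequence
win <- Views(chr, ranges(tiles))

# GC fraction per window, stored as a metadata column on the GRanges
tiles$gc <- letterFrequency(win, letters = "GC", as.prob = TRUE)[, 1]

tiles
```

This keeps the positions (the ranges) and the per-window statistic together in one object, which is what GRanges is designed for.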

And here is probably a great resource for becoming familiar with the DNAStringSet objects:

