Genomic Ranges of data from a chromosome
3.0 years ago

Hello, I'm fairly new to GenomicRanges and am just discovering it. Very exciting!

I need to calculate statistics over a full chromosome (say chr22): CG content, the number of occurrences of a pattern per window, and so on. For that I need both the positions and the sequence. Is there already a function (which I haven't found so far) that builds this from the structure of a chromosome (length, strands, and maybe bases), or do I need to create a GRanges object and fill in the data myself?

Thank you for your time

chromosome DNA GenomicRanges • 680 views
3.0 years ago

Short answer: yes, you would need to write a lot of these functions yourself. However, there are other packages that should make this easier for you.
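For the windowing part specifically, a rough sketch of one way to get a GRanges of fixed-width windows over a chromosome is tileGenome() from GenomicRanges. The chromosome name and length below are made up for illustration; for a real genome you could take the length from seqlengths() on a BSgenome object instead.

```r
library(GenomicRanges)

# Hypothetical chromosome name and length, for illustration only.
# With a BSgenome loaded, seqlengths(Hsapiens)["chr22"] would give a real one.
chrom_len <- c(chrToy = 1000)

# Cut the chromosome into consecutive windows of 250 bp.
# cut.last.tile.in.chrom = TRUE makes the result a plain GRanges.
windows <- tileGenome(chrom_len, tilewidth = 250,
                      cut.last.tile.in.chrom = TRUE)

windows
```

Each range in `windows` can then carry per-window statistics as metadata columns.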

You should also check out the Biostrings package (library(Biostrings)), which lets you compute nucleotide composition and also match motifs with matchPattern/vmatchPattern.
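As a minimal sketch of the pattern-matching side, here is how counting and locating a motif might look. The toy sequences are invented for illustration; on real data you would pass a chromosome such as one taken from a BSgenome object.

```r
library(Biostrings)

# Toy sequence standing in for a chromosome (made up for illustration)
toy <- DNAString("ACGCGTTACGGCGCGATCGACGT")

# Number of occurrences of the dinucleotide "CG"
n_cg <- countPattern("CG", toy)

# Positions of every match, returned as views on the sequence
hits <- matchPattern("CG", toy)
cg_starts <- start(hits)

# The v* variants work on a set of sequences at once
toy_set <- DNAStringSet(c(a = "ACGT", b = "TTTT"))
per_seq <- vcountPattern("CG", toy_set)
```

Here `n_cg` is the total count, `cg_starts` holds the start coordinate of each match, and `per_seq` has one count per sequence in the set.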

Here is a GC content calculator taken from some old Bioconductor material:

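library(Biostrings)

# Fraction of G and C bases in a DNA sequence
gcContent <- function(x)
{
    # as.prob=TRUE returns each letter's share of the whole sequence
    alf <- alphabetFrequency(x, as.prob=TRUE)
    sum(alf[c("G", "C")])
}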

You can use it like this:

## get your genome in Biostrings format
> library(BSgenome.Hsapiens.UCSC.hg19)
> gcContent(Hsapiens[["chr1"]])
[1] 0.4174393
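To get the same statistic per window rather than for the whole chromosome, one possible approach is to put views on the sequence and call letterFrequency() on them. This is only a sketch: the toy sequence and the window width `w` are made up, and on a real chromosome the toy sequence would be replaced by something like Hsapiens[["chr22"]]. Note that as.prob=TRUE divides by the full window width, so windows containing N bases will report a lower fraction.

```r
library(Biostrings)

# Toy sequence and window width, both made up for illustration
toy <- DNAString("GGCCATATGCGCATTA")
w <- 4

# Start positions of consecutive, non-overlapping windows
starts <- seq(1, length(toy) - w + 1, by = w)
win <- Views(toy, start = starts, width = w)

# letters = "GC" counts G and C together (one "G|C" column);
# as.prob = TRUE turns the counts into fractions of the window width
gc_per_window <- as.numeric(letterFrequency(win, letters = "GC", as.prob = TRUE))

gc_per_window
```

The result has one value per window, in the same order as `starts`, so it can be attached to a GRanges of the same windows.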

And here is a good resource for getting familiar with DNAStringSet objects: https://uclouvain-cbio.github.io/WSBIM1322/sec-biostrings.html
