Get total trinucleotide counts from a subset of chromosomes
1
0
Entering edit mode
5.7 years ago
nanana ▴ 110

I'm trying to use the trinucleotideFrequency function in the BS.genomes package to calculate trinucleotide frequency over a subset of chromosomes in a genome. I can do this for each chromosomes in the subset individually, but I'm not sure how to do this for all combined. Here's what I'm trying:

trinuc.freqs <- function(genome=NA){
ifis.na(genome)){
cat("No genome specfied, defaulting to 'BSgenome.Dmelanogaster.UCSC.dm6'\n")
library(BSgenome.Dmelanogaster.UCSC.dm6, quietly = TRUE)
genome <- BSgenome.Dmelanogaster.UCSC.dm6
}

seq_genome<-seqnames(genome)[1:7]

for(seqname in seq_genome){
cat(seqname, "\n")
print(trinucleotideFrequency(genome[[seqname]]))
}
}


I've also tried using the bsapply function, but it produces the same (per chromosome) output:

params <- new("BSParams", X = Dmelanogaster, FUN = trinucleotideFrequency, exclude = c("M", "_"))
bsapply(params)


What I want is to have a genome-wide trinucleotide count for a subset of chromosomes, not a per-chromosomes trinucleotide count.

Can anyone point me in the right direction?

R genome • 1.5k views
0
Entering edit mode
5.7 years ago
nanana ▴ 110

At least one way of achieving this is to simplify the output returned from bsapply and sum the rows to give a genome-wide count:

trinuc.freqs <- function(genome=NA){
ifis.na(genome)){
cat("No genome specfied, defaulting to 'BSgenome.Dmelanogaster.UCSC.dm6'\n")
library(BSgenome.Dmelanogaster.UCSC.dm6, quietly = TRUE)
genome <- BSgenome.Dmelanogaster.UCSC.dm6
}

params <- new("BSParams", X = Dmelanogaster, FUN = trinucleotideFrequency, exclude = c("M", "_"), simplify = TRUE)
data<-as.data.frame(bsapply(params))
data\$genome<-as.integer(rowSums(data))
gen_wide <- data['genome']
colnames(gen_wide) <- NULL

return(gen_wide)
}