I am looking for a tool which may allow me to divide genome into a sliding window and give me a average count of seq reads. Any suggestion please
I am looking for a tool which may allow me to divide genome into a sliding window and give me a average count of seq reads. Any suggestion please
If your sequencing data are in BAM format, the BEDOPS site includes a usage case (with code) where the bedmap
utility is used with its --count
operator to count reads across a sliding window moved across the genome, along with the bam2bed
script to convert the BAM data into BED format.
I looked over the code and you could replace the --count
operator with the --mean
operator to get the arithmetic mean or average reads across said windows. Everything else should be the same (except for the window and step size parameters, which you would likely adjust to your specific use case).
If your data are already in BED format, then you can still use this script, but simply excise the bam2bed
conversion step, which would be unnecessary, and swap out the --count
operator for the --mean
operator, as described above.
You could also excise the starch
compression step, which creates a highly-compressed BED file, which is probably unnecessary unless you want to use this format for archiving BED data more efficiently than naïve gzip or bzip2.
I have extracted some of the R functions I use in a script to calculate an smoothened average, maybe they still work together.
library("GenomicRanges")
lengthsFromStarts <- function(starts){
diff(c(1,starts))
}
meanForChromosome <- function(starts, values, window){
lengths <- lengthsFromStarts(starts)
rle <- Rle(values, lengths)
mean <- runmean(rle, window + 1, endrule = "constant")
as.data.frame(mean)[,1]
}
downSample <- function(chr, meanVec, sampleWindow){
half <- round(sampleWindow/2)
sampling <- seq(half,length(meanVec),by=sampleWindow)
sample2kValues <- meanVec[sampling]
sample2kPositions <- (1:length(meanVec))[sampling]
sampleDF <- data.frame(chr=chr, start=sample2kPositions - half, end=sample2kPositions+(half - 1), values=sample2kValues)
sampleDF
}
writeWig <- function(wigDF, outfile, outname){
f <- file(outfile, "w")
writeLines(paste("track type=bedGraph name=\"",outname,"\" visibility=full",sep=""),f )
write.table(format(wigDF, digits=4, scientific=FALSE), f, sep="\t",col.name=FALSE,row.names=FALSE,quote=FALSE)
close(f)
}
makeMovingAverage <- function(chrs, starts, normedCounts){
wigDF <- data.frame()
for(chr in unique(chrs)){
print(paste("meaning: ", chr))
chrData <- chrs == chr
wig <- meanForChromosome(starts[chrData], normedCounts[chrData], window)
wigSub <- downSample(chr, wig, 2000)
wigDF <- rbind(wigDF, wigSub)
}
writeWig(wigDF, wigname, wigname)
}
If you can get your data in bed format you can use bedtools makewindows to define the genomic ranges that you require and then bedtools map to summarize (mean/median/min/max...) your seq counts on the genomic ranges.
Thanks that is of great help. This is my first post on this forum. Great help!!!!
Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Can you explain what you mean by "divide genome into a sliding window"? Have you mapped reads onto a reference sequence?
I have aligned reads to a bacterial refrence genome. I divided genome in bins of 10bp and counted number of reads in each bin. Normalized by total reads. Now for thes bins, I want to generate a sliding window of 100bp and count average of reads in each window and assign to mid point of window. I have all in bed file
I think R will help you with this business. Once you get in R your 10bp bins counts, you can change binwidth to get density with 100bp bins. Here is tutorial for hist function: http://msenux.redwoods.edu/math/R/hist.php. Further you can craft the BED output, sorry, if I am not getting your question.