Question

Sliding Window And Average - A Tool

1

Entering edit mode

12.6 years ago

kanwarjag ★ 1.2k

I am looking for a tool which may allow me to divide genome into a sliding window and give me a average count of seq reads. Any suggestion please

sliding-window • 11k views

ADD COMMENT • link updated 2.4 years ago by Ram 45k • written 12.6 years ago by kanwarjag ★ 1.2k

0

Entering edit mode

Can you explain what you mean by "divide genome into a sliding window"? Have you mapped reads onto a reference sequence?

ADD REPLY • link 12.6 years ago by Josh Herr 5.8k

0

Entering edit mode

I have aligned reads to a bacterial refrence genome. I divided genome in bins of 10bp and counted number of reads in each bin. Normalized by total reads. Now for thes bins, I want to generate a sliding window of 100bp and count average of reads in each window and assign to mid point of window. I have all in bed file

ADD REPLY • link 12.6 years ago by kanwarjag ★ 1.2k

0

Entering edit mode

I think R will help you with this business. Once you get in R your 10bp bins counts, you can change binwidth to get density with 100bp bins. Here is tutorial for hist function: http://msenux.redwoods.edu/math/R/hist.php. Further you can craft the BED output, sorry, if I am not getting your question.

ADD REPLY • link 12.6 years ago by Pavel Senin ★ 1.9k

score 4 · Answer 1 · 2012-12-10

If your sequencing data are in BAM format, the BEDOPS site includes a usage case (with code) where the bedmap utility is used with its --count operator to count reads across a sliding window moved across the genome, along with the bam2bed script to convert the BAM data into BED format.

I looked over the code and you could replace the --count operator with the --mean operator to get the arithmetic mean or average reads across said windows. Everything else should be the same (except for the window and step size parameters, which you would likely adjust to your specific use case).

If your data are already in BED format, then you can still use this script, but simply excise the bam2bed conversion step, which would be unnecessary, and swap out the --count operator for the --mean operator, as described above.

You could also excise the starch compression step, which creates a highly-compressed BED file, which is probably unnecessary unless you want to use this format for archiving BED data more efficiently than naïve gzip or bzip2.

score 2 · Answer 2 · 2012-12-10

I have extracted some of the R functions I use in a script to calculate an smoothened average, maybe they still work together.

library("GenomicRanges")

lengthsFromStarts <- function(starts){
        diff(c(1,starts))
}

meanForChromosome <- function(starts, values, window){
        lengths <- lengthsFromStarts(starts)
        rle <- Rle(values, lengths)
        mean <- runmean(rle, window + 1, endrule =  "constant")
        as.data.frame(mean)[,1]
}

downSample <- function(chr, meanVec, sampleWindow){
        half <- round(sampleWindow/2)
        sampling <- seq(half,length(meanVec),by=sampleWindow)
        sample2kValues <- meanVec[sampling]
        sample2kPositions <- (1:length(meanVec))[sampling]
        sampleDF <- data.frame(chr=chr, start=sample2kPositions - half, end=sample2kPositions+(half - 1), values=sample2kValues)
        sampleDF
}

writeWig <- function(wigDF, outfile, outname){ 
        f <- file(outfile, "w")
        writeLines(paste("track type=bedGraph name=\"",outname,"\" visibility=full",sep=""),f )
        write.table(format(wigDF, digits=4, scientific=FALSE), f, sep="\t",col.name=FALSE,row.names=FALSE,quote=FALSE)
        close(f)
}

makeMovingAverage <- function(chrs, starts, normedCounts){
        wigDF <- data.frame()
        for(chr in unique(chrs)){
                print(paste("meaning: ", chr))
                chrData <- chrs == chr
                wig <- meanForChromosome(starts[chrData], normedCounts[chrData], window)
                wigSub <- downSample(chr, wig, 2000)
                wigDF <- rbind(wigDF, wigSub)
        }
        writeWig(wigDF, wigname, wigname)
}

score 1 · Answer 3 · 2012-12-10

1

Entering edit mode

12.6 years ago

Irsan ★ 7.8k

If you can get your data in bed format you can use bedtools makewindows to define the genomic ranges that you require and then bedtools map to summarize (mean/median/min/max...) your seq counts on the genomic ranges.

ADD COMMENT • link 12.6 years ago by Irsan ★ 7.8k

score 0 · Answer 4 · 2012-12-10

0

Entering edit mode

12.6 years ago

kanwarjag ★ 1.2k

Thanks that is of great help. This is my first post on this forum. Great help!!!!

ADD COMMENT • link 12.6 years ago by kanwarjag ★ 1.2k

3

Entering edit mode

You don't say thanks in an answer. you a) vote/accept b) say thanks in an comment

ADD REPLY • link 12.6 years ago by Ido Tamir 5.2k