Sliding Window And Average - A Tool
4
1
Entering edit mode
11.4 years ago
kanwarjag ★ 1.2k

I am looking for a tool which may allow me to divide genome into a sliding window and give me a average count of seq reads. Any suggestion please

sliding-window • 10k views
ADD COMMENT
0
Entering edit mode

Can you explain what you mean by "divide genome into a sliding window"? Have you mapped reads onto a reference sequence?

ADD REPLY
0
Entering edit mode

I have aligned reads to a bacterial refrence genome. I divided genome in bins of 10bp and counted number of reads in each bin. Normalized by total reads. Now for thes bins, I want to generate a sliding window of 100bp and count average of reads in each window and assign to mid point of window. I have all in bed file

ADD REPLY
0
Entering edit mode

I think R will help you with this business. Once you get in R your 10bp bins counts, you can change binwidth to get density with 100bp bins. Here is tutorial for hist function: http://msenux.redwoods.edu/math/R/hist.php. Further you can craft the BED output, sorry, if I am not getting your question.

ADD REPLY
4
Entering edit mode
11.4 years ago

If your sequencing data are in BAM format, the BEDOPS site includes a usage case (with code) where the bedmap utility is used with its --count operator to count reads across a sliding window moved across the genome, along with the bam2bed script to convert the BAM data into BED format.

I looked over the code and you could replace the --count operator with the --mean operator to get the arithmetic mean or average reads across said windows. Everything else should be the same (except for the window and step size parameters, which you would likely adjust to your specific use case).

If your data are already in BED format, then you can still use this script, but simply excise the bam2bed conversion step, which would be unnecessary, and swap out the --count operator for the --mean operator, as described above.

You could also excise the starch compression step, which creates a highly-compressed BED file, which is probably unnecessary unless you want to use this format for archiving BED data more efficiently than naïve gzip or bzip2.

ADD COMMENT
2
Entering edit mode
11.4 years ago
Ido Tamir 5.2k

I have extracted some of the R functions I use in a script to calculate an smoothened average, maybe they still work together.

library("GenomicRanges")

lengthsFromStarts <- function(starts){
        diff(c(1,starts))
}

meanForChromosome <- function(starts, values, window){
        lengths <- lengthsFromStarts(starts)
        rle <- Rle(values, lengths)
        mean <- runmean(rle, window + 1, endrule =  "constant")
        as.data.frame(mean)[,1]
}

downSample <- function(chr, meanVec, sampleWindow){
        half <- round(sampleWindow/2)
        sampling <- seq(half,length(meanVec),by=sampleWindow)
        sample2kValues <- meanVec[sampling]
        sample2kPositions <- (1:length(meanVec))[sampling]
        sampleDF <- data.frame(chr=chr, start=sample2kPositions - half, end=sample2kPositions+(half - 1), values=sample2kValues)
        sampleDF
}

writeWig <- function(wigDF, outfile, outname){ 
        f <- file(outfile, "w")
        writeLines(paste("track type=bedGraph name=\"",outname,"\" visibility=full",sep=""),f )
        write.table(format(wigDF, digits=4, scientific=FALSE), f, sep="\t",col.name=FALSE,row.names=FALSE,quote=FALSE)
        close(f)
}

makeMovingAverage <- function(chrs, starts, normedCounts){
        wigDF <- data.frame()
        for(chr in unique(chrs)){
                print(paste("meaning: ", chr))
                chrData <- chrs == chr
                wig <- meanForChromosome(starts[chrData], normedCounts[chrData], window)
                wigSub <- downSample(chr, wig, 2000)
                wigDF <- rbind(wigDF, wigSub)
        }
        writeWig(wigDF, wigname, wigname)
}
ADD COMMENT
0
Entering edit mode

Could you please show a small sample dataset for this code? I had some issue to understand it. Thank you very much!

ADD REPLY
1
Entering edit mode
11.4 years ago
Irsan ★ 7.8k

If you can get your data in bed format you can use bedtools makewindows to define the genomic ranges that you require and then bedtools map to summarize (mean/median/min/max...) your seq counts on the genomic ranges.

ADD COMMENT
0
Entering edit mode
11.4 years ago
kanwarjag ★ 1.2k

Thanks that is of great help. This is my first post on this forum. Great help!!!!

ADD COMMENT
3
Entering edit mode

You don't say thanks in an answer. you a) vote/accept b) say thanks in an comment

ADD REPLY

Login before adding your answer.

Traffic: 1622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6