Question: Sliding Window And Average _ A Tool
1
gravatar for kanwarjag
7.6 years ago by
kanwarjag1.0k
United States
kanwarjag1.0k wrote:

I am looking for a tool which may allow me to divide genome into a sliding window and give me a average count of seq reads. Any suggestion please

bioinformatics • 7.7k views
ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by kanwarjag1.0k

Can you explain what you mean by "divide genome into a sliding window"? Have you mapped reads onto a reference sequence?

ADD REPLYlink written 7.6 years ago by Josh Herr5.7k

I have aligned reads to a bacterial refrence genome. I divided genome in bins of 10bp and counted number of reads in each bin. Normalized by total reads. Now for thes bins, I want to generate a sliding window of 100bp and count average of reads in each window and assign to mid point of window. I have all in bed file

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by kanwarjag1.0k

I think R will help you with this business. Once you get in R your 10bp bins counts, you can change binwidth to get density with 100bp bins. Here is tutorial for hist function: http://msenux.redwoods.edu/math/R/hist.php. Further you can craft the BED output, sorry, if I am not getting your question.

ADD REPLYlink written 7.6 years ago by Pavel Senin1.9k
4
gravatar for Alex Reynolds
7.6 years ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

If your sequencing data are in BAM format, the BEDOPS site includes a usage case (with code) where the bedmap utility is used with its --count operator to count reads across a sliding window moved across the genome, along with the bam2bed script to convert the BAM data into BED format.

I looked over the code and you could replace the --count operator with the --mean operator to get the arithmetic mean or average reads across said windows. Everything else should be the same (except for the window and step size parameters, which you would likely adjust to your specific use case).

If your data are already in BED format, then you can still use this script, but simply excise the bam2bed conversion step, which would be unnecessary, and swap out the --count operator for the --mean operator, as described above.

You could also excise the starch compression step, which creates a highly-compressed BED file, which is probably unnecessary unless you want to use this format for archiving BED data more efficiently than naïve gzip or bzip2.

ADD COMMENTlink modified 7.3 years ago • written 7.6 years ago by Alex Reynolds30k
2
gravatar for Ido Tamir
7.6 years ago by
Ido Tamir5.1k
Austria
Ido Tamir5.1k wrote:

I have extracted some of the R functions I use in a script to calculate an smoothened average, maybe they still work together.

library("GenomicRanges")

lengthsFromStarts <- function(starts){
        diff(c(1,starts))
}

meanForChromosome <- function(starts, values, window){
        lengths <- lengthsFromStarts(starts)
        rle <- Rle(values, lengths)
        mean <- runmean(rle, window + 1, endrule =  "constant")
        as.data.frame(mean)[,1]
}

downSample <- function(chr, meanVec, sampleWindow){
        half <- round(sampleWindow/2)
        sampling <- seq(half,length(meanVec),by=sampleWindow)
        sample2kValues <- meanVec[sampling]
        sample2kPositions <- (1:length(meanVec))[sampling]
        sampleDF <- data.frame(chr=chr, start=sample2kPositions - half, end=sample2kPositions+(half - 1), values=sample2kValues)
        sampleDF
}

writeWig <- function(wigDF, outfile, outname){ 
        f <- file(outfile, "w")
        writeLines(paste("track type=bedGraph name=\"",outname,"\" visibility=full",sep=""),f )
        write.table(format(wigDF, digits=4, scientific=FALSE), f, sep="\t",col.name=FALSE,row.names=FALSE,quote=FALSE)
        close(f)
}

makeMovingAverage <- function(chrs, starts, normedCounts){
        wigDF <- data.frame()
        for(chr in unique(chrs)){
                print(paste("meaning: ", chr))
                chrData <- chrs == chr
                wig <- meanForChromosome(starts[chrData], normedCounts[chrData], window)
                wigSub <- downSample(chr, wig, 2000)
                wigDF <- rbind(wigDF, wigSub)
        }
        writeWig(wigDF, wigname, wigname)
}
ADD COMMENTlink written 7.6 years ago by Ido Tamir5.1k

Could you please show a small sample dataset for this code? I had some issue to understand it. Thank you very much!

ADD REPLYlink written 3.9 years ago by shamaoe_zheng0
1
gravatar for Irsan
7.6 years ago by
Irsan7.2k
Amsterdam
Irsan7.2k wrote:

If you can get your data in bed format you can use bedtools makewindows to define the genomic ranges that you require and then bedtools map to summarize (mean/median/min/max...) your seq counts on the genomic ranges.

ADD COMMENTlink modified 7.6 years ago • written 7.6 years ago by Irsan7.2k
0
gravatar for kanwarjag
7.6 years ago by
kanwarjag1.0k
United States
kanwarjag1.0k wrote:

Thanks that is of great help. This is my first post on this forum. Great help!!!!

ADD COMMENTlink written 7.6 years ago by kanwarjag1.0k
3

You don't say thanks in an answer. you a) vote/accept b) say thanks in an comment

ADD REPLYlink modified 7.6 years ago • written 7.6 years ago by Ido Tamir5.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 591 users visited in the last hour