Mapping reads from a BAM file to a custom BED file.
Entering edit mode
2.3 years ago


I have a BED file that was engendered with the following:

bedtools makewindows -g ../bedtools2/genomes/human.hg19.genome -w 2000 > hg19_2K_bins.bed

The goal is to map reads from a BAM file to the intervals as defined, to visualize the distribution of the counts pan-genome, bin-wise. Now, I am aware of the bamCoverage tool from deeptools, but the incorrigible issue is that it merges adjacent bins if the count number overlaps.

bamCoverage --bam testMe.bam \
            -o \
            --binSize 2000 \
            --normalizeUsing None  \
            --effectiveGenomeSize 2913022398 \ # hg19 version of Homo sapiens
            --outFileFormat bedgraph \
            --maxFragmentLength 30

The output I desire is something like:

Chrom Start End Score
chr1 0 2000 34
chr1 2000 4000 46

where the values in the last column (Score) are from our BAM file. I have two questions, basically.

  1. Is there an alternative tool for this or a workaround?
  2. What if we have a bedgraph/ bw file with scores instead of a BAM?

Please advise. Thanks.

ChIP-Seq BAM file BED file • 674 views
Entering edit mode

Why is it a problem for you if adjacent bins are merged when they have the same score? Why not just post-process a bedGraph file if that's really an issue?

Entering edit mode

I wouldn't say it is a problem, but the layout of the output file I desire is such- scores in homogeneous bins. Secondly, a post-process of the bedGraph file is surely doable. I just want to know if there are any tools already that can help achieve that. In R, I am trying looping over all the lines of source(output from bamCoverage) and target(fixed bins) files, but it seems too computationally expensive.

Entering edit mode

To illustrate my point, would something like this be advisable. The "targetBED" is the file for specific sized genomic regions, while "sourceBED" has heterogeneous regions with a score.

mappingReadsBins <- function(targetBED, sourceBED)
      sourceBED$Chrom <- factor(sourceBED$Chrom, levels=levels(targetBED$Chrom)) # match chromosomes for consistency 
      # names in random binned file to the fixed bins file.
          for(i in 1:nrow(sourceBED))
            for(j in 1:nrow(targetBED))
                if(sourceBED$Chrom[i] == targetBED$Chrom[j] && sourceBED$Start[i] <= targetBED$Start[j] && sourceBED$End[i] >= targetBED$End[j])
                  targetBED$Score[j] <- sourceBED$Score[i]

Login before adding your answer.

Traffic: 1925 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6