Question

Mapping reads from a BAM file to a custom BED file.

0

Entering edit mode

4.1 years ago

Shaurya Jauhari ▴ 50

Hi.

I have a BED file that was engendered with the following:

bedtools makewindows -g ../bedtools2/genomes/human.hg19.genome -w 2000 > hg19_2K_bins.bed

The goal is to map reads from a BAM file to the intervals as defined, to visualize the distribution of the counts pan-genome, bin-wise. Now, I am aware of the bamCoverage tool from deeptools, but the incorrigible issue is that it merges adjacent bins if the count number overlaps.

bamCoverage --bam testMe.bam \
            -o testMe_2k.bw \
            --binSize 2000 \
            --normalizeUsing None  \
            --effectiveGenomeSize 2913022398 \ # hg19 version of Homo sapiens
            --outFileFormat bedgraph \
            --maxFragmentLength 30

The output I desire is something like:

Chrom Start End Score
chr1 0 2000 34
chr1 2000 4000 46
...

where the values in the last column (Score) are from our BAM file. I have two questions, basically.

Is there an alternative tool for this or a workaround?
What if we have a bedgraph/ bw file with scores instead of a BAM?

Please advise. Thanks.

ChIP-Seq BAM BED • 1.1k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 4.1 years ago by Shaurya Jauhari ▴ 50

1

Entering edit mode

Why is it a problem for you if adjacent bins are merged when they have the same score? Why not just post-process a bedGraph file if that's really an issue?

ADD REPLY • link 4.1 years ago by Devon Ryan 104k

0

Entering edit mode

I wouldn't say it is a problem, but the layout of the output file I desire is such- scores in homogeneous bins. Secondly, a post-process of the bedGraph file is surely doable. I just want to know if there are any tools already that can help achieve that. In R, I am trying looping over all the lines of source(output from bamCoverage) and target(fixed bins) files, but it seems too computationally expensive.

ADD REPLY • link 4.1 years ago by Shaurya Jauhari ▴ 50

0

Entering edit mode

To illustrate my point, would something like this be advisable. The "targetBED" is the file for specific sized genomic regions, while "sourceBED" has heterogeneous regions with a score.

mappingReadsBins <- function(targetBED, sourceBED)
    {
      sourceBED$Chrom <- factor(sourceBED$Chrom, levels=levels(targetBED$Chrom)) # match chromosomes for consistency 
      # names in random binned file to the fixed bins file.
          for(i in 1:nrow(sourceBED))
          {
            for(j in 1:nrow(targetBED))
              {
                if(sourceBED$Chrom[i] == targetBED$Chrom[j] && sourceBED$Start[i] <= targetBED$Start[j] && sourceBED$End[i] >= targetBED$End[j])
                {
                  targetBED$Score[j] <- sourceBED$Score[i]
                }
              }
          }
      return(targetBED)
          }

ADD REPLY • link 4.1 years ago by Shaurya Jauhari ▴ 50