Mapping reads from a BAM file to a custom BED file.
0
0
Entering edit mode
19 months ago

Hi.

I have a BED file that was engendered with the following:

bedtools makewindows -g ../bedtools2/genomes/human.hg19.genome -w 2000 > hg19_2K_bins.bed


The goal is to map reads from a BAM file to the intervals as defined, to visualize the distribution of the counts pan-genome, bin-wise. Now, I am aware of the bamCoverage tool from deeptools, but the incorrigible issue is that it merges adjacent bins if the count number overlaps.

bamCoverage --bam testMe.bam \
-o testMe_2k.bw \
--binSize 2000 \
--normalizeUsing None  \
--effectiveGenomeSize 2913022398 \ # hg19 version of Homo sapiens
--outFileFormat bedgraph \
--maxFragmentLength 30


The output I desire is something like:

Chrom Start End Score
chr1 0 2000 34
chr1 2000 4000 46
...


where the values in the last column (Score) are from our BAM file. I have two questions, basically.

1. Is there an alternative tool for this or a workaround?
2. What if we have a bedgraph/ bw file with scores instead of a BAM?

ChIP-Seq BAM file BED file • 482 views
1
Entering edit mode

Why is it a problem for you if adjacent bins are merged when they have the same score? Why not just post-process a bedGraph file if that's really an issue?

0
Entering edit mode

I wouldn't say it is a problem, but the layout of the output file I desire is such- scores in homogeneous bins. Secondly, a post-process of the bedGraph file is surely doable. I just want to know if there are any tools already that can help achieve that. In R, I am trying looping over all the lines of source(output from bamCoverage) and target(fixed bins) files, but it seems too computationally expensive.

0
Entering edit mode

To illustrate my point, would something like this be advisable. The "targetBED" is the file for specific sized genomic regions, while "sourceBED" has heterogeneous regions with a score.

mappingReadsBins <- function(targetBED, sourceBED)
{
sourceBED$Chrom <- factor(sourceBED$Chrom, levels=levels(targetBED$Chrom)) # match chromosomes for consistency # names in random binned file to the fixed bins file. for(i in 1:nrow(sourceBED)) { for(j in 1:nrow(targetBED)) { if(sourceBED$Chrom[i] == targetBED$Chrom[j] && sourceBED$Start[i] <= targetBED$Start[j] && sourceBED$End[i] >= targetBED$End[j]) { targetBED$Score[j] <- sourceBED\$Score[i]
}
}
}
return(targetBED)
}