This question is related to this one, but I would like to know if anyone knows of any methods of quickly extracting reads from a BAM file that overlap with a list of many regions (e.g. a BED file), but I would like the reads separately for each region.
My eventual requirements are that for each region individually, I would like to know:
- The number of overlapping reads
- The location of the highest pileup
- The height of the highest pileup
However I am happy to calculate these, I just need a tool to return the reads for each region in a way that I can parse.
I am currently using Rsamtools, which does everything I require, as it returns a list, with each element in the list containing the reads that overlap each region. However, is really slow for a large number of regions. It's run time is dependent on the number of regions and largely independent of the number of reads in the BAM. I am using Rsamtools like this currently:
which <- RangedData(space=bed[,1],IRanges(bed[,2],bed[,3])) param <- ScanBamParam(which=which,what=c("pos")) bam <- scanBam(file=bamFile,param=param)
I can split the list of regions and run it as many parallel jobs and merge at the end, however first I wanted to check if there was another tool which was quicker for this task.
I have looked at intersectBed, but I can not get this to return reads in a way I can easily parse to give information for each region. The run time for this method is dependent on the number of reads in the BAM and largely independent of the number of regions, which would be good for my needs.