I have a small set of bam files from centromere regions of S. cerevisiae. Those I have extracted from the complete bam files using the samtools view option based on a list of centromere positions as shown below. I have used the centromere middle point to go forward and reverse 30kb and extract the regions.
chromosome CEN-[30kb] CENmitte CEN+[30kb] I 121500 151500 181500 II 208300 238300 268300 III 84500 114500 144500 IV 419700 449700 479700 ...
I would like to see how many reads are mapped to each of the positions in these regions. This i would like to count into 50b bins of genome regions. For that I have used the tileGenome command to create a virtual GRanges object as shown below (this is why it has no actual information yet).
bins1 <- tileGenome(seqinfo(Scerevisiae), tilewidth=500, cut.last.tile.in.chrom=TRUE)
GRanges object with 24322 ranges and 0 metadata columns: seqnames ranges strand <Rle> <IRanges> <Rle>  chrI [ 1, 500] *  chrI [ 501, 1000] *  chrI [1001, 1500] *  chrI [1501, 2000] *  chrI [2001, 2500] * ... ... ... ...  chrM [83501, 84000] *  chrM [84001, 84500] *  chrM [84501, 85000] *  chrM [85001, 85500] *  chrM [85501, 85779] * ------- seqinfo: 17 sequences (1 circular) from sacCer3 genome
using the summarizeOverlaps command I can now summarize exactly how many reads are mapped into each of my bins in the genome
bamFiles <- list.files(path = ~/centromereBamFIles.withchrIII/", pattern = ".bam$", full.names = TRUE) olapTable <- summarizeOverlaps(bins1, bamFiles, inter.feature=FALSE)
But the problem is that I still have the complete genome in the bins1 GRanges object. But I would like it to contain only the regions I am using for the centromeres.
Hence my question, is there a way to extract multiple regions from the GRanges object? I would like to extract the centromere regions (± a few 20kb) of each chromosome. e.g. these regions:
chrI:101500-201500 chrII:188300-288300 chrIII:64500-164500
But I can't find a way to do it?
I know I can remove all the rows with a sum of 0, but with that I will also remove the rows from the centromere regions where no reads were mapped. These I need to keep in my GRanges object.. So I will try to work it out with restrict or findOverlaps before I run the summarizeOverlaps with the bam files
Is there a more direct way to do it?