I am currently using DiffBind 3.2.7 for an analysis on RNA Polymerase III (RNAPIII) occupancy in the human genome. Because of the biology of RNAPIII, in that it is the exclusive polymerase for tRNAs and only very few other sites in the genome, and because of the particular question we are trying to answer, I would like to use a bed file of all annotated human tRNAs as the peak sets for all samples. That is, all samples have the same input file for peaks in the sample sheet. I would also like to retain all of these annotated regions throughout the analysis so that even completely unoccupied tRNA genes form part of the analysis and DESeq results table.
I am having a difficult time right at the first step of loading in my sample data. The bed file of tRNAs I am using (again, the same file for all samples is specified in the sample sheet) contains 560 regions. However, when I load this in with
rpc1.alltRNAs <- dba(sampleSheet="RPC1_samples_alltRNAs.csv") and display
rpc1.alltRNAs, there are only 536 sites in the matrix. I have been unable to prevent this filtering with any of the parameters that I can supply to
dba() such as
minOverlap, peakCaller, peakFormat, filter, bRemoveM, bRemoveRandom.
I have even tried reading in the bam file as a GRanges object and supplying that the the
dba.count() command in the next step to try force all 560 peaks, but that doesn't work either.
I can't find any clear reason why these particular regions are filtered - there are no duplicates and the score for all 560 is exactly the same.
Any help would be much appreciated!
I should add that I have so far been able to restrict filtering imposed by
dba.count() by setting
filter = 0 and
summits = FALSE but the initial sample reading step has already filtered some peaks by this point. The second parameter of disabling summits did in fact influence the number of peaks retained after counting, I'm not really sure why since all of the peaks should have the same summit and same 200bp extensions?