I am new to Capture Hi-C analysis and am having a hard time generating the .baitmap input file for CHiCAGO (https://bioconductor.riken.jp/packages/3.5/bioc/vignettes/Chicago/inst/doc/Chicago.html). I've read previous posts and questions on this, but no one else seems to have this problem, so I assume the answer is very basic and I just don't know what I'm doing! For background, we used an Arima-HiC custom RNA probe set with the Agilent SureSelect kit.
Essentially, I have what I believe are the bait coordinates from Arima, in a file named something like "Arima_pcHiC_v00_1_Covered.bed". Its header reads:

browser position chr1:28142-28261
track name="Covered" description="Agilent SureSelect DNA - Arima_pcHiC_v00_1 - Genomic regions expected to be sequenced" color=0,128,0 visibility=dense db=hg19

I read this file into R, hoping to do a dplyr::inner_join with the restriction digest (.rmap) file I created with hicup-digester, but I get ZERO matches between the two files. I'm not a biologist (and don't know much about Capture Hi-C), so I tried the coordinates from every file Arima returned, and none of them match any fragment in the .rmap file. At this point I really don't know what to do, so any help would be greatly appreciated!
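Two things are worth ruling out before any join. First, the Covered.bed file starts with UCSC "browser" and "track" lines, which a naive reader treats as data rows. Second, a "Covered" region marks sequence the probes are expected to capture, not a restriction fragment, so its start/end will rarely coincide exactly with fragment boundaries in the digest. An exact-match join on start/end can therefore return nothing even when every bait lies inside a fragment. It is also worth checking that the chromosome names ("chr1" vs "1") and the genome build (the header says db=hg19) match the genome you digested. Below is a minimal Python sketch of reading only the data lines of such a BED file. The function name and the second example interval are hypothetical, and it assumes the first three tab-separated columns are chrom, start, end.

```python
import io

def read_bait_bed(handle):
    """Parse bait/probe regions from a BED file, skipping header lines.

    Assumes the first three tab-separated columns are chrom, start, end.
    Lines starting with 'browser', 'track' or '#' are treated as headers.
    """
    baits = []
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and UCSC browser/track or comment lines
        if not line.strip() or line.startswith(("browser", "track", "#")):
            continue
        fields = line.split("\t")
        baits.append((fields[0], int(fields[1]), int(fields[2])))
    return baits

# Toy input shaped like the Covered.bed header; the second interval is made up
example = io.StringIO(
    "browser position chr1:28142-28261\n"
    'track name="Covered" db=hg19\n'
    "chr1\t28142\t28261\n"
    "chr1\t30000\t30120\n"
)
print(read_bait_bed(example))
```

With a real file you would pass an open handle instead, e.g. `read_bait_bed(open("Arima_pcHiC_v00_1_Covered.bed"))`.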
Thanks for your response. I actually tried that script already and got errors saying that the files aren't compatible.
I guess my main question is whether the bait coordinates should exactly match the start/end positions in the rmap file, or whether the ranges only need to overlap. If overlap is enough, can't I just use Bioconductor's IRanges::findOverlaps?
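Overlap is the natural operation here: a baited fragment is a restriction fragment from the .rmap that at least one probe region overlaps, which is the kind of query findOverlaps answers in R. The sketch below does the same thing in plain Python so the logic is visible. It assumes rmap rows are (chr, start, end, fragmentID) with non-overlapping fragments, and writes baitmap rows as (chr, start, end, fragmentID, annotation), the column layout described in the CHiCAGO vignette. The function name, the half-open interval convention, and the toy coordinates are assumptions for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def build_baitmap(baits, rmap):
    """Map bait regions onto restriction fragments by interval overlap.

    baits: list of (chrom, start, end, annotation)
    rmap:  list of (chrom, start, end, fragment_id), non-overlapping fragments
    Returns (chrom, start, end, fragment_id, annotation) rows, one per fragment
    overlapped by at least one bait. Intervals are treated as half-open
    [start, end), which is an assumption of this sketch.
    """
    frags = defaultdict(list)
    for chrom, start, end, frag_id in rmap:
        frags[chrom].append((start, end, frag_id))
    for chrom in frags:
        frags[chrom].sort()
    # Non-overlapping fragments sorted by start are also sorted by end
    ends = {chrom: [f[1] for f in fl] for chrom, fl in frags.items()}

    hits = {}  # (chrom, frag_id) -> (start, end, [annotations])
    for chrom, b_start, b_end, note in baits:
        if chrom not in frags:
            continue  # e.g. "chr1" vs "1" naming mismatch: no hits at all
        fl = frags[chrom]
        i = bisect_right(ends[chrom], b_start)  # first fragment ending after b_start
        while i < len(fl) and fl[i][0] < b_end:
            f_start, f_end, frag_id = fl[i]
            entry = hits.setdefault((chrom, frag_id), (f_start, f_end, []))
            if note not in entry[2]:
                entry[2].append(note)
            i += 1

    rows = [(chrom, s, e, fid, ",".join(notes))
            for (chrom, fid), (s, e, notes) in hits.items()]
    return sorted(rows, key=lambda r: (r[0], r[1]))

# Toy fragments and probes (made up): probeA straddles a fragment boundary
rmap = [("chr1", 0, 1000, 1), ("chr1", 1000, 2500, 2), ("chr1", 2500, 4000, 3)]
baits = [("chr1", 900, 1100, "probeA"), ("chr1", 3000, 3100, "probeB")]
for row in build_baitmap(baits, rmap):
    print("\t".join(map(str, row)))
```

Note that a probe straddling a restriction site maps to both neighbouring fragments; whether to keep both is a design decision for your own panel.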
None of that. The file is probably malformatted. It is a BED file, so plain text, tab-separated, no headers, etc. Please check the documentation.
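Since the reply points at formatting, a quick check of the .rmap (4 columns) or .baitmap (5 columns) before handing it to CHiCAGO can catch header lines, space-separated columns, or swapped coordinates. This Python sketch checks only the basics named above (tab-separated, no header, integer start < end, integer fragment ID). The function name is hypothetical, and it is not CHiCAGO's own validation.

```python
def check_tab_file(lines, n_cols):
    """Return a list of formatting problems in an rmap (4 cols) or baitmap (5 cols).

    Checks: exactly n_cols tab-separated fields, integer start/end/fragmentID
    (a header line fails this), and start < end.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != n_cols:
            problems.append(
                f"line {lineno}: expected {n_cols} tab-separated columns, got {len(fields)}"
            )
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
            int(fields[3])  # fragmentID must be an integer too
        except ValueError:
            problems.append(f"line {lineno}: non-integer start/end/fragmentID (header line?)")
            continue
        if start >= end:
            problems.append(f"line {lineno}: start {start} is not less than end {end}")
    return problems

# Made-up examples: a header row, a space-separated row, and reversed coordinates
bad = ["chr\tstart\tend\tid", "chr1 0 1000 1", "chr1\t500\t100\t3"]
for problem in check_tab_file(bad, 4):
    print(problem)
```

On a real file, something like `with open("my.baitmap") as fh: print(check_tab_file(fh, 5))` (file name hypothetical) prints an empty list if no basic problems were found.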