I recently got some Hi-C data for mouse liver from GSE65126 (SRA file SRX851861), which I mapped to mm10. I am fairly new to handling Hi-C data, so please bear with me if I am not technically correct. I extracted the fastq files from the SRA file and used them as input to HiCUP (which uses Bowtie2 for aligning). HiCUP does not run Bowtie2 in paired-end mode; instead it starts two independent Bowtie2 jobs (one for each file) and then pairs reads where both reads map uniquely to the genome. This produces a SAM/BAM file in which paired reads are on adjacent lines. The output BAM looks like this:
SRR1771322.4393742 163 chr1 3003916 42 75M chr6 143502841 0 AACAGGGGTATGTCCCAGACACTGTGTAGCTTCTGCCTGCCCCAGAAGATGTGTCACTTCCTCAGTCTGCTTGTT B@@FFFFF:DHDAFHJAHHGHIFIEHGIGIGGG@FHIGGIJIJIEG;FCGGIJIJIIJIJIIIAHEHGIIHHHC? AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:75 YT:Z:UU CT:Z:TRANS
SRR1771322.4393742 83 chr6 143502841 42 32M chr1 3003916 0 AAGCTTCATTTTGTGACTCGGAACACTTTCAG IIIIIIIIIIHF?CA;<F<FHHHHDDDFF@@? AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:32 YT:Z:UU CT:Z:TRANS
SRR1771322.178 115 chr6 15801748 42 75M = 15512512 0 ATTTGCCTCATTATCCTGTAAAACTGTTTAACCAAGAGGCTTGTCTTATGCTTGAATATATCTTGCTATGATTTG CGEHAEF=@<<EGCF?<DGIHHGHEHGCIIIHFEF?DCGHECBGHC<FA+FFC:HHAHGHEEHD?HBDD?DD@?? AS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:46G28 YT:Z:UU CT:Z:FAR
SRR1771322.178 179 chr6 15512512 42 64M = 15801748 0 AAGCTTATACTAAACAAATCATCCAACAATGCCAACAAGAATATATATATATATTATGTAATAT D?*DHHGGIIGGGIHFCC3EF<HBIIGGHABGHEJJHICEGGGIJJHEGBB?F>BDAEFFF@@@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:64 YT:Z:UU CT:Z:FAR
So the BAM file contains both cis and trans interactions. An '=' sign in the 7th column means that the mate is on the same chromosome, i.e. cis. The file also states explicitly whether a pair is a cis or trans, far or close interaction (the CT:Z tag in the last column).
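To make the column logic concrete, here is a minimal Python sketch of how I read one alignment line. It assumes tab-separated SAM text (for example what `samtools view` prints for the BAM) and that the CT:Z tag is present as in the records above; `classify_pair` is just a name I made up for illustration.

```python
def classify_pair(sam_line):
    """Return (read_chrom, mate_chrom, kind, ct_tag) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    rname, rnext = fields[2], fields[6]  # columns 3 (RNAME) and 7 (RNEXT)
    # '=' in RNEXT means the mate maps to the same reference as this read.
    mate_chrom = rname if rnext == "=" else rnext
    kind = "cis" if mate_chrom == rname else "trans"
    # Optional tags start at column 12; pick out the CT:Z tag if it is there.
    ct_tag = next((f[5:] for f in fields[11:] if f.startswith("CT:Z:")), None)
    return rname, mate_chrom, kind, ct_tag


# Second record from the output above, rebuilt with tabs between fields.
record = "\t".join([
    "SRR1771322.4393742", "83", "chr6", "143502841", "42", "32M",
    "chr1", "3003916", "0",
    "AAGCTTCATTTTGTGACTCGGAACACTTTCAG",
    "IIIIIIIIIIHF?CA;<F<FHHHHDDDFF@@?",
    "AS:i:0", "XN:i:0", "XM:i:0", "XO:i:0", "XG:i:0", "NM:i:0",
    "MD:Z:32", "YT:Z:UU", "CT:Z:TRANS",
])
print(classify_pair(record))  # prints ('chr6', 'chr1', 'trans', 'TRANS')
```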
Our lab is specifically interested in a region that is on chr2 and spans ~100kb. We are interested in knowing which regions from the other chromosomes (i.e. TRANS or Inter-chromosomal interactions) interact with that particular region on chr2. I already have the data in the above format where the third column is chr2 and the 7th column is some other chromosome.
So there may be multiple chromosomal positions (say on chr1, chr11, chr12, etc.) interacting with this region. Is there a way to find which of them have significantly more reads/coverage than the other regions that interact with this region? I have tried GOTHiC, but it takes two input files whereas I only have one. Something simpler, like calculating coverage or depth, would also work.
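To illustrate the kind of simple coverage calculation I have in mind, here is a rough Python sketch. It bins the mate positions of reads that start inside the chr2 region and ranks the bins with a naive Poisson test against a uniform background. The function names, the 1 Mb bin size and the `n_background_bins` parameter (how many bins the other chromosomes contain) are all my own assumptions, and this ignores the usual Hi-C biases (restriction-site density, mappability, GC content), so it is not a substitute for a proper method like GOTHiC.

```python
import math
from collections import Counter


def trans_bin_counts(sam_lines, chrom, start, end, bin_size=1_000_000):
    """Count trans mates per (mate_chrom, bin_start) for reads inside the region.

    Only lines whose own read lies in the region are used, so each pair is
    counted once even though both mates appear in the file. The region check
    uses the leftmost mapped position (POS) only, which is an approximation.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue
        rname, pos, rnext, pnext = f[2], int(f[3]), f[6], int(f[7])
        if rname != chrom or not (start <= pos <= end):
            continue
        if rnext in ("=", "*", chrom):  # keep inter-chromosomal pairs only
            continue
        bin_start = (pnext - 1) // bin_size * bin_size  # 0-based bin start
        counts[(rnext, bin_start)] += 1
    return counts


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):  # accumulate P(X = 0) .. P(X = k - 1)
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def rank_bins(counts, n_background_bins):
    """Rank bins by p-value, assuming reads spread uniformly over the background bins."""
    lam = sum(counts.values()) / n_background_bins
    rows = [(c, b, n, poisson_upper_tail(n, lam)) for (c, b), n in counts.items()]
    return sorted(rows, key=lambda r: r[3])
```

The lines could come from piping `samtools view` on the HiCUP BAM into a script that passes `sys.stdin` to `trans_bin_counts`. The p-values are unadjusted, so with many bins a multiple-testing correction would still be needed.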