Question

Grange object for hi-c

0

Entering edit mode

4.2 years ago

Siegfried ▴ 10

I would like to ask how to establish a grange object with double seqnames columns as well as start and end columns for strore interaction data?

R genome • 923 views

ADD COMMENT • link updated 4.2 years ago by benformatics 3.9k • written 4.2 years ago by Siegfried ▴ 10

0

Entering edit mode

Please be more specific by giving example data and more context. What files do you have and how is the output supposed to look like?

ADD REPLY • link 4.2 years ago by ATpoint 82k

score 0 · Answer 1 · 2020-03-04

Assuming you are using R then you could use GenomicAlignments.

Example:

## import your reads
gr <- readGAlignmentPairs('your_hic_reads.bam')

## look at them
head(gr)
GAlignmentPairs object with 99420 pairs, strandMode=1, and 0 metadata columns:
            seqnames strand   :      ranges  --            ranges
               <Rle>  <Rle>   :   <IRanges>  --         <IRanges>
        [1]    chr2      *   :   5035-5184  --       32376-32457
        [2]     <NA>      *   :   5089-5122  --   2700963-2701112

## here you see that seqnames can either be a single value or <NA>. If it's a single value then the two reads are on the same chromosome, if it's NA then they are on different chromosomes

## let's subset and look at the different chrom
granges(gr[2],on.discordant.seqnames='split')
  [1]    chr1       55089-5122       +
  [2]    chr2       2700963-2701112       -

At this point you have would need to split the reads manually. The GRanges object cannot support two seqnames side-by-side (i.e. in "columns"), you could however convert each pairs of reads into a GRanges [see below] object and then merge all reads into a giant GRangesList. For the seqnames() call to return the two different chromosomes, you need to convert the alignment object to GRanges with on.discordant.seqnames='split'. Then we build a final data.frame to your specifications [note we no longer have GRanges]. Here is an example but it's not IMO the best way to do it - so I won't include suggest it.

## pull example discordant read
b <- gr[2]

## duplicate the read and then split them the two reads (R1 and R2)
bb <- granges(c(b,b),on.discordant.seqnames='split')

## remember which read is which
c('read1','read2') -> bb$read

## merge into giant data.frame
b.df <- cbind(data.frame(subset(bb,read == 'read1')),data.frame(subset(bb,read == 'read2')))

## check it
head(b.df)
  seqnames start  end width strand  read seqnames   start     end width strand
1    chr1  5089 5122    34      + read1    chr2 2700963 2701112   150      -
2    chr1  5089 5122    34      + read1    chr2 2700963 2701112   150      -
   read
1 read2
2 read2