merging overlaping ranges with GenomicRanges
1
0
Entering edit mode
3.6 years ago

Hello everyone,

I have a BLASTN output file in the following format (-outfmt 6):

> Chr1  contig_806  98.438  64  1   0   382055  382118  64  1   7.47e-23    113
> Chr1  contig_806  100.000 61  0   0   381772  381832  121 61  7.47e-23    113
> Chr1  contig_2215 95.367  259 11  1   262655  262913  267 10  2.36e-112   411
> Chr1  contig_2215 100.000 34  0   0   262580  262613  371 338 7.63e-08    63.9
> Chr2  contig_5406 88.875  818 56  10  140031  140845  1525    740 0.0 974
> Chr2  contig_5406 95.660  553 22  1   142155  142705  565 13  0.0 887
> Chr2  contig_5406 92.614  176 13  0   140957  141132  740 565 1.11e-65    254
> Chr3  contig_1474 94.322  317 18  0   26915   27231   513 829 8.15e-136   486
> Chr3  contig_1474 98.148  270 5   0   25794   26063   246 515 2.28e-131   472
> Chr3  contig_1474 95.885  243 10  0   25402   25644   1   243 5.08e-108   394
> Chr3  contig_1474 93.678  174 11  0   27190   27363   1007    834 5.38e-68    261
> Chr3  contig_3199 94.888  489 21  3   102014  102498  1390    902 0.0 761

And I wish to output my data in the following format, using the GenomicRanges package for R:

Chr1 381772 382118 contig_806
Chr1 262580 262913 contig_2215
Chr2 140031 142705 contig_5406
Chr3  25402  27231 contig_1474
Chr3 102014 102498 contig_3199

Meaning that I try to merge the ranges that overlap to obtain the larger range. I have tried to do it with the package GenomicRanges (it is the format I have written in the output). But I don't find the right command to do it.

Has anyone tried this already?

R blast GenomicRanges range • 2.8k views
ADD COMMENT
1
Entering edit mode

Section 4 in the HelloRanges tutorial (merging and aggregating ranges) explains how to do this. You want to use the reduce function with the with.revmap = TRUE argument. In the aggregate function after running reduce you can add an argument such as contig = unique(contig) (replacing 'contig' with whatever the column name is) to have the name column properly added.

ADD REPLY
0
Entering edit mode

Thanks for these references. I have tried to use reduce(), but the result was not exactly what I was searching for.

However, the range() function gave me the extremum of the chromosomes (Chr), and it was the one which provided the desired output.

ADD REPLY
3
Entering edit mode
3.6 years ago
> myranges<-GRanges(seqnames=c("chr1","chr1","chr1","chr1"),ranges=IRanges(start=c(382055,381772,262655,262580),end=c(382118,381832,262913,262913)))
> reduce(myranges)
GRanges object with 3 ranges and 0 metadata columns:
      seqnames        ranges strand
         <Rle>     <IRanges>  <Rle>
  [1]     chr1 262580-262913      *
  [2]     chr1 381772-381832      *
  [3]     chr1 382055-382118      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
ADD COMMENT
0
Entering edit mode

Thanks to have formalized my question in a proper way.

ADD REPLY

Login before adding your answer.

Traffic: 3138 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6