Entering edit mode
3.6 years ago
Begonia_pavonina
▴
150
Hello everyone,
I have a BLASTN output file in the following format (-outfmt 6):
> Chr1 contig_806 98.438 64 1 0 382055 382118 64 1 7.47e-23 113
> Chr1 contig_806 100.000 61 0 0 381772 381832 121 61 7.47e-23 113
> Chr1 contig_2215 95.367 259 11 1 262655 262913 267 10 2.36e-112 411
> Chr1 contig_2215 100.000 34 0 0 262580 262613 371 338 7.63e-08 63.9
> Chr2 contig_5406 88.875 818 56 10 140031 140845 1525 740 0.0 974
> Chr2 contig_5406 95.660 553 22 1 142155 142705 565 13 0.0 887
> Chr2 contig_5406 92.614 176 13 0 140957 141132 740 565 1.11e-65 254
> Chr3 contig_1474 94.322 317 18 0 26915 27231 513 829 8.15e-136 486
> Chr3 contig_1474 98.148 270 5 0 25794 26063 246 515 2.28e-131 472
> Chr3 contig_1474 95.885 243 10 0 25402 25644 1 243 5.08e-108 394
> Chr3 contig_1474 93.678 174 11 0 27190 27363 1007 834 5.38e-68 261
> Chr3 contig_3199 94.888 489 21 3 102014 102498 1390 902 0.0 761
And I wish to output my data in the following format, using the GenomicRanges package for R:
Chr1 381772 382118 contig_806
Chr1 262580 262913 contig_2215
Chr2 140031 142705 contig_5406
Chr3 25402 27231 contig_1474
Chr3 102014 102498 contig_3199
Meaning that I try to merge the ranges that overlap to obtain the larger range. I have tried to do it with the package GenomicRanges (it is the format I have written in the output). But I don't find the right command to do it.
Has anyone tried this already?
Section 4 in the HelloRanges tutorial (merging and aggregating ranges) explains how to do this. You want to use the
reduce
function with thewith.revmap = TRUE
argument. In theaggregate
function after running reduce you can add an argument such ascontig = unique(contig)
(replacing 'contig' with whatever the column name is) to have the name column properly added.Thanks for these references. I have tried to use reduce(), but the result was not exactly what I was searching for.
However, the range() function gave me the extremum of the chromosomes (Chr), and it was the one which provided the desired output.