I want to annotate a segmented file with gene names by finding its overlaps with a reference file. I would like to know how this can be done using the GenomicRanges package or any other Bioconductor package.
Segmented_file
Chromosome Start End
1 5930000 11730000
1 16850000 18010000
reference_file
Chr Start End Gene
1 5930500 6230500 SPSB1
1 6930500 7340500 SPSB2
1 16854500 16950000 TAS1R1
1 17810032 17910064 ENO1
Expected results
Chromosome Start End Gene
1 5930000 11730000 SPSB1,SPSB2
1 16850000 18010000 TAS1R1,ENO1
If you can use the command line, you can do this in a simple one-liner with BEDOPS bedmap.
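A minimal sketch of such a one-liner, assuming both inputs are header-free, tab-delimited, sorted BED files named segmented_file and reference_file, with the gene name in the fourth (ID) column of the reference. The two printf lines only recreate the sample data from the question, and the choice of --echo-map-id (rather than another of bedmap's ID-reporting options) is an assumption:

```shell
# Recreate the question's sample data as header-free, tab-delimited BED files.
printf '%s\t%s\t%s\n' 1 5930000 11730000 1 16850000 18010000 > segmented_file
printf '%s\t%s\t%s\t%s\n' \
  1 5930500 6230500 SPSB1 \
  1 6930500 7340500 SPSB2 \
  1 16854500 16950000 TAS1R1 \
  1 17810032 17910064 ENO1 > reference_file

# For each segment, print the segment itself (--echo), then a tab (--delim),
# then the IDs of all reference elements that overlap it (--echo-map-id).
bedmap --echo --echo-map-id --delim '\t' segmented_file reference_file > answer.bed
```

By default bedmap counts an overlap of at least one base, so any gene that touches a segment is reported for it.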
The file answer.bed contains results in the format you expect, except that gene names are delimited with semicolons. You can change this to commas, if you like, by specifying the --multidelim option.

Just make sure first that segmented_file and reference_file are sorted BED files (BEDOPS provides sort-bed for this).
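As a sketch of the sorting step combined with the comma delimiter, assuming the raw inputs are tab-delimited and carry the header rows shown in the question. The names segmented_raw.txt and reference_raw.txt are hypothetical, and the printf lines only recreate the sample data in unsorted order:

```shell
# Hypothetical raw inputs: tab-delimited, unsorted, with a header line.
printf '%s\t%s\t%s\n' Chromosome Start End \
  1 16850000 18010000 \
  1 5930000 11730000 > segmented_raw.txt
printf '%s\t%s\t%s\t%s\n' Chr Start End Gene \
  1 17810032 17910064 ENO1 \
  1 5930500 6230500 SPSB1 \
  1 16854500 16950000 TAS1R1 \
  1 6930500 7340500 SPSB2 > reference_raw.txt

# Drop the header line, then sort by chromosome and start with sort-bed
# ("-" tells sort-bed to read from standard input).
tail -n +2 segmented_raw.txt | sort-bed - > segmented_file
tail -n +2 reference_raw.txt | sort-bed - > reference_file

# Map gene IDs onto segments, joining multiple gene names with commas.
bedmap --echo --echo-map-id --delim '\t' --multidelim ',' \
  segmented_file reference_file > answer.bed
```

If the original files are space-delimited rather than tab-delimited, they would need converting to tabs before sorting, since BED tools expect tab-separated columns.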