Question: Quick Way to Annotate a Bed File
11 months ago, gtasource wrote:

I found some questions similar to this, but the answers do not apply because the species I work with is not readily available. I basically have a BED file with coordinates:

Chr1 0 500
Chr1 501 551
Chr1 552 601

In another file, I have annotations for specific features:

chr1 0 600 gene1
chr1 601 799 gene2

I want to annotate the first BED file using the information in the second. I tried bedtools intersect (and some BEDOPS tools), but it didn't give the result I wanted. Any help would be much appreciated.

modified 11 months ago by colin.kern • written 11 months ago by gtasource

How does bedtools intersect not do what you want? Perhaps it's Chr1 vs chr1?

written 11 months ago by Eric Lim
11 months ago, colin.kern (United States) wrote:

bedtools intersect should be the solution, but the chromosome names must match exactly between the files. In what you posted, one file uses "Chr1" and the other "chr1"; bedtools treats these as completely separate chromosomes, so it finds no intersections. With that fixed, this command should work:

bedtools intersect -a coords.bed -b genes.bed -wa -wb
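One way to fix the naming mismatch before running that command is to rewrite the chromosome column of coords.bed with sed. This is a minimal sketch, assuming the files are tab-separated (as BED requires) and that only a leading "Chr" needs to become "chr". The sample files reproduce the data from the question, and the output name coords.fixed.bed is a hypothetical choice; if you use it, pass coords.fixed.bed to -a instead of coords.bed.

```shell
#!/bin/sh
# Recreate the two example files from the question (tab-separated, as BED requires).
printf 'Chr1\t0\t500\nChr1\t501\t551\nChr1\t552\t601\n' > coords.bed
printf 'chr1\t0\t600\tgene1\nchr1\t601\t799\tgene2\n' > genes.bed

# Assumption: only a leading "Chr" differs between the files. Replace it with "chr";
# lines that do not start with "Chr" pass through unchanged.
# coords.fixed.bed is a hypothetical output name.
sed 's/^Chr/chr/' coords.bed > coords.fixed.bed

# Sanity check: the distinct chromosome names in the two files should now be identical.
cut -f1 coords.fixed.bed | sort -u
cut -f1 genes.bed | sort -u
```

Both cut commands should print only chr1. Renaming in genes.bed instead (chr to Chr) would work just as well; what matters is that the two files agree.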

The -wa option tells bedtools to report the original start and end of the file-A entry rather than just the overlapping segment, and -wb appends the full file-B entry to each line so you know which gene it intersects. If you don't need the gene coordinates, pipe the output to cut to remove them:

bedtools intersect -a coords.bed -b genes.bed -wa -wb | cut -f1,2,3,7
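For intuition about what -wa -wb reports, here is a rough awk stand-in for the intersect step. It is not bedtools: it ignores strand and every other option and compares every pair of lines, so it only suits tiny files. It uses the example data with the chromosome names already made consistent and applies the BED half-open overlap test (start of A < end of B and start of B < end of A). The intermediate file name intersect.tsv is hypothetical.

```shell
#!/bin/sh
# Example data from the question, with chromosome names already harmonised.
printf 'chr1\t0\t500\nchr1\t501\t551\nchr1\t552\t601\n' > coords.bed
printf 'chr1\t0\t600\tgene1\nchr1\t601\t799\tgene2\n' > genes.bed

# Rough awk stand-in for "bedtools intersect -a coords.bed -b genes.bed -wa -wb":
# first load every B (gene) line, then for each A line print the full A line
# followed by the full B line whenever they share at least one base.
# BED intervals are half-open, hence the strict "<" comparisons.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { n++; bchrom[n] = $1; bstart[n] = $2 + 0; bend[n] = $3 + 0; bline[n] = $0; next }
     {
       for (i = 1; i <= n; i++)
         if ($1 == bchrom[i] && $2 + 0 < bend[i] && bstart[i] < $3 + 0)
           print $0, bline[i]
     }' genes.bed coords.bed > intersect.tsv

# Keep the A coordinates plus the gene name (column 7 = 3 A columns + 4th B column).
cut -f1,2,3,7 intersect.tsv
```

On this data every region overlaps gene1 and nothing overlaps gene2: because BED end coordinates are exclusive, the region 552-601 ends just before gene2 begins at 601. The final cut prints chr1 0 500 gene1, chr1 501 551 gene1 and chr1 552 601 gene1, one per line and tab-separated.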

The result is just the entries from file A, each with the name of the overlapping gene appended. An entry from file A that overlaps several genes will appear once per gene, so keep that in mind for downstream analysis. To get back to one line per region with a comma-separated list of the overlapping genes, you could sort the output and run bedtools merge on it with "-c 4 -o distinct". Note, however, that merge also joins any file-A regions that overlap or touch one another.
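As an alternative to merge, the duplicates can be collapsed by grouping on the exact file-A coordinates, so separate file-A regions are never joined even when they overlap each other. This awk sketch assumes four-column input in the shape produced by the cut pipeline. The file names annotated.bed and collapsed.bed are hypothetical, and geneX is a made-up feature added only so that some regions overlap two genes.

```shell
#!/bin/sh
# Four-column input in the shape printed by "... -wa -wb | cut -f1,2,3,7".
# HYPOTHETICAL: geneX is not in the question's data; it only creates duplicate regions.
printf 'chr1\t0\t500\tgene1\nchr1\t0\t500\tgeneX\nchr1\t501\t551\tgene1\nchr1\t501\t551\tgeneX\nchr1\t552\t601\tgene1\n' > annotated.bed

# Collapse to one line per distinct region (chrom, start, end), joining the
# distinct gene names with commas, in the order each region first appears.
awk 'BEGIN { FS = OFS = "\t" }
     {
       key = $1 OFS $2 OFS $3
       if (!(key in genes)) { order[++n] = key; genes[key] = $4 }
       else if (!((key, $4) in seen)) genes[key] = genes[key] "," $4
       seen[key, $4] = 1
     }
     END { for (i = 1; i <= n; i++) print order[i], genes[order[i]] }' annotated.bed > collapsed.bed

cat collapsed.bed
```

Each region is printed once, for example chr1 0 500 gene1,geneX. Unlike merge, this needs no sorted input, but it keeps every distinct region in memory.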

modified 11 months ago • written 11 months ago by colin.kern

Thanks a lot for this!

written 11 months ago by gtasource