Question

Assign number to bed file intervals

0

8.9 years ago

cyril-cros ▴ 950

I have a (sorted) bed file which contains exons, like this:

chr1    92479683    92480619    ENSMUST00000086837
chr1    92480616    92480619    ENSMUST00000086837
chr1    92490817    92490820    ENSMUST00000071521
chr1    92490817    92491753    ENSMUST00000071521

How can I number the exons within each transcript, in the following style?

chr1    92479683    92480619    ENSMUST00000086837    1
chr1    92480616    92480619    ENSMUST00000086837    2 
chr1    92490817    92490820    ENSMUST00000071521    1
chr1    92490817    92491753    ENSMUST00000071521    2

Thanks!

PS: I just wanted to know whether there is a quick and efficient way to do this. Otherwise I can always write a few lines of Python.

bed • 1.8k views

4

If the rows for each transcript are always adjacent in the file, this can be an awk one-liner:

awk 'BEGIN{OFS="\t"} $4!=last {cnt=0; last=$4} {$5=++cnt; print}' foo.bed
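If rows for the same transcript are not guaranteed to be adjacent, a counter kept per transcript ID gives the same numbering without that requirement. Below is a minimal Python sketch of that idea, not part of the awk answer; the function name `number_exons` is hypothetical, and it assumes the transcript ID sits in column 4 and that the file order is the order you want the exons numbered in.

```python
from collections import defaultdict


def number_exons(lines):
    """Append a per-transcript running index to each BED line."""
    seen = defaultdict(int)  # transcript ID -> number of exons seen so far
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        seen[fields[3]] += 1
        out.append("\t".join(fields[:4] + [str(seen[fields[3]])]))
    return out


# The four example rows from the question.
bed = [
    "chr1\t92479683\t92480619\tENSMUST00000086837",
    "chr1\t92480616\t92480619\tENSMUST00000086837",
    "chr1\t92490817\t92490820\tENSMUST00000071521",
    "chr1\t92490817\t92491753\tENSMUST00000071521",
]
numbered = number_exons(bed)
print("\n".join(numbered))
```

As with the awk version, the numbers follow the order of the rows in the file.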

Reply written 8.9 years ago by Devon Ryan 104k

0

Very nice answer. In my case I have transcripts which are well separated, so it works.

Otherwise, I agree that you would need to separate the transcripts on each strand and check for overlaps anyway.

Reply written 8.9 years ago by cyril-cros ▴ 950

Ram · Answer 1 · 2015-06-10

0

8.9 years ago

Alex Reynolds 35k

BEDOPS bedmap --count will count the number of overlaps between sets of genes and exons (or any other pairing of reference and map BED files).
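To make the counted quantity concrete, here is a small Python sketch of an overlap count between reference and map intervals. It is only an illustration, not how bedmap is implemented: the function name `count_overlaps` and the toy coordinates are made up, intervals are treated as half-open BED intervals, any overlap of at least one base counts, and the nested loop is quadratic, so it suits small inputs only.

```python
def count_overlaps(refs, maps):
    """For each reference interval, count map intervals overlapping it by >= 1 bp."""
    counts = []
    for r_chrom, r_start, r_end in refs:
        n = sum(
            1
            for m_chrom, m_start, m_end in maps
            # Half-open intervals overlap when each starts before the other ends.
            if m_chrom == r_chrom and m_start < r_end and r_start < m_end
        )
        counts.append(n)
    return counts


# Toy data (hypothetical coordinates): two genes and four exons.
genes = [("chr1", 100, 500), ("chr1", 1000, 2000)]
exons = [
    ("chr1", 100, 200),
    ("chr1", 300, 400),
    ("chr1", 1500, 1600),
    ("chr2", 100, 200),
]
print(count_overlaps(genes, exons))
```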

0

Yes, but here I am using a set of disjoint intervals (which also do not overlap on opposite strands). In any case, few genes have exons that overlap another gene on the same strand (I saw it once and it was surprising: OMP and Capn5 in the mouse genome).

Reply written 8.9 years ago by cyril-cros ▴ 950

0

You can add overlap criteria to enforce full overlap of an exon with its parent gene with --fraction-map, and/or report overlapping exons with --echo-map and post-process with awk to filter for non-matching IDs.
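The post-processing step could look roughly like the following. This sketch uses Python in place of the awk mentioned above, and it rests on several assumptions: the input is the output of bedmap run with --echo and --echo-map using the default delimiters (`|` between the reference element and the mapped elements, `;` between mapped elements), both files carry an ID in column 4, and the function name `foreign_overlaps` and the sample IDs are hypothetical.

```python
def foreign_overlaps(bedmap_lines, delim="|", multidelim=";"):
    """Return (ref_id, [mapped IDs differing from ref_id]) for lines that have any."""
    result = []
    for line in bedmap_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Split "reference element | mapped elements" at the first delimiter.
        ref, _, mapped = line.partition(delim)
        ref_id = ref.split("\t")[3]
        other_ids = [
            m.split("\t")[3]
            for m in mapped.split(multidelim)
            if m and m.split("\t")[3] != ref_id
        ]
        if other_ids:
            result.append((ref_id, other_ids))
    return result


# Made-up lines in the assumed "ref|map1;map2" layout.
sample = [
    "chr1\t100\t500\tgeneA|chr1\t100\t200\tgeneA;chr1\t450\t600\tgeneB",
    "chr1\t1000\t2000\tgeneC|chr1\t1500\t1600\tgeneC",
]
for ref_id, others in foreign_overlaps(sample):
    print(ref_id, ",".join(others))
```

Only reference elements that overlap at least one element with a different ID are reported.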

Reply written 8.9 years ago by Alex Reynolds 35k