Assign number to bed file intervals
8.9 years ago
cyril-cros ▴ 950

I have a (sorted) bed file which contains exons, like this:

chr1    92479683    92480619    ENSMUST00000086837
chr1    92480616    92480619    ENSMUST00000086837
chr1    92490817    92490820    ENSMUST00000071521
chr1    92490817    92491753    ENSMUST00000071521

How can I number those exons within each transcript, adding the exon number as a fifth column, like this?

chr1    92479683    92480619    ENSMUST00000086837    1
chr1    92480616    92480619    ENSMUST00000086837    2 
chr1    92490817    92490820    ENSMUST00000071521    1
chr1    92490817    92491753    ENSMUST00000071521    2

Thanks!

PS: I just wanted to know if there is a quick and efficient way to do this. Otherwise I can always write a few lines of Python.
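If Python is preferred, one minimal sketch keeps a counter per transcript ID. It assumes a whitespace-separated BED file with the transcript ID in the fourth column, and the function name `number_exons` is hypothetical. Because it looks up a dictionary instead of comparing with the previous line, it does not require a transcript's exons to be on consecutive lines. Exons are numbered in file order, so in a coordinate-sorted file a minus-strand transcript would be numbered from its 3' end.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def number_exons(lines: Iterable[str]) -> Iterator[str]:
    """Append a 1-based, per-transcript exon number (in file order) to each BED line."""
    counts = defaultdict(int)  # transcript ID -> number of exons seen so far
    for line in lines:
        fields = line.rstrip("\n").split()
        if not fields:
            continue  # skip blank lines
        counts[fields[3]] += 1  # column 4 holds the transcript ID
        yield "\t".join(fields + [str(counts[fields[3]])])


if __name__ == "__main__":
    # The four exons from the question, written tab-separated.
    sample = [
        "chr1\t92479683\t92480619\tENSMUST00000086837",
        "chr1\t92480616\t92480619\tENSMUST00000086837",
        "chr1\t92490817\t92490820\tENSMUST00000071521",
        "chr1\t92490817\t92491753\tENSMUST00000071521",
    ]
    for row in number_exons(sample):
        print(row)
```

To process a real file, pass an open file handle, for example `number_exons(open("foo.bed"))`.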

bed • 1.8k views

If the exons of each transcript are always on consecutive lines, this can be an awk one-liner:

awk 'BEGIN{OFS="\t"} {if ($4 == last) {cnt++} else {cnt = 1; last = $4} $5 = cnt; print}' foo.bed
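The one-liner compares each line's fourth column with the previous line's and restarts the counter whenever it changes. As a rough Python equivalent (the function name `number_consecutive` is hypothetical), `itertools.groupby` behaves the same way because it only groups adjacent items. The sketch shows why the transcripts must be "together": if a transcript's exons are interleaved with another transcript's, its numbering restarts.

```python
from itertools import groupby
from typing import Iterable, Iterator


def number_consecutive(lines: Iterable[str]) -> Iterator[str]:
    """Mimic the awk one-liner: number runs of identical column-4 values."""
    rows = (line.rstrip("\n").split() for line in lines)
    rows = (r for r in rows if r)  # drop blank lines
    # groupby merges only *adjacent* rows with the same transcript ID,
    # just like comparing $4 with the previous line in awk.
    for _transcript_id, run in groupby(rows, key=lambda r: r[3]):
        for n, fields in enumerate(run, start=1):
            yield "\t".join(fields + [str(n)])


if __name__ == "__main__":
    # Interleaved input: the second "A" exon is not adjacent to the first,
    # so it is numbered 1 again instead of 2.
    for row in number_consecutive(["c\t1\t2\tA", "c\t3\t4\tB", "c\t5\t6\tA"]):
        print(row)
```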

Very nice answer. In my case I have transcripts which are well separated, so it works.

Otherwise, I agree that you would need to separate the transcripts on each strand and check for overlaps anyway.

8.9 years ago

BEDOPS bedmap --count reports, for each interval in a reference BED file (e.g. genes), the number of intervals in a map BED file (e.g. exons) that overlap it; this works for any other pairing of reference and map files too.
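This gives one count per reference interval (e.g. the number of exons in each gene), not the position of each exon within its transcript. To make that semantics concrete, here is a naive Python sketch of a per-reference overlap count. It assumes half-open BED coordinates and treats any shared base as an overlap, which is my understanding of bedmap's default criterion. It is only an illustration, not how BEDOPS works internally (bedmap streams sorted input rather than doing this O(n*m) comparison). The names `overlaps` and `count_overlaps` are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open like BED


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlaps(reference: List[Interval], mapping: List[Interval]) -> List[int]:
    """For each reference interval, count the map intervals overlapping it."""
    return [sum(overlaps(ref, m) for m in mapping) for ref in reference]


if __name__ == "__main__":
    # Hypothetical "genes" spanning the exons from the question.
    genes = [("chr1", 92479683, 92480619), ("chr1", 92490817, 92491753)]
    exons = [
        ("chr1", 92479683, 92480619),
        ("chr1", 92480616, 92480619),
        ("chr1", 92490817, 92490820),
        ("chr1", 92490817, 92491753),
    ]
    print(count_overlaps(genes, exons))
```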


Yes, but here I use a set of disjoint intervals (which also do not overlap on opposite strands). Anyway, few genes have exons that overlap another gene on the same strand (I saw it once and it was surprising: Omp and Capn5 in the mouse genome).


You can add overlap criteria to enforce full overlap of an exon with its parent gene with --fraction-map, and/or report overlapping exons with --echo-map and post-process with awk to filter for non-matching IDs.
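The "report overlaps, then keep only pairs with different IDs" idea can also be sketched in plain Python as a stand-in for `bedmap --echo-map` plus an awk filter. It does not reproduce bedmap's output format. It assumes exons are `(chrom, start, end, transcript_id)` tuples with half-open coordinates and ignores strand; add strand to the tuple and the comparison if only same-strand overlaps matter. The function name `foreign_overlaps` is hypothetical. After sorting by chromosome and start, it only scans forward until an exon starts at or beyond the current exon's end.

```python
from typing import List, Tuple

Exon = Tuple[str, int, int, str]  # (chrom, start, end, transcript_id)


def foreign_overlaps(exons: List[Exon]) -> List[Tuple[Exon, Exon]]:
    """Return pairs of overlapping exons that belong to different transcripts."""
    ordered = sorted(exons, key=lambda e: (e[0], e[1]))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            # Sorted by (chrom, start): once b is on another chromosome or
            # starts at/after a's end, no later exon can overlap a either.
            if b[0] != a[0] or b[1] >= a[2]:
                break
            if b[3] != a[3]:  # keep only overlaps with a *different* ID
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    exons = [
        ("chr1", 92479683, 92480619, "ENSMUST00000086837"),
        ("chr1", 92480616, 92480619, "ENSMUST00000086837"),
        ("chr1", 92490817, 92490820, "ENSMUST00000071521"),
        ("chr1", 92490817, 92491753, "ENSMUST00000071521"),
    ]
    # Exons of the same transcript overlap each other, but no foreign pair exists.
    print(foreign_overlaps(exons))
```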
