Question: Assign numbers to BED file intervals
cyril-cros (France) wrote:

I have a sorted BED file of exons, like this:

chr1    92479683    92480619    ENSMUST00000086837
chr1    92480616    92480619    ENSMUST00000086837
chr1    92490817    92490820    ENSMUST00000071521
chr1    92490817    92491753    ENSMUST00000071521

How can I number these exons within each transcript, adding the exon number as an extra column like this?

chr1    92479683    92480619    ENSMUST00000086837    1
chr1    92480616    92480619    ENSMUST00000086837    2 
chr1    92490817    92490820    ENSMUST00000071521    1
chr1    92490817    92491753    ENSMUST00000071521    2

Thanks!

PS: I just want to know whether there is a quick and efficient way to do this. Otherwise I can always write a few lines of Python.
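
For the "few lines of Python" route, here is a minimal sketch. It assumes a tab-separated BED file with the transcript ID in column 4, already sorted in the order the exons should be numbered (by coordinate here, so strand is ignored). It keeps one counter per transcript ID in a dictionary, so it does not need each transcript's exons to sit on consecutive lines. The function name number_exons is an illustrative choice.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def number_exons(lines: Iterable[str]) -> Iterator[str]:
    """Append a per-transcript exon number as a new last column.

    Assumes tab-separated BED lines with the transcript ID in column 4,
    already sorted in the order the exons should be numbered.
    """
    counts = defaultdict(int)  # transcript ID -> exons seen so far
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        transcript_id = fields[3]
        counts[transcript_id] += 1
        yield "\t".join(fields + [str(counts[transcript_id])])
```

To use it as a filter, a line such as sys.stdout.writelines(out + "\n" for out in number_exons(sys.stdin)) at the bottom of a script (with import sys) would read foo.bed on stdin and write the numbered file to stdout.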

 


If each transcript's exons are always on consecutive lines, this can be an awk one-liner:

awk 'BEGIN{OFS="\t"} {if ($4 == last) {cnt++} else {cnt = 1; last = $4} $5 = cnt; print}' foo.bed

(comment by Devon Ryan)
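
The awk one-liner remembers the last transcript ID and restarts its counter whenever column 4 changes. The same logic can be sketched in Python with itertools.groupby, which also groups only adjacent lines with equal keys. This makes the "transcripts are always together" assumption explicit: an ID that reappears later starts again at 1. One small difference is that the sketch appends a column, whereas awk assigns to $5 (overwriting any existing fifth column); for a 4-column file like the one in the question the result is the same. The name number_consecutive is hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def number_consecutive(lines: Iterable[str]) -> Iterator[str]:
    """Number exons the way the awk one-liner does.

    groupby() only groups *adjacent* lines with the same column-4 ID, so the
    counter restarts at 1 every time the ID changes, just like resetting
    cnt when $4 differs from the previous line's ID.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    for _transcript_id, group in groupby(rows, key=lambda fields: fields[3]):
        for n, fields in enumerate(group, start=1):
            yield "\t".join(fields + [str(n)])
```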

Very nice answer. In my case the transcripts are well separated, so it works.

Otherwise, I agree that you would need to separate the transcripts by strand and check for overlaps anyway.

(reply by cyril-cros)
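
Whether a file is "well separated" enough for the awk approach can be checked cheaply: the one-liner only misnumbers when a transcript ID reappears after lines belonging to a different ID. A minimal sketch of that check, assuming the same tab-separated layout with the ID in column 4 (the function name noncontiguous_ids is hypothetical):

```python
from typing import Iterable, List, Optional


def noncontiguous_ids(lines: Iterable[str]) -> List[str]:
    """Return transcript IDs (column 4) whose lines are not all consecutive.

    An empty result means resetting the counter whenever the ID changes
    numbers every transcript correctly for this file.
    """
    seen = set()           # IDs whose block has already started
    bad: List[str] = []    # IDs found to reappear later, in detection order
    previous: Optional[str] = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        tid = line.rstrip("\n").split("\t")[3]
        if tid != previous:
            if tid in seen and tid not in bad:
                bad.append(tid)
            seen.add(tid)
            previous = tid
    return bad
```

This only checks line order; it does not look at strands or at overlaps between transcripts.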
Alex Reynolds (Seattle, WA USA) wrote:

BEDOPS bedmap --count reports, for each element of the reference BED file, how many elements of the map file overlap it; for example, how many exons overlap each gene (or any other pairing of reference and map BED files).

(answer by Alex Reynolds)
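
To make the --count semantics concrete, here is a brute-force Python illustration. It assumes bedmap's default criterion (a map element counts if it shares at least 1 bp with the reference element) and BED's half-open [start, end) coordinates, so intervals that merely touch do not count. It is O(n*m) and only shows the idea; bedmap itself streams over sorted inputs and is what you would run on real files. Interval, overlaps and count_overlaps are names introduced here for illustration.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open BED coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def count_overlaps(reference: List[Interval], mapping: List[Interval]) -> List[int]:
    """For each reference interval, count the map intervals that overlap it."""
    return [sum(overlaps(ref, m) for m in mapping) for ref in reference]
```

With hypothetical gene intervals chr1:92479683-92480619 and chr1:92490817-92491753 as reference and the four exons from the question as map, this returns [2, 2]: a per-gene count, not the per-exon ordinal the question asks for.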

Yes, but here I am using a set of disjoint intervals (which also do not overlap on opposite strands). Anyway, few genes have exons that overlap another gene on the same strand (I saw it once, and it was surprising: OMP and Capn5 in the mouse genome).

(reply by cyril-cros)

You can add overlap criteria with --fraction-map to enforce full overlap of an exon with its parent gene (--fraction-map 1 requires the whole exon to fall within the gene), and/or report the overlapping exons with --echo-map and post-process the output with awk to keep only those whose IDs do not match the gene's.

(reply by Alex Reynolds)
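
A rough Python sketch of the filtering described above, assuming gene (reference) and exon (map) records carry comparable IDs in column 4. In the question, column 4 holds transcript IDs, so this layout is hypothetical. The min_fraction parameter mimics the idea behind --fraction-map: the share of the exon's length that must fall inside the gene, with 1.0 meaning the exon lies entirely within it. Pairs whose IDs differ point to exons overlapping a gene other than their own, as with OMP and Capn5. The names are illustrative, not part of BEDOPS.

```python
from typing import Iterator, List, Tuple

Record = Tuple[str, int, int, str]  # (chrom, start, end, ID), half-open BED coordinates


def overlap_bp(a: Record, b: Record) -> int:
    """Bases shared by two intervals (0 if on different chromosomes or disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def foreign_exons(genes: List[Record], exons: List[Record],
                  min_fraction: float = 1.0) -> Iterator[Tuple[str, Record]]:
    """Yield (gene ID, exon) for exons that overlap a gene with a different ID.

    An exon qualifies only if it overlaps the gene and at least
    `min_fraction` of its own length lies inside the gene.
    """
    for gene in genes:
        for exon in exons:
            length = exon[2] - exon[1]
            shared = overlap_bp(gene, exon)
            if length <= 0 or shared == 0:
                continue  # empty exon or no overlap at all
            if shared / length >= min_fraction and exon[3] != gene[3]:
                yield gene[3], exon
```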
Powered by Biostar version 2.3.0