I have a (sorted) bed file which contains exons, like this:
chr1 92479683 92480619 ENSMUST00000086837
chr1 92480616 92480619 ENSMUST00000086837
chr1 92490817 92490820 ENSMUST00000071521
chr1 92490817 92491753 ENSMUST00000071521
How can I number these exons per transcript, adding the exon number as an extra column, like this?
chr1 92479683 92480619 ENSMUST00000086837 1
chr1 92480616 92480619 ENSMUST00000086837 2
chr1 92490817 92490820 ENSMUST00000071521 1
chr1 92490817 92491753 ENSMUST00000071521 2
Thanks!
PS: I just wanted to know whether there is a quick and efficient way to do this. Otherwise I can always write a few lines of Python.
If the lines of each transcript are always adjacent in the file, this can be an awk one-liner:
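A minimal sketch of such a one-liner (not necessarily the exact command originally posted), assuming whitespace-separated columns with the transcript ID in column 4 and all lines of a transcript adjacent in the file. The function name `number_exons` and the file name `exons.bed` below are made up for illustration.

```shell
# Append a per-transcript exon counter as a last column.
# Assumption: column 4 holds the transcript ID and lines of the same
# transcript are consecutive; the counter restarts whenever the ID changes.
# The counter is appended with a space; add -v OFS='\t' for tab-separated output.
number_exons() {
  awk '$4 != prev { n = 0; prev = $4 } { print $0, ++n }' "$@"
}
```

It could be used as `number_exons exons.bed > exons.numbered.bed`. Because the counter resets on every change of ID, a transcript whose lines are split into several non-adjacent groups would start again from 1 in each group.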
Very nice answer. In my case I have transcripts which are well separated, so it works.
Otherwise, I agree that you would need to separate the transcripts on each strand and check for overlaps anyway.