get last exon form GTF
1
0
Entering edit mode
3.6 years ago
fuhaolll2 ▴ 30

how could i extract each transcript's last exon from GTF ,get a new GTF only contain last exon.

RNA-Seq • 1.2k views
ADD COMMENT
0
Entering edit mode

What have you tried? awk? R?

ADD REPLY
0
Entering edit mode

for first exon,i know exon number is 1,can get it .but last exon number maybe 6,7,8.

ADD REPLY
1
Entering edit mode

not exactly an answer to your question but I would not trust the numbering of the exons. Sometimes people will start numbering the "last" exon as 1 if the gene is on the reverse strand for instance.

just a little warning ;)

ADD REPLY
0
Entering edit mode
3.6 years ago

I would script this sort of thing. I'm sure there is an R way to do it, but I tend to use python for this sort of thing. There might be a clever way to do it in awk, but I don't know it.

If you are happy with installing things, cgatapps has code for handling GTF files. This sort of thing could be achieved with:

from cgat import GTF
from cgatcore import iotools

input_file = iotools.open_file("my_transcripts.gtf.gz")

gtf_entries = gtf.iterator(input_file)

transcripts = gtf.transcript_iterator(gtf_enteries)

for transcript in transcripts:
    exons = [e for e in transcript if e.feature == "exon"]

    if len(exons) == 0:
        continue

    if transcript[0].strand == "-":
        exons = sorted(exons, key=lambda x: -1*x.end)
    else:
        exons = sorted(exons, key=lambda x: x.start

    last_exon = exons[-1]

    print (str(last_exon) + "\n")

The same thing can be achieved with pysam with a little more effort if cgat is too big an installation.

ADD COMMENT

Login before adding your answer.

Traffic: 1306 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6