Question: Identify the transcript coding for the longest protein from a gene annotation file
7 months ago by
waqaskhokhar99980 wrote:

I have a reference annotation file for Arabidopsis thaliana, and I want to identify the transcript that codes for the longest protein isoform of each gene and then extract the coordinates of that transcript. For example, gene AT1G01020 contains 6 transcripts (AT1G01020.1, AT1G01020.2, AT1G01020.3, AT1G01020.4, AT1G01020.5, AT1G01020.6). How can I identify the transcript that codes for the longest protein and extract its coordinates?

The reference annotation file

Does it depend on the number of exons, the CDS regions, or the length of the exons?

modified 7 months ago by JC • written 7 months ago by waqaskhokhar999
7 months ago, JC (Mexico) wrote:

Use the Arabidopsis dataset in BioMart: filter by "Gene stable ID" for your gene, select "Structures" under "Attributes", and retrieve the values you need.


I am able to select the protein-coding transcripts, but how can I select the transcript that codes for the longest protein? Secondly, if multiple transcripts of different lengths code for proteins of the same length, which transcript should I select? For example, gene AT2G27490 contains 4 transcripts of different lengths, but all of them code for a protein of 232 aa, so which one should I select?

written 7 months ago by waqaskhokhar999

You select the largest one from the table. If you need to decide automatically, you will have to write some code to query and filter your selection. Which one to use when several have the same length is something you need to define based on what you are trying to do with that information.

written 7 months ago by JC
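
The automatic selection mentioned in this reply could look roughly like the sketch below. It assumes the BioMart result was exported as a tab-separated table with the columns "Gene stable ID", "Transcript stable ID" and "CDS Length"; those header names, the function name `longest_cds_per_gene`, the placeholder IDs and the toy lengths are all assumptions for illustration and should be adjusted to the real export. The tie-break (keep the alphabetically smallest transcript ID) is arbitrary and only stands in for whatever rule fits the analysis.

```python
import csv
import io


def longest_cds_per_gene(tsv_text,
                         gene_col="Gene stable ID",
                         tx_col="Transcript stable ID",
                         len_col="CDS Length"):
    """Return {gene_id: (transcript_id, cds_length)} with the longest CDS per gene.

    Column names are assumed; change them to match the exported table.
    """
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if not row[len_col]:
            # Transcripts without a CDS length (non-coding) are skipped.
            continue
        gene, tx, length = row[gene_col], row[tx_col], int(row[len_col])
        current = best.get(gene)
        # A longer CDS wins. On a tie, keep the alphabetically smaller
        # transcript ID (an arbitrary, illustrative tie-break).
        if (current is None
                or length > current[1]
                or (length == current[1] and tx < current[0])):
            best[gene] = (tx, length)
    return best


# Toy table with placeholder IDs and made-up lengths (not real annotation data).
toy_table = (
    "Gene stable ID\tTranscript stable ID\tCDS Length\n"
    "geneA\tgeneA.1\t600\n"
    "geneA\tgeneA.2\t900\n"
    "geneA\tgeneA.3\t\n"
    "geneB\tgeneB.2\t699\n"
    "geneB\tgeneB.1\t699\n"
)

best = longest_cds_per_gene(toy_table)
for gene, (tx, length) in best.items():
    print(gene, tx, length)
```

For genes where all isoforms have the same CDS length, as in the AT2G27490 case above, only the tie-break decides, so that line is the one to replace with a rule that suits the downstream use.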

The longest transcript does not necessarily code for the longest protein, since it can also contain retained introns or parts of introns. How can I find the transcript with the longest protein-coding sequence?

written 7 months ago by waqaskhokhar999

By CDS (CoDing Sequence) length.

written 7 months ago by JC
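
Ranking by CDS length can also be done directly on the annotation file. The sketch below is a minimal illustration and makes several assumptions, because the annotation file itself is not shown in this thread: that it is GFF3, that transcripts are `mRNA` features carrying `ID` and `Parent` attributes, and that `CDS` features point to their transcript through `Parent`. The function names, placeholder IDs and toy coordinates are hypothetical. For each transcript it sums the lengths of its CDS segments, then keeps the transcript with the largest sum per gene together with that transcript's coordinates.

```python
from collections import defaultdict


def parse_attributes(field):
    """Turn a GFF3 attribute string such as 'ID=x;Parent=y' into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def longest_cds_transcripts(gff_lines):
    """Return {gene: (transcript, cds_length, seqid, start, end, strand)}.

    Assumes GFF3 with mRNA features (ID, Parent) and CDS features (Parent).
    """
    cds_len = defaultdict(int)  # transcript ID -> summed CDS length
    mrna = {}                   # transcript ID -> (gene, seqid, start, end, strand)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        seqid, _, ftype, start, end, _, strand, _, attrs = cols
        start, end = int(start), int(end)
        attr = parse_attributes(attrs)
        if ftype == "mRNA":
            mrna[attr["ID"]] = (attr["Parent"], seqid, start, end, strand)
        elif ftype == "CDS":
            # A CDS segment may list several parents separated by commas.
            for parent in attr["Parent"].split(","):
                # GFF coordinates are 1-based and inclusive.
                cds_len[parent] += end - start + 1

    best = {}
    for tx, (gene, seqid, start, end, strand) in mrna.items():
        length = cds_len.get(tx, 0)
        if length == 0:
            continue  # no CDS: not protein coding
        # Strict '>' means the first transcript seen wins a tie.
        if gene not in best or length > best[gene][1]:
            best[gene] = (tx, length, seqid, start, end, strand)
    return best


# Toy GFF3 lines with placeholder IDs and made-up coordinates.
toy_gff = [
    "##gff-version 3",
    "Chr1\ttoy\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "Chr1\ttoy\tmRNA\t100\t900\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Chr1\ttoy\tCDS\t150\t299\t.\t+\t0\tParent=geneA.1",
    "Chr1\ttoy\tCDS\t400\t549\t.\t+\t0\tParent=geneA.1",
    "Chr1\ttoy\tmRNA\t100\t800\t.\t+\t.\tID=geneA.2;Parent=geneA",
    "Chr1\ttoy\tCDS\t150\t299\t.\t+\t0\tParent=geneA.2",
    "Chr1\ttoy\tCDS\t400\t699\t.\t+\t0\tParent=geneA.2",
]

result = longest_cds_transcripts(toy_gff)
print(result["geneA"])
```

In the toy data the transcript with the larger genomic span (geneA.1) has the shorter CDS, which is the situation raised in the question: transcript length and coding length are tallied separately, and only the CDS sum is used for ranking.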