Question: Convert AGI Locus to GenBank or EMBL Format
8.4 years ago by
Gvj440 wrote:

Hi all, I have a list of AGI loci and want to get their gene structures in GenBank or EMBL format. Since TAIR only provides GFF3 format, I want either a method to convert GFF3 to GenBank/EMBL or a method to get the NCBI accession numbers of those AGI loci. I have found one file at TAIR, but it does not seem completely correct (or I didn't understand it completely).

gff format conversion genbank • 4.7k views
8.4 years ago by
Sydney, Australia
Neilfws48k wrote:

The file that you describe contains 2 columns: the second is the TAIR locus tag and the first is the NCBI Entrez Gene database ID. The Gene ID is not the same as a GenBank accession number, but it will get you there.
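As a rough sketch of using that file programmatically: the snippet below reads the two columns into a lookup table from locus tag to Gene ID. It assumes the file is tab-delimited, which the thread does not confirm, and the function name `load_locus_to_gene_id` is made up for illustration.

```python
import csv
import io


def load_locus_to_gene_id(handle):
    """Read 'GeneID<TAB>locus' rows into a dict keyed by upper-cased locus tag."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, rows with fewer than two columns, and comments.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        gene_id, locus = row[0].strip(), row[1].strip().upper()
        mapping[locus] = gene_id
    return mapping


# Tiny in-memory stand-in for the mapping file (assumed tab-delimited).
example = io.StringIO("2745418\tAT2G01175\n")
locus_to_gene = load_locus_to_gene_id(example)
print(locus_to_gene["AT2G01175"])  # prints 2745418
```

With a real file, pass `open(path, newline="")` instead of the `io.StringIO` object.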

There may well be a file, at the Arabidopsis site or elsewhere, which links Gene ID to GenBank accession. If not, you can use BioMart, something like this:

  1. Click MARTVIEW (top menu)
  2. Choose "ENSEMBL PLANT 6 (EBI UK)" as database
  3. Choose "Arabidopsis thaliana genes (TAIR9)" as dataset
  4. Click "Filters" (left menu); expand GENE; check ID list limit and choose "Entrez Gene ID(s)"
  5. Either paste or upload Gene IDs (column 1 in your file)
  6. Click "Attributes" (left menu); expand EXTERNAL; check "RefSeq DNA ID"
  7. Click "Results" (top left menu)

After some time, this should return results that you can download as plain ASCII text. For example, using Gene ID 2745418 (AT2G01175), I get back "NM_201659".

You can now take your new list of accessions off to Batch Entrez, upload them and retrieve the results in GenBank format.
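If you would rather script this last step than use the Batch Entrez web form, one option (a swapped-in alternative, not what is described above) is NCBI's E-utilities EFetch endpoint. The sketch below only uses the Python standard library; the function names are made up for illustration, and `fetch_genbank` needs network access, so it is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore"):
    """Build an EFetch URL that requests GenBank flat-file text."""
    params = {
        "db": db,                     # "nuccore" is the Nucleotide database
        "id": ",".join(accessions),   # comma-separated accession list
        "rettype": "gb",              # GenBank format
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_genbank(accessions):
    """Download the records as one GenBank-format string (needs network)."""
    with urlopen(build_efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


print(build_efetch_url(["NM_201659"]))
```

For long accession lists it is safer to send them in small batches and pause between requests, in line with NCBI's usage guidelines.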

This is just one solution (relying on both BioMart and Batch Entrez working well); there are plenty of other potential ways to convert between IDs, including programmatic methods.


Thank you very much. Something I found strange is that the NCBI entry contains only the CDS, not intron/exon information (e.g. AT2G32460, accession NM_128805, has 3 exons, but they are not mentioned in the .gb format). Why is that? Is it because I am downloading from the Nucleotide database? I want all the features of the genes. Any suggestions?

Reply written 8.4 years ago by Gvj

That is strange. Looks like the record has the complete (i.e. includes non-coding) mRNA, but no "parts". I am not sure why.

Reply written 8.4 years ago by Neilfws

I guess it's not mandatory to have exon, UTR, etc. features in GenBank format; that would be a reason. It's nice to know the BioMart way, but I think a programmatic way of converting GFF to GenBank is the only solution for me.

Reply written 8.4 years ago by Gvj
8.4 years ago by
Copenhagen, Denmark
Lars Juhl Jensen11k wrote:

To make your question a bit more general, what you are asking for is a way to make a Genbank (or EMBL) file based on a GFF file and its associated FASTA sequence file. Solutions to that can be found here:
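Separately from the solutions referred to above, here is a minimal sketch of the core of such a conversion: turning the GFF3 exon rows of one transcript into a GenBank-style location string. Both formats use 1-based inclusive coordinates, so the numbers carry over unchanged. The function name, the transcript ID `tx1`, and the coordinates are invented for illustration, and the sketch ignores the sequence itself as well as multi-valued or URL-escaped attributes.

```python
def genbank_location(gff3_text, parent_id, feature_type="exon"):
    """Build a GenBank location string from the GFF3 rows of one transcript."""
    spans, strand = [], "+"
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and directives/comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        # Column 9 holds key=value pairs separated by semicolons.
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        if parent_id not in attrs.get("Parent", "").split(","):
            continue
        spans.append((int(cols[3]), int(cols[4])))
        strand = cols[6]
    if not spans:
        raise ValueError(f"no {feature_type} rows for {parent_id}")
    spans.sort()
    parts = ",".join(f"{start}..{end}" for start, end in spans)
    location = f"join({parts})" if len(spans) > 1 else parts
    return f"complement({location})" if strand == "-" else location


# Invented two-exon transcript on the minus strand.
gff3 = "\n".join([
    "##gff-version 3",
    "Chr1\tdemo\texon\t100\t200\t.\t-\t.\tParent=tx1",
    "Chr1\tdemo\texon\t300\t450\t.\t-\t.\tParent=tx1",
])
print(genbank_location(gff3, "tx1"))  # prints complement(join(100..200,300..450))
```

The same function can be called with `feature_type="CDS"` to get the coding location; a full converter would also have to write the header, the remaining feature qualifiers, and the sequence block.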


That script only converts GFF3 that doesn't specify UTRs explicitly.

Reply written 8.4 years ago by Gvj
8.4 years ago by
Ladan0 wrote:

How can I find the GeneID or locus_tag of genes? Sincerely yours, Laleh and Ladan


Could you please be a bit more specific about your question?

Reply written 8.3 years ago by Gvj