Convert AGI Locus To GenBank Or EMBL Format
13.5 years ago
Gvj ▴ 470

Hi all, I have a list of AGI loci and want to get their gene structures in GenBank or EMBL format. Since TAIR only provides GFF3, I want either a method to convert GFF3 to GenBank/EMBL or a method to get the NCBI accession numbers of those AGI loci. I have found one file under TAIR, ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR9_genome_release/TAIR9_NCBI_GENEID_mapping, but it does not seem completely accurate (or I didn't understand it completely).

format conversion genbank gff • 7.2k views
13.5 years ago
Neilfws 49k

The file that you describe contains 2 columns; the second is the TAIR locus tag and the first is the NCBI Entrez Gene database ID. The Gene ID is not the same as an accession number or ID, but it will get you there.
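As a minimal sketch of reading that file programmatically, the following assumes it is plain text with whitespace-separated columns (Gene ID first, locus tag second), as described above. The function name `load_locus_to_gene_id` and the header line in the sample are hypothetical; the data line uses the AT2G01175 example from this thread.

```python
def load_locus_to_gene_id(lines):
    """Map TAIR locus tags to NCBI Entrez Gene IDs.

    Assumes each data line holds two whitespace-separated columns:
    the Gene ID first, then the AGI locus tag.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and any header whose first column is not numeric.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        gene_id, locus = fields[0], fields[1].upper()
        mapping[locus] = gene_id
    return mapping


# Hypothetical two-line sample; a real run would pass an open file handle instead.
sample = ["GeneID\tLocus_tag", "2745418\tAT2G01175"]
locus_to_gene = load_locus_to_gene_id(sample)
print(locus_to_gene["AT2G01175"])
```

The resulting Gene IDs can then be pasted or uploaded wherever a Gene ID list is expected.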

There may well be a file, at the Arabidopsis site or elsewhere, which links Gene ID to GenBank accession. If not, you can use BioMart, something like this:

  1. Click MARTVIEW (top menu)
  2. Choose "ENSEMBL PLANT 6 (EBI UK)" as database
  3. Choose "Arabidopsis thaliana genes (TAIR9)" as dataset
  4. Click "Filters" (left menu); expand GENE; check ID list limit and choose "Entrez Gene ID(s)"
  5. Either paste or upload Gene IDs (column 1 in your file)
  6. Click "Attributes" (left menu); expand EXTERNAL; check "RefSeq DNA ID"
  7. Click "Results" (top left menu)

After some time, this should return results that you can download as plain ASCII text. For example, using Gene ID 2745418 (AT2G01175), I get back "NM_201659".

You can now take your new list of accessions off to Batch Entrez, upload them and retrieve the results in GenBank format.
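If you would rather script this step than use the Batch Entrez web form, one option is NCBI's E-utilities EFetch endpoint. The sketch below is an alternative to Batch Entrez, not the method described above. It only builds the request URL, and the helper names `build_efetch_url` and `fetch_genbank` are hypothetical. The download function needs network access and is not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accessions, db="nuccore"):
    """Build an EFetch URL requesting GenBank flat-file text for the accessions."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": "gb",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_genbank(accessions):
    """Download the records as one GenBank-formatted string (needs network access)."""
    with urlopen(build_efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


# Accessions mentioned in this thread.
url = build_efetch_url(["NM_201659", "NM_128805"])
print(url)
```

For long accession lists it is probably wise to send requests in small batches and stay within NCBI's usage guidelines.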

This is just one solution (relying on both BioMart and Batch Entrez working well); there are plenty of other potential ways to convert between IDs, including programmatic methods.

ADD COMMENT
0
Entering edit mode

Thank you very much. Something I found strange is that the NCBI entry only contains the CDS, not intron/exon information (e.g. AT2G32460, accession NM_128805, has 3 exons, but they are not mentioned in the .gb file). Why is this so? Is it because I am downloading from the nucleotide database? I want all features of the genes. Any suggestions?


That is strange. Looks like the record has the complete (i.e. includes non-coding) mRNA, but no "parts". I am not sure why.


I guess it's not mandatory to have exon, UTR and similar features in GenBank format. That would be a reason. Nice to know the BioMart way, but I think a programmatic way of converting GFF to GenBank is the only solution for me.

13.5 years ago

To make your question a bit more general, what you are asking for is a way to make a Genbank (or EMBL) file based on a GFF file and its associated FASTA sequence file. Solutions to that can be found here
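To illustrate the core of such a conversion (this is not the linked solution), here is a minimal pure-Python sketch that groups GFF3 `exon` rows by their `Parent` attribute and renders each group as a GenBank-style location string. It does not write a full GenBank file; for that you would also need the FASTA sequence and, most likely, a library such as Biopython. The function names and the sample coordinates are hypothetical.

```python
from collections import defaultdict


def parse_attributes(column9):
    """Turn a GFF3 attribute column such as 'ID=x;Parent=y' into a dict."""
    pairs = (item.split("=", 1) for item in column9.strip().split(";") if "=" in item)
    return {key: value for key, value in pairs}


def exon_locations(gff_lines):
    """Return {parent_id: GenBank location string} built from exon rows."""
    exons = defaultdict(list)
    strands = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        start, end, strand = int(cols[3]), int(cols[4]), cols[6]
        # An exon may list several parents separated by commas.
        for parent in parse_attributes(cols[8]).get("Parent", "").split(","):
            if parent:
                exons[parent].append((start, end))
                strands[parent] = strand

    locations = {}
    for parent, spans in exons.items():
        spans.sort()
        parts = ",".join(f"{s}..{e}" for s, e in spans)
        location = f"join({parts})" if len(spans) > 1 else parts
        if strands[parent] == "-":
            location = f"complement({location})"
        locations[parent] = location
    return locations


# Made-up coordinates, for illustration only.
sample = [
    "##gff-version 3",
    "Chr2\tTAIR9\texon\t100\t200\t.\t+\t.\tParent=mRNA1",
    "Chr2\tTAIR9\texon\t300\t400\t.\t+\t.\tParent=mRNA1",
]
print(exon_locations(sample)["mRNA1"])
```

The same grouping could be applied to `CDS` or UTR rows when the GFF3 file provides them explicitly.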


That script only converts GFF3 files that don't specify UTRs explicitly.

13.4 years ago
Ladan • 0

How can I find the GeneID or locus_tag of genes? Sincerely yours, Laleh and Ladan


Could you please be a bit more specific about your question?
