How to extract locus tags from NCBI GeneIDs for the soybean database
2.8 years ago
b.g.tamang ▴ 20

Hi all, I have been trying to find a way to convert NCBI GeneIDs to Glyma IDs for soybean gene annotation, but no such mapping file seems to exist. For instance, searching NCBI for GeneID:100790502 shows the locus tag GLYMA_09G197400, which is what I want to extract, but for all 56K soybean genes. Is there a way to query all 56K NCBI GeneIDs and extract the locus tag for each? That way, I could use the Glyma IDs and annotate them with the Phytozome annotation file. Phytozome already lists its genes in Glyma format, and I am not sure why NCBI does not include this information in its FASTA or GFF/GTF files.

Your insight is much appreciated.

Best,

RNASeq Soybean tag Locus ID Glyma • 1.6k views
2.8 years ago
GenoMax 141k

You can use Entrez Direct (EDirect). In the output, the first column is the Entrez Gene ID and the second holds the locus tag, followed by any other aliases:

$ esearch -db gene -query 100790502  | esummary | xtract -pattern DocumentSummary -element Id,OtherAliases
100790502   GLYMA_09G197400

This may get you most of them. Only the first 10 are shown here (remove | head -10 to get them all):

$ esearch -db gene -query GLYMA | esummary | xtract -pattern DocumentSummary -element Id,OtherAliases | head -10
547923  GLYMA_13G347600, L-1, Lx1
548076  GLYMA_13G288100
547831  GLYMA_08G341500, KTi, Ti-a, Ti-b, Tia, Tic, Tie
100788438   GLYMA_03G181700, GmPAL1.2, PAL1
547900  GLYMA_03G163500, A2B1a, glycinin
547641  GLYMA_18G023500, RLK-RHG1, rhg1-like, rhg1g, rhg1s
547931  GLYMA_06G301500, BMY1, Gm-BamyDam, Gm-BamyKza
100787872   GLYMA_02G309300, GmPAL3.1
100527427   GLYMA_10G199100, GmLb, N-2, Nodulin-2
547869  GLYMA_15G026300, L-3, LOX1.3, Lx3
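
Since the second column mixes the locus tag with other aliases, a small awk filter can keep just the GLYMA_ entry. This is only a sketch: glyma_only is a made-up helper name (not part of EDirect), and it assumes the xtract output is tab-separated with aliases joined by ", " as in the rows above.

```shell
# glyma_only: hypothetical helper, not an EDirect command.
# Reads "GeneID <TAB> alias1, alias2, ..." lines on stdin and prints
# "GeneID <TAB> GLYMA_xxx" for the first alias that starts with GLYMA_.
# Rows without a GLYMA_ alias are skipped.
glyma_only() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         n = split($2, aliases, /, */)
         for (i = 1; i <= n; i++)
           if (aliases[i] ~ /^GLYMA_/) { print $1, aliases[i]; break }
       }'
}

# Offline demo using two of the rows shown above:
printf '547923\tGLYMA_13G347600, L-1, Lx1\n548076\tGLYMA_13G288100\n' | glyma_only
```

With a working EDirect installation, the same filter could be appended to the pipeline above in place of | head -10.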

This looks like what I needed. I will try this. Thanks a lot @genomax. I really appreciate it. Best.

2.8 years ago
vkkodali_ncbi ★ 3.7k

The file gene_info.gz on the Gene FTP site has this information. Since you are interested in soybean only, you can download the All_Plants.gene_info.gz file from here. On a Unix command line, you can extract these as follows:

zcat All_Plants.gene_info.gz | awk 'BEGIN{FS="\t"} $1~/^#/ || ($1==3847 && $4~/GLYMA/)'
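
If only a GeneID-to-locus-tag lookup table is needed, the same filter can print just two columns. The following is a rough sketch, assuming the gene_info column order tax_id, GeneID, Symbol, LocusTag (columns 1, 2, 3, 4) and soybean tax_id 3847 as used in the command above. geneid_to_locustag and the output file name are made up for illustration.

```shell
# geneid_to_locustag: hypothetical helper name.
# Reads gene_info-formatted lines on stdin (tab-separated; assumed columns:
# 1 = tax_id, 2 = GeneID, 4 = LocusTag) and prints "GeneID <TAB> LocusTag"
# for soybean rows (tax_id 3847) whose locus tag starts with GLYMA_.
geneid_to_locustag() {
  awk 'BEGIN { FS = OFS = "\t" }
       $1 == 3847 && $4 ~ /^GLYMA_/ { print $2, $4 }'
}

# Real use, assuming the file has already been downloaded:
#   zcat All_Plants.gene_info.gz | geneid_to_locustag > geneid2glyma.tsv

# Offline demo on a tiny made-up sample (the Symbol values are placeholders):
printf '#tax_id\tGeneID\tSymbol\tLocusTag\n3847\t100790502\tSYMBOL\tGLYMA_09G197400\n3702\t1\tSYMBOL\tOTHER_TAG\n' | geneid_to_locustag
```

The resulting two-column file can then be joined against a list of GeneIDs with standard tools such as join or awk.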

This is great information and saved me a lot of headache. Thanks a lot. Appreciate it. Best.
