Genomic coordinates from GI accession numbers
9.7 years ago

I have a list of GI accession numbers that I obtained following a protein BLAST search. These are from different bacterial genomes. I've pasted some examples below.

108765232
108769511
108769512
108769513
108769516
108799092
108799093
108799094
108799097
108803989

For further analysis, I would like to get the genomic coordinates of each of them in their corresponding genomes. Is there a way to do this via NCBI E-utils or by any other means?

I'm currently trying out the following approach:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=108769513&rettype=gp" | grep coded_by=

giving:

                 /coded_by="CP000384.1:2273218..2274069"

From this I take 2273218 and 2274069 as the start and end coordinates on CP000384.1.
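A minimal sketch of that processing step, assuming the qualifier always has the single-interval form shown above. The function name parse_coded_by is hypothetical, and join(...) locations that span several intervals are not handled. A complement(...) wrapper is reported as the minus strand.

```shell
# Hypothetical helper: read GenPept text on stdin and print
# accession<TAB>start<TAB>end<TAB>strand for each coded_by qualifier.
# Assumption: simple locations only, e.g. ACC:1..10 or complement(ACC:1..10).
parse_coded_by() {
    # sed captures: \1 = "complement(" if present, \2 = accession,
    # \3 = start, \4 = end. Fields are emitted space-separated.
    sed -n -E 's/.*coded_by="(complement\()?([^:"]+):<?([0-9]+)\.\.>?([0-9]+)\)?".*/\2 \3 \4 \1/p' |
    # awk converts the optional 4th field into a strand symbol.
    awk '{ strand = ($4 == "" ? "+" : "-"); print $1 "\t" $2 "\t" $3 "\t" strand }'
}

# Usage (same efetch call as above, piped into the helper):
#   curl -s "<efetch URL>" | parse_coded_by
```

Keeping the accession next to the coordinates matters here because the hits come from different genomes, so the numbers alone are ambiguous.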

Let me know if this approach is right, and/or if there's a better way to do this.

Thank you.

genome blast identifiers • 2.6k views
9.7 years ago
5heikki 11k

With Entrez Direct:

esearch -query 108765232 -db protein | elink -target gene | efetch -format documentsummary

1. Rxyl_1148
XRE family transcriptional regulator[Rubrobacter xylanophilus DSM 9941]
Other Aliases: Rxyl_1148
Annotation: NC_008148.1 (1176826..1177125)
ID: 4117380

Also check the Entrez Direct documentation; everything from "Sending results to scripts or spreadsheets" onwards is especially relevant.
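To get this into a table, the document summary text can be reduced to one line per gene. This is only a sketch: it assumes the plain-text layout shown above (a numbered locus line followed by an "Annotation:" line), which may differ between Entrez Direct versions, and parse_gene_docsum is a hypothetical name.

```shell
# Hypothetical helper: read docsum text on stdin and print
# locus<TAB>accession<TAB>start<TAB>end<TAB>strand.
# Assumption: records look like "1. Rxyl_1148" ... "Annotation: ACC (start..end)".
parse_gene_docsum() {
    awk '
        /^[0-9]+\. / { locus = $2 }                  # numbered line holds the locus name
        /^Annotation: / {
            # a "complement" note inside the parentheses means minus strand
            strand = ($0 ~ /complement/) ? "-" : "+"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^\([0-9]+\.\.[0-9]+/) {
                    acc = $(i - 1)                   # accession precedes "(start..end"
                    range = $i
                    gsub(/[(),]/, "", range)         # strip "(", ")" and ","
                    split(range, pos, /\.\./)
                    print locus "\t" acc "\t" pos[1] "\t" pos[2] "\t" strand
                }
            }
        }
    '
}

# Usage, appended to the pipeline above:
#   ... | efetch -format documentsummary | parse_gene_docsum
```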

9.7 years ago
hugorody ▴ 20

I think a better way is to download the GFF file for each genome and use grep to pull out the line, and with it the coordinates, for each gene ID.

You could use:

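#!/bin/bash
# For each gene ID in the list, print the first GFF line that contains it.
while IFS= read -r id
do
    # -F: treat the ID as a fixed string; -m 1: stop after the first match
    x=$(grep -F -m 1 -- "$id" file.gff)
    echo "$x"
done < list_of_genesID.txt > output.txt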
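If only the coordinates are needed rather than the whole GFF line, a variant along these lines might help. It is a sketch that assumes a standard tab-separated GFF (seqid in column 1, start and end in columns 4 and 5, strand in column 7, attributes in column 9); gff_coords is a hypothetical name. Like the fgrep loop, it does a plain substring match and keeps only the first hit per ID, but output follows GFF order rather than list order.

```shell
# Hypothetical helper: gff_coords IDS_FILE GFF_FILE
# Prints id<TAB>seqid<TAB>start<TAB>end<TAB>strand for the first GFF
# feature whose attribute column (9) contains each ID.
# Assumption: IDS_FILE is non-empty and has one ID per line.
gff_coords() {
    awk -F'\t' '
        NR == FNR { if ($0 != "") ids[$0]; next }    # first file: load the IDs
        /^#/ { next }                                # skip GFF comments and directives
        {
            for (id in ids)
                if (!(id in seen) && index($9, id)) {
                    seen[id]                         # remember: first hit only
                    print id "\t" $1 "\t" $4 "\t" $5 "\t" $7
                }
        }
    ' "$1" "$2"
}

# Usage:
#   gff_coords list_of_genesID.txt file.gff > output.txt
```

Reading the GFF once instead of once per ID should also be noticeably faster for long lists.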