Question

Genomic coordinates from GI accession numbers

9.7 years ago

Arjun Krishnan ▴ 40

I have a list of GI accession numbers that I obtained following a protein BLAST search. These are from different bacterial genomes. I've pasted some examples below.

For further analysis, I would like to get the genomic coordinates of each of them in their corresponding genomes. Is there a way to do this via NCBI E-utils or by any other means?

I'm currently trying out the following approach:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=108769513&rettype=gp" | grep coded_by=

giving:

                 /coded_by="CP000384.1:2273218..2274069"

which I then parse to get 2273218 and 2274069 as the start and end coordinates on the genome.
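The parsing step could look roughly like the sketch below. `parse_coded_by` is a name made up for illustration, not part of E-utilities. It assumes the location is a plain `ACC:START..END`, optionally wrapped in `complement(...)` for genes on the reverse strand; spliced (`join(...)`) or partial locations are skipped rather than guessed at.

```shell
#!/bin/sh
# parse_coded_by: hypothetical helper (not part of E-utilities).
# Reads GenPept text on stdin and prints one tab-separated line per
# coded_by qualifier: accession, start, end, strand.
# Assumption: the location is a simple ACC:START..END, optionally wrapped
# in complement(...); join(...) and partial (< or >) locations are skipped.
parse_coded_by() {
    awk '
    BEGIN { OFS = "\t" }
    /coded_by="/ {
        loc = $0
        sub(/^.*coded_by="/, "", loc)      # drop everything up to the opening quote
        sub(/".*$/, "", loc)               # drop the closing quote and the rest
        strand = "+"
        if (loc ~ /^complement\(/) {       # reverse-strand feature
            strand = "-"
            sub(/^complement\(/, "", loc)
            sub(/\)$/, "", loc)
        }
        # Skip anything that is not a plain ACC:START..END location
        if (loc !~ /^[^:()]+:[0-9]+\.\.[0-9]+$/) next
        split(loc, part, /:|\.\./)
        print part[1], part[2], part[3], strand
    }'
}

# Example (needs network access):
# curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=108769513&rettype=gp" | parse_coded_by
```

Keeping the accession and strand next to the coordinates matters here because the hits come from different genomes, so a start/end pair on its own is ambiguous.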

Let me know if this approach is right, and/or if there's a better way to do this.

Thank you.

genome blast identifiers • 2.6k views

updated 2.3 years ago by Ram 43k • written 9.7 years ago by Arjun Krishnan ▴ 40

Ram · Answer 1 · 2014-08-18

With Entrez Direct:

esearch -query 108765232 -db protein | elink -target gene | efetch -format documentsummary

1. Rxyl_1148
XRE family transcriptional regulator[Rubrobacter xylanophilus DSM 9941]
Other Aliases: Rxyl_1148
Annotation: NC_008148.1 (1176826..1177125)
ID: 4117380

Also check the Entrez Direct documentation; the section "Sending results to scripts or spreadsheets" onwards is especially relevant.
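If you want those coordinates in a spreadsheet-friendly form, the text summary can be flattened with a small filter. This is only a sketch: `docsum_to_tsv` is a made-up name, and it assumes each record has the layout shown in the output above (an `Annotation:` line with the accession immediately before the parenthesised range, followed by an `ID:` line).

```shell
#!/bin/sh
# docsum_to_tsv: hypothetical helper, not an Entrez Direct command.
# Reads the text summary on stdin and prints one tab-separated line per
# record: gene ID, sequence accession, start, end.
# Assumption: records look like the example output above.
docsum_to_tsv() {
    awk '
    BEGIN { OFS = "\t" }
    /^Annotation:/ {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^\(/) {              # field holding the (start..end) range
                acc = $(i - 1)             # the accession is the field just before it
                range = $i
                gsub(/[(),]/, "", range)   # strip parentheses and any trailing comma
                split(range, pos, /\.\./)
                start = pos[1]
                stop = pos[2]
            }
        }
    }
    /^ID:/ {
        print $2, acc, start, stop
        acc = start = stop = ""            # reset for the next record
    }'
}

# Example (needs Entrez Direct and network access):
# esearch -query 108765232 -db protein | elink -target gene | efetch -format documentsummary | docsum_to_tsv
```

For anything beyond a quick look, the XML-based route described in the documentation is likely more robust than parsing the human-readable text.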

hugorody · Answer 2 · 2014-08-18

9.7 years ago

hugorody ▴ 20

I think a better way is to download the GFF file of each genome and use grep to pull out the line, and hence the coordinates, for each gene ID.

You could use:

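#!/bin/bash
# For each gene ID in the list, print the first matching line of the GFF file.
while IFS= read -r gene_id
do
    [ -n "$gene_id" ] || continue    # skip blank lines, which would match everything
    grep -F -m 1 -- "$gene_id" file.gff
done < list_of_genesID.txt > output.txt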
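
The loop above returns whole GFF lines. If you only want the coordinates, something like the following sketch may help. `gff_coords` is a made-up name; it relies on the standard GFF column order (sequence ID in column 1, start in 4, end in 5, strand in 7, attributes in 9) and, like the grep approach, does a plain substring match on the attributes column, so an ID that is a prefix of another ID can match the wrong feature.

```shell
#!/bin/sh
# gff_coords: hypothetical helper for illustration.
# Usage: gff_coords IDS_FILE GFF_FILE
# Prints, for the first GFF feature whose attributes mention each ID:
#   ID <TAB> seqid <TAB> start <TAB> end <TAB> strand
# Output follows the order of the GFF file, not the order of the ID list.
gff_coords() {
    awk -F '\t' '
    BEGIN { OFS = "\t" }
    FILENAME == ARGV[1] {                  # first file: remember the IDs
        if ($0 != "") { n++; ids[n] = $0 }
        next
    }
    /^#/ { next }                          # skip GFF comment and directive lines
    {
        for (i = 1; i <= n; i++) {
            if (!(ids[i] in seen) && index($9, ids[i]) > 0) {
                seen[ids[i]] = 1           # report only the first hit per ID
                print ids[i], $1, $4, $5, $7
            }
        }
    }' "$1" "$2"
}

# Example:
# gff_coords list_of_genesID.txt file.gff > coords.tsv
```

This reads the GFF once instead of once per ID, which helps when the list is long.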

updated 2.3 years ago by Ram 43k • written 9.7 years ago by hugorody ▴ 20