Best automated method (Unix command line) to get the genome assembly accession number for a given protein accession number (NCBI)
3.4 years ago
JV ▴ 470

What would be the best way (if there is any) to get the accession number of a genome assembly that contains a given protein accession number?

For example, if I only have the results of a blastp run of a given query against the NCBI nr database and take the accession numbers of the subjects, how do I best find out whether a corresponding genome assembly exists that contains these subject proteins, and what its assembly accession number is?

I am not simply looking to find similar proteins in the genome databases (so no blastx against wgs for example), but to find out which exact genome was the source of which exact protein accession.

I was originally hoping to simply parse that information from the GenBank entry of the subject protein in question, but it turns out that protein GenBank records do not necessarily contain it. Is there perhaps a lookup table linking protein accessions to genome assemblies?

I am looking for an automatable solution (so command-line based).

Can anybody help with that?

ncbi blast Assembly • 1.3k views
3.4 years ago
GenoMax 141k

Entrez Direct should fit the bill. It would help if you could provide examples of such accession numbers.

This worked for me, using the Entrez Direct tools:

elink -db protein -id "WP_036492677.1" -target nuccore | elink -target assembly | esummary | xtract -pattern DocumentSummary -element RefSeq
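
To run this over many accessions, the one-liner above can be wrapped in a small loop. This is only a minimal sketch, assuming the Entrez Direct tools (elink, esummary, xtract) are on your PATH; the function names `protein_to_assembly` and `map_accessions` are made up for illustration. A non-redundant WP_ protein can be annotated on several genomes, so multiple assembly accessions are joined with commas, and "NA" is printed when nothing is linked.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the elink/esummary/xtract pipeline from this thread.
# protein_to_assembly and map_accessions are hypothetical helper names.

# Print "protein<TAB>assembly[,assembly...]" for one protein accession.
# Prints "NA" in the second column when no assembly is returned.
protein_to_assembly() {
    local acc asm
    acc=$1
    asm=$(elink -db protein -id "$acc" -target nuccore \
          | elink -target assembly \
          | esummary \
          | xtract -pattern DocumentSummary -element RefSeq \
          | paste -sd, -)
    printf '%s\t%s\n' "$acc" "${asm:-NA}"
}

# Read accessions, one per line, from a file argument or from stdin.
map_accessions() {
    local acc
    while IFS= read -r acc; do
        [ -z "$acc" ] && continue
        # Redirect stdin so the pipeline cannot swallow the remaining
        # accessions that the while loop still has to read.
        protein_to_assembly "$acc" < /dev/null
    done < "${1:-/dev/stdin}"
}
```

If your blastp results are tab-separated with the subject accession in column 2 (an assumption about your output format; blast.tsv is a placeholder file name), something like `cut -f2 blast.tsv | sort -u | map_accessions` should give a two-column table. For long lists it is probably worth adding a short sleep between queries to stay within NCBI's request limits.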
