I was given a list of protein accessions and their associated taxa, but I need the assembly accession that matches each protein and its taxon. In my case, each protein comes from a different taxon. From this post (p/429609), I gather that getting this information is difficult because:
"WP* records represent a single, non-redundant, protein sequence which may be annotated on many different RefSeq genomes from the same, or different, species."
I found this to be the case when using e-utils as below:
myinputarg=$(paste -s -d ',' protein_accessions.txt); elink -db protein -id "$myinputarg" -target nuccore | efetch -format acc > assemblyAccessions.txt
One solution based on the above post is to use the -name option, but how would this work when every protein belongs to a different taxon? vkkodali_ncbi, do you have any advice for me?
Thanks in advance! Morgan