How to retrieve protein FASTA sequences from accession numbers using Entrez
2.6 years ago
Nelo ▴ 20

AF131201
AF326487
AF326488
AF326489
AF326490

These are some of the protein accession numbers I have. First, I don't know what kind of accession numbers these are, because protein accession numbers usually start with XP_ or NP_. The problem arises when I try the following commands with the accession numbers given above:

  1. esearch -db protein -query "AF131201" | efetch -format fasta > output.fasta
  2. esearch -db nuccore -query "AF131201" | elink -target protein | efetch -db protein -format fasta
  3. Or directly submitting a file with a batch of accession numbers to Batch Entrez.

But I got nothing. I am fairly sure the failure is due to the accession numbers I used. Can somebody suggest a way to solve this?

Thanks

protein • 1.9k views
2.6 years ago
GenoMax 141k

Those are nucleotide accession numbers. You should do the following. Sequences truncated to save space.

$ esearch -db nuccore -query AF131201 | elink -target protein | efetch -format fasta
>AAD29676.1 plasma membrane MIP protein [Zea mays]
MEGKEEDVRLGANKFSERQPIGTAAQGAADDKDYKEPPPAPLFEPGELKSWSFYRAGIAEFVATFLFLYI
TILTVMGVSKSTSKCATVGIQGIAWSFGGMIFALVYCTAGISGGHINPAVTFGLFLARKLSLTRALFYII
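
As a rough way to see why these are nucleotide accessions: GenBank nucleotide accessions are typically one letter plus five digits, or two letters plus six (or eight) digits, while GenBank protein accessions are three letters plus five (or seven) digits, and RefSeq accessions carry a two-letter prefix and an underscore (NP_, XP_). A minimal sketch of that check is below. Note that `classify_accession` is a hypothetical helper, not part of EDirect, and the patterns are only a heuristic that ignores other databases such as UniProt.

```shell
# classify_accession: hypothetical helper (not an EDirect command).
# Guesses the accession type from its format alone; no network access.
classify_accession() {
    acc=${1%%.*}   # drop a version suffix such as ".1"
    case "$acc" in
        # RefSeq: two letters, underscore, then digits (NP_, XP_, NM_, ...)
        [A-Z][A-Z]_*)
            echo refseq ;;
        # GenBank protein: 3 letters + 5 digits, or 3 letters + 7 digits
        [A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9]|[A-Z][A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            echo protein ;;
        # GenBank nucleotide: 1 letter + 5 digits, 2 letters + 6 or 8 digits
        [A-Z][0-9][0-9][0-9][0-9][0-9]|[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9]|[A-Z][A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
            echo nucleotide ;;
        *)
            echo unknown ;;
    esac
}

classify_accession AF131201     # prints: nucleotide
classify_accession AAD29676.1   # prints: protein
```

Accessions reported as nucleotide go through nuccore and elink as shown above; ones reported as protein can be fetched from the protein database directly.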

To process multiple accessions:

$ cat id
AF131201
AF326487
AF326488
AF326489
AF326490

$ cat id | epost -db nuccore -format acc | elink -target protein | efetch -format fasta

>AAD29676.1 plasma membrane MIP protein [Zea mays]
MEGKEEDVRLGANKFSERQPIGTAAQGAADDKDYKEPPPAPLFEPGELKSWSFYRAGIAEFVATFLFLYI

>AAK26757.1 plasma membrane integral protein ZmPIP1-6 [Zea mays]
MAGGTLQDRSEEEDVRVGVDRFPERQPIGTAADDLGRDYSEPPAAPLFEASELSSWSFYRAGIAEFVATF

>AAK26756.1 plasma membrane integral protein ZmPIP1-5 [Zea mays]
MEGKEEDVRLGANRYSERQPIGTAAQGTEEKDYKEPPPAPLFEAEELTSWSFYRAGIAEFVATFLFLYIS

>AAK26755.1 plasma membrane integral protein ZmPIP1-4 [Zea mays]
MEGKEEDVRLGANKFSERQPIGTAAQGAGAGDDDKDYKEPPPAPLFEPGELKSWSFYRAGIAEFVATFLF

>AAK26754.1 plasma membrane integral protein ZmPIP1-3 [Zea mays]
MEGKEEDVRLGANKFSERQPIGTAAQGAGAGDDDKDYKEPPPAPLFEPGELKSWSFYRAGIAEFVATFLF
LYITVLTVMGVSKSTSKCATVGIQGIAWSFGGMIFALVYCTAGISGGHINPAVTFGLFLARKLSLTRAIF
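
If the `id` file was exported from a spreadsheet or edited on Windows, it may be worth tidying it before piping it into `epost`. This is an assumption about a possible cause of failed lookups, not something confirmed in this thread. The sketch below uses a hypothetical helper, `clean_ids`, built only from standard tools (`tr`, `sed`, `awk`); it strips carriage returns and surrounding whitespace, drops blank lines, and removes duplicates while keeping the original order.

```shell
# clean_ids: hypothetical helper (not an EDirect command).
# Usage: clean_ids FILE
clean_ids() {
    tr -d '\r' < "$1" |                              # remove Windows carriage returns
        sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | # trim leading/trailing whitespace
        awk 'NF && !seen[$0]++'                      # skip blank lines and repeated IDs
}

# Intended use with the pipeline above (requires EDirect and network access):
#   clean_ids id | epost -db nuccore -format acc | elink -target protein | efetch -format fasta
```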

Thank you so much for the help.

But sometimes the following messages are also displayed in the terminal when I use these commands.

1) Can't locate Time/HiRes.pm in @INC (@INC contains: /usr/local/lib64/perl5 /usr/local/share/perl5 /usr/lib64/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib64/perl5 /usr/share/perl5 .). BEGIN failed--compilation aborted.

Is it necessary to install the Time::HiRes module?

2) Unable to locate transmute executable. Please execute the following:

nquire -dwn ftp.ncbi.nlm.nih.gov entrez/entrezdirect transmute.Linux.gz
gunzip -f transmute.Linux.gz
chmod +x transmute.Linux

3) <PhraseNotFound>ABK60194[ACCN]</PhraseNotFound>

I am new to this kind of work, so I am not sure exactly what I should be asking.


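Not sure how you installed Entrez Direct (EDirect), but it does not seem to be installed properly.

It may be simplest to install it with conda. See the conda part of this tutorial: Creating workflows with snakemake and conda

conda create -n edirect entrez-direct

(This assumes the bioconda channel, which provides the entrez-direct package, is already configured as described in the tutorial.)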


Okay, I will try this.

Do I have to sign up for an NCBI API key? If so, why is it necessary?

Thank you so much.


If you are planning to run a large number of queries, then you should. See this: https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
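
For reference, EDirect picks up the key from the `NCBI_API_KEY` environment variable, and the E-utilities allow roughly 3 requests per second without a key versus 10 with one. A minimal sketch follows; the key value is a placeholder, and `ncbi_rate_limit` is a hypothetical helper added only to show which limit applies to the current shell.

```shell
# Placeholder only: replace with the key generated in your NCBI account settings.
# Putting this line in ~/.bash_profile or ~/.bashrc makes it persistent.
export NCBI_API_KEY="replace_with_your_key"

# ncbi_rate_limit: hypothetical helper (not an EDirect command).
# Prints the approximate requests-per-second limit for this shell.
ncbi_rate_limit() {
    if [ -n "${NCBI_API_KEY:-}" ]; then
        echo 10   # limit with an API key
    else
        echo 3    # limit without an API key
    fi
}

ncbi_rate_limit
```

For a handful of accessions like the ones in this thread, the default limit is unlikely to matter.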


Okay

Thank you for responding
