Question: Entrez direct E-utilities - "efetch" command to retrieve CDS with protein accessions does not work
0
3.2 years ago by
al-ash110
Japan/Okinawa/OIST
al-ash110 wrote:

I'm using Entrez Direct (EDirect) E-utilities to retrieve protein sequences by protein ID. However, retrieving the CDS from a protein ID is not working for me. Here is the command, with an example protein accession:

efetch -db protein -format fasta_cd_na -id XP_003399879.1

although the command to fetch the protein FASTA works:

efetch -db protein -format fasta -id XP_003399879.1

Could you point out my mistake? Or does the efetch command simply not work this way? Thanks!

modified 8 months ago by Biostar ♦♦ 20 • written 3.2 years ago by al-ash110

Curious: what kind of ID is that?

written 2.4 years ago by a.aiezza30
2
3.2 years ago by
DCGenomics320
United States
DCGenomics320 wrote:

The following EDirect commands will get the CDS FASTA from a protein accession:

elink -db protein -id XP_003399879.1 -target nuccore | \
  efilter -molecule mrna | \
  efetch -format fasta_cds_na
modified 13 months ago by h.mon28k • written 3.2 years ago by DCGenomics320

This solution doesn't work for me; it returns:

QueryKey value not found in filter input

QueryKey value not found in fetch input

written 13 months ago by h.mon28k
0
3.2 years ago by
piet1.7k
planet earth
piet1.7k wrote:

The coding sequence (CDS) is a genomic nucleotide sequence, so you have to retrieve it from the 'nucleotide' database rather than from the 'protein' database. For XP_003399879.1, the coding sequence is XM_003399831.2:24..1547.
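
For anyone who wants to work from a location string like the one above, here is a rough sketch of how it could be split into accession, start and stop, and how the interval could then be requested from nuccore. The helper names (parse_cds_location, cds_length, fetch_cds_interval) are made up for illustration and are not part of EDirect. The coordinates are assumed to be 1-based and inclusive, as in GenBank feature locations. The fetch function is only defined, not called, because it needs EDirect installed and network access.

```shell
# Hypothetical helpers, not part of EDirect.

# Split "ACCESSION:START..STOP" into three space-separated fields.
parse_cds_location() {
  loc_range="${1#*:}"                      # text after the first ':'
  printf '%s %s %s\n' "${1%%:*}" "${loc_range%%..*}" "${loc_range##*..}"
}

# Length of a 1-based inclusive interval: stop - start + 1.
cds_length() {
  echo $(( $2 - $1 + 1 ))
}

# Request only the CDS interval of the transcript record.
# Defined but not called here: requires EDirect and network access.
fetch_cds_interval() {
  set -- $(parse_cds_location "$1")        # $1=accession $2=start $3=stop
  efetch -db nuccore -id "$1" -format fasta -seq_start "$2" -seq_stop "$3"
}

parse_cds_location "XM_003399831.2:24..1547"   # prints: XM_003399831.2 24 1547
cds_length 24 1547                             # prints: 1524
```

The length of 1524 nt is a multiple of three (508 codons), which is what one would expect for a complete CDS including the stop codon.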

written 3.2 years ago by piet1.7k

In other words, it is not possible to use efetch with a protein ID as input to obtain the CDS sequence directly, right? Rather, it is still necessary to first convert the protein ID to a gene ID... I'm a bit surprised that the tool cannot do this job. Anyway, thanks for your reply!

written 3.2 years ago by al-ash110
0
13 months ago by
h.mon28k
Brazil
h.mon28k wrote:

The problem is a typo in your command to retrieve the CDS: it should be -format fasta_cds_na, not -format fasta_cd_na. The following works:

efetch -db protein -format fasta_cds_na -id XP_003399879.1
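
Since the whole problem came down to a misspelled format name, one way to guard against it in scripts might be to check the name against a short list before calling efetch. This is only a sketch: check_format and fetch_cds are hypothetical helpers, not EDirect commands, and KNOWN_FORMATS is a deliberately partial list that you would extend for your own use. The wrapper is defined but not called, because efetch needs network access.

```shell
# Partial list of efetch -format values for sequence records (extend as needed).
KNOWN_FORMATS="fasta fasta_cds_na fasta_cds_aa gb gp"

# Hypothetical helper: succeed only if $1 is in KNOWN_FORMATS.
check_format() {
  for f in $KNOWN_FORMATS; do
    [ "$f" = "$1" ] && return 0
  done
  echo "unrecognized format: $1" >&2
  return 1
}

# Hypothetical wrapper: fetch_cds ACCESSION FORMAT
# Refuses to call efetch when the format name is not in the list.
fetch_cds() {
  check_format "$2" || return 1
  efetch -db protein -format "$2" -id "$1"
}

check_format fasta_cds_na && echo "ok: fasta_cds_na"
check_format fasta_cd_na 2>/dev/null || echo "rejected: fasta_cd_na"
```

With this in place, something like fetch_cds XP_003399879.1 fasta_cd_na would stop with an error message before any request is sent.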
written 13 months ago by h.mon28k