Question

Entrez direct E-utilities - "efetch" command to retrieve CDS with protein accessions does not work

0

7.9 years ago

al-ash ▴ 200

UPDATE: problem solved. It was just a typo, and the following line does what it is supposed to:

efetch -db protein -format fasta_cds_na -id XP_003399879.1

ORIGINAL REQUEST: I'm using Entrez Direct E-utilities to retrieve protein sequences by protein ID, but retrieving the CDS from a protein ID is not working for me. This is the command I used, with an example protein accession:

efetch -db protein -format fasta_cd_na -id XP_003399879.1

although the command to fetch the protein FASTA works:

efetch -db protein -format fasta -id XP_003399879.1

Could you point out my mistake? Or does the efetch command simply not work this way? Thanks!

Entrez Direct E-utilities efetch CDS retrieve • 8.5k views

ADD COMMENT • link 3.9 years ago by al-ash ▴ 200

0

Curious: what kind of ID is that?

ADD REPLY • link 7.0 years ago by a.aiezza ▴ 30

2

7.9 years ago

DCGenomics ▴ 330

The following EDirect commands will get the CDS FASTA from a protein accession:

elink -db protein -id XP_003399879.1 -target nuccore | \
  efilter -molecule mrna | \
  efetch -format fasta_cds_na

ADD COMMENT • link updated 5.8 years ago by h.mon 35k • written 7.9 years ago by DCGenomics ▴ 330

0

This solution doesn't work for me; it returns:

QueryKey value not found in filter input

QueryKey value not found in fetch input

ADD REPLY • link 5.8 years ago by h.mon 35k

0

7.9 years ago

piet ★ 1.8k

The coding sequence (CDS) is a genomic nucleotide sequence, so you have to retrieve it from the 'nucleotide' database rather than from the 'protein' database. In this case, for XP_003399879.1, the coding sequence is XM_003399831.2:24..1547.
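
A location string of the form accession:start..stop can be split into its parts and passed to efetch through its -seq_start and -seq_stop options. The following is only a minimal sketch, assuming a POSIX shell; parse_loc and fetch_slice are hypothetical helper names, not part of EDirect, and it has not been checked that the slice matches the fasta_cds_na output exactly.

```shell
# Hypothetical helper: split "ACCESSION:START..STOP" into its parts.
# Prints: accession, start, stop, and the length of the range in bases.
parse_loc() {
  loc=$1
  acc=${loc%%:*}        # everything before the colon
  range=${loc#*:}       # everything after the colon, e.g. "24..1547"
  start=${range%%..*}   # number before the ".."
  stop=${range##*..}    # number after the ".."
  echo "$acc $start $stop $((stop - start + 1))"
}

# Hypothetical helper: fetch only that slice of the nucleotide record.
# Requires EDirect to be installed and network access to NCBI.
fetch_slice() {
  set -- $(parse_loc "$1")
  efetch -db nuccore -id "$1" -format fasta -seq_start "$2" -seq_stop "$3"
}
```

For the location given here, parse_loc 'XM_003399831.2:24..1547' reports a range of 1524 bases, which is a multiple of three as expected for a coding sequence. Calling fetch_slice 'XM_003399831.2:24..1547' would then request that region.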

ADD COMMENT • link 7.9 years ago by piet ★ 1.8k

0

In other words, it is not possible to use efetch with a protein ID as input to obtain the CDS directly, right? Instead, it is still necessary to first convert the protein ID to a gene ID. I'm a bit surprised that the tool cannot do this job. Anyway, thanks for your reply!

ADD REPLY • link 7.9 years ago by al-ash ▴ 200

score 2 · Accepted Answer · 2018-10-17

2

5.8 years ago

h.mon 35k

The problem is that you have a typo in your command to retrieve the CDS: it should be -format fasta_cds_na, not -format fasta_cd_na. The following works:

efetch -db protein -format fasta_cds_na -id XP_003399879.1
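
Since a misspelled -format value is easy to overlook, one option is to check the name before calling efetch. This is only a sketch, assuming a POSIX shell: check_format and fetch_cds are hypothetical wrapper names, and the list of accepted formats is a small hand-picked subset, not the full set that efetch supports.

```shell
# Hypothetical helper: accept only format names from a short, hand-maintained list.
# The list is an assumption and deliberately incomplete; extend it as needed.
check_format() {
  case "$1" in
    fasta|fasta_cds_na|fasta_cds_aa|gb|gp) return 0 ;;
    *) echo "unrecognized -format value: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: $1 = protein accession, $2 = format (default fasta_cds_na).
# Requires EDirect to be installed and network access to NCBI.
fetch_cds() {
  fmt=${2:-fasta_cds_na}
  check_format "$fmt" || return 1
  efetch -db protein -format "$fmt" -id "$1"
}
```

With this in place, fetch_cds XP_003399879.1 fasta_cd_na stops with an error message before any request is sent, while fetch_cds XP_003399879.1 runs the command shown above.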

ADD COMMENT • link 5.8 years ago by h.mon 35k