Question: Batch Retrieval of Conserved Domains from a List of IDs?
8 days ago, kayrouz.1 wrote:


I have a large number (>100,000) of functionally unrelated protein sequences and I want to generate a rough functional annotation for each. I've tried PROKKA and RASTtk, but they tend to return a large number of "hypothetical" results, so I've changed my approach. So far I have used RPS-BLAST to query each sequence against the NCBI Conserved Domain Database (CDD), which successfully gives me the NCBI-CDD identifier of the best hit for each. Now I want to retrieve the "name" associated with each of these identifiers.

What I've tried so far:

I currently have an NCBI-CDD identifier that encodes the top domain hit for each of my sequences. The identifiers take the form of a 6-digit number (e.g. 240628). I want to retrieve the "name" associated with each of these identifiers (for 240628, the name is "Phosphoglycerate dehydrogenase (PGDH) NAD-binding and catalytic domains").

In the past I have been able to retrieve information from NCBI using the command line to cycle through EFetch commands and write the output to a file, such as shown below:

However, when I try this approach with CDD identifiers, I just get a web-based XML readout that I can't really work with. See below for what I mean:

Is there any way to structure this command such that I receive a downloadable file that I can extract information from?

8 days ago, vkkodali (United States) wrote:

You can use Entrez Direct for this as follows:

esummary -db cdd -id 240628 | xtract -pattern DocumentSummary -element Id,Subtitle
240628  Phosphoglycerate dehydrogenase (PGDH) NAD-binding and catalytic domains

If you have a file with a long list of identifiers, you can use epost to first upload that list as follows:

epost -db cdd -input file.txt | esummary -db cdd | xtract -pattern DocumentSummary -element Id,Subtitle
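
Since the question involves one top hit per sequence, the same CDD identifier will probably appear many times. A minimal sketch of how the lookup could be wrapped is below: deduplicate the identifiers before uploading them, then join the returned names back onto the per-sequence hits. The file names (hits.tsv, ids.txt, names.tsv, annotated.tsv), the two-column tab-separated layout, and the helper functions unique_ids and annotate_hits are all assumptions made for illustration, not something from the thread.

```shell
# Assumed (hypothetical) inputs:
#   hits.tsv   seq_id <TAB> cdd_id   one top RPS-BLAST hit per sequence
#   names.tsv  cdd_id <TAB> name     output of the esummary | xtract step

# Print the unique CDD identifiers from column 2, one per line.
unique_ids() {
  cut -f2 "$1" | sort -u
}

# Append the domain name as a third column; identifiers that have
# no entry in the names file are marked "NA".
annotate_hits() {
  awk -F'\t' -v OFS='\t' '
    FILENAME == ARGV[1] { name[$1] = $2; next }   # first file: load names
    { print $1, $2, ($2 in name ? name[$2] : "NA") }
  ' "$2" "$1"
}

# Possible usage (the middle step needs Entrez Direct and network access):
#   unique_ids hits.tsv > ids.txt
#   epost -db cdd -input ids.txt | esummary |
#     xtract -pattern DocumentSummary -element Id,Subtitle > names.tsv
#   annotate_hits hits.tsv names.tsv > annotated.tsv
```

Deduplicating first keeps the uploaded list as small as possible, and marking missing names as "NA" makes it easy to spot identifiers that returned no summary.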

You have solved my problem, thank you!

Reply written 8 days ago by kayrouz.1