Question: Batch Retrieval of Conserved Domains from a List of IDs?
22 months ago, kayrouz.1 wrote:


I have a large number (>100,000) of functionally unrelated protein sequences and I want to generate a rough functional annotation for each. I've tried PROKKA and RASTtk, but they tend to return a large number of "hypothetical" results, so I've changed my approach. So far I have used RPS-BLAST to query each sequence against the NCBI Conserved Domain Database (CDD), which successfully gives me the NCBI-CDD identifier of the best hit for each. Now I want to retrieve the "name" associated with each of these identifiers.

What I've tried so far:

I currently have an NCBI-CDD identifier for the top domain hit of each of my sequences. The identifiers take the form of a 6-digit number (e.g. 240628). I want to retrieve the "name" associated with each of these identifiers (for 240628, the name is "Phosphoglycerate dehydrogenase (PGDH) NAD-binding and catalytic domains").

In the past I have been able to retrieve information from NCBI using the command line to cycle through EFetch commands and write the output to a file, such as shown below:

However, when I try this approach with CDD identifiers, I just get a web-based XML readout that I can't really work with. See below for what I mean:

Is there any way to structure this command such that I receive a downloadable file that I can extract information from?

22 months ago, vkkodali (United States) wrote:

You can use Entrez Direct for this as follows:

esummary -db cdd -id 240628 | xtract -pattern DocumentSummary -element Id,Subtitle
240628  Phosphoglycerate dehydrogenase (PGDH) NAD-binding and catalytic domains

If you have a file with a long list of identifiers (one per line; file.txt below is a placeholder for its name), you can use epost to first upload that list as follows:

epost -db cdd -input file.txt | esummary -db cdd | xtract -pattern DocumentSummary -element Id,Subtitle
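
With more than 100,000 sequences, many of them will probably share the same top domain, so it may be worth looking up each distinct CDD identifier only once and then joining the names back onto the per-sequence table. Below is a minimal sketch of that idea, not a tested pipeline. It assumes a two-column, tab-separated file of sequence ID and CDD identifier (your RPS-BLAST output may need reshaping to get there). The function names and the "NA" placeholder for identifiers without a name are made up for illustration; only the epost/esummary/xtract pipeline comes from the commands above.

```shell
#!/usr/bin/env bash
# Sketch only. Assumed input: a tab-separated hits table with
# column 1 = sequence ID and column 2 = CDD identifier.

# Print each distinct CDD identifier (column 2) once, sorted.
unique_cdd_ids() {
    cut -f2 "$1" | sort -u
}

# Look up names for a file of CDD identifiers (one per line).
# Needs network access and Entrez Direct on the PATH.
# Output: CDD identifier <TAB> name.
fetch_cdd_names() {
    epost -db cdd -input "$1" |
        esummary -db cdd |
        xtract -pattern DocumentSummary -element Id,Subtitle
}

# Append the domain name as a third column of the hits table.
# $1 = hits table, $2 = names table from fetch_cdd_names.
# Identifiers with no entry in the names table get "NA".
annotate_hits() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { name[$1] = $2; next }   # names table: ID -> name
        { print $1, $2, (($2 in name) ? name[$2] : "NA") }
    ' "$2" "$1"
}

# Example run (file names are placeholders):
#   unique_cdd_ids hits.tsv > cdd_ids.txt
#   fetch_cdd_names cdd_ids.txt > cdd_names.tsv
#   annotate_hits hits.tsv cdd_names.tsv > annotated_hits.tsv
```

Redirecting each step to a file, as in the example run, also gives you plain tab-separated text to work with instead of the XML readout.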

You have solved my problem, thank you!

Reply written 22 months ago by kayrouz.1