I've got a list of 200 Ensembl gene IDs and I want to get the start and stop nucleotides of the known conserved domains for each one. At https://www.ncbi.nlm.nih.gov/gene/2475, the RefSeq section has a nice little summary of the conserved domains, but I can't figure out how to get this information programmatically with Entrez, or find another database to get it from. The full protein record lists every repeat and every interaction site as a separate Region, which isn't useful. I'm pretty sure I should be querying the "cdd" database, but I can't find any useful documentation for it.
Conserved domains are annotated on the RefSeq proteins. If you are starting with NCBI GeneIDs, then you may want to first fetch the proteins annotated on that gene and then extract the CDD domains for each protein. You can use Entrez Direct for this as follows:
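The same two steps (GeneID to RefSeq proteins, then proteins to CDD regions) can also be sketched in Python against the E-utilities endpoints that Entrez Direct wraps. This is a minimal sketch, not a definitive implementation: the function names are mine, and it assumes that conserved domains show up in the INSDSeq XML as Region features carrying a db_xref of the form CDD:<id>. Note that the coordinates it returns are amino acid positions on the protein, not nucleotide positions.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def _get(endpoint, **params):
    """Call one E-utilities endpoint and return the response body as text."""
    url = f"{EUTILS}/{endpoint}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def refseq_protein_uids(gene_id):
    """Step 1: NCBI GeneID -> UIDs of the RefSeq proteins annotated on that gene."""
    xml_text = _get("elink.fcgi", dbfrom="gene", db="protein",
                    id=gene_id, linkname="gene_protein_refseq")
    return [e.text for e in ET.fromstring(xml_text).findall(".//LinkSetDb/Link/Id")]


def cdd_regions(insd_xml):
    """Step 2: extract (accession, start, stop, region_name, CDD xref) rows.

    Assumption: only Region features with a 'CDD:' db_xref are wanted;
    Site features and Regions without a CDD cross-reference are skipped.
    """
    rows = []
    for seq in ET.fromstring(insd_xml).iter("INSDSeq"):
        acc = seq.findtext("INSDSeq_accession-version")
        for feat in seq.iter("INSDFeature"):
            if feat.findtext("INSDFeature_key") != "Region":
                continue
            quals = [(q.findtext("INSDQualifier_name"),
                      q.findtext("INSDQualifier_value"))
                     for q in feat.iter("INSDQualifier")]
            cdd = next((v for n, v in quals
                        if n == "db_xref" and v and v.startswith("CDD:")), None)
            if cdd is None:
                continue
            name = next((v for n, v in quals if n == "region_name"), None)
            for iv in feat.iter("INSDInterval"):
                start = iv.findtext("INSDInterval_from")
                stop = iv.findtext("INSDInterval_to")
                if start is None or stop is None:
                    continue  # single-point intervals have no from/to pair
                rows.append((acc, int(start), int(stop), name, cdd))
    return rows


def domains_for_gene(gene_id):
    """Chain both steps for one GeneID (makes two network requests)."""
    uids = refseq_protein_uids(gene_id)
    if not uids:
        return []
    xml_text = _get("efetch.fcgi", db="protein", id=",".join(uids),
                    rettype="gpc", retmode="xml")
    return cdd_regions(xml_text)
```

Calling domains_for_gene("2475") should give one row per CDD-annotated Region on each RefSeq protein of that gene. It does not try to collapse repeats into the condensed summary shown on the Gene page, and for 200 genes you would want to pause between calls to stay within NCBI's request rate limits.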