Retrieving domain loci
2.9 years ago

I have a list of 200 Ensembl gene IDs and want the start and stop nucleotides of their known conserved domains. At https://www.ncbi.nlm.nih.gov/gene/2475, the RefSeq section has a nice little summary of the conserved domains, but I can't figure out how to get this information programmatically with Entrez, or find another database to get it from. The full protein record lists every repeat and every interaction site as a separate Region, which isn't useful. I'm pretty sure I should be querying the "cdd" database, but I can't find any useful documentation for it.

Anyone done this before?

Cheers.

entrez database protein domains • 585 views

Do you want the relative coordinates? Or do you want them mapped to the reference genome?

2.9 years ago

Figured it out! You want to use BioMart. You can query the database with a list of IDs and choose which fields are returned (including domain start and end positions) under "Attributes".
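For anyone who would rather script this than click through the web form, here is a rough sketch of the same query sent to the BioMart web service. Everything specific in it is an assumption on my part and not something confirmed in this thread: the human dataset name `hsapiens_gene_ensembl`, the Pfam attributes `pfam`, `pfam_start` and `pfam_end`, the `martservice` URL, and the input file `gene_ids.txt` (hypothetical, one Ensembl gene ID per line). As far as I know these positions are amino-acid coordinates on the translation, so they would still need converting if you want nucleotides.

```shell
# Sketch only: dataset, attribute names and URL below are assumptions.

# Join the non-empty lines of an ID file into one comma-separated string.
join_ids() {
  grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Print a BioMart XML query for the comma-separated gene IDs given as $1.
build_query() {
  local ids="$1"
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName="default" formatter="TSV" header="1" uniqueRows="1" datasetConfigVersion="0.6">
  <Dataset name="hsapiens_gene_ensembl" interface="default">
    <Filter name="ensembl_gene_id" value="${ids}"/>
    <Attribute name="ensembl_gene_id"/>
    <Attribute name="ensembl_peptide_id"/>
    <Attribute name="pfam"/>
    <Attribute name="pfam_start"/>
    <Attribute name="pfam_end"/>
  </Dataset>
</Query>
EOF
}

# POST the query and print the TSV reply (needs network access).
fetch_domains() {
  curl -s --data-urlencode "query=$(build_query "$(join_ids "$1")")" \
    "https://www.ensembl.org/biomart/martservice"
}
```

Something like `fetch_domains gene_ids.txt > domains.tsv` would then write one row per gene, protein and domain. Nothing is sent until `fetch_domains` is called, so `build_query` can be used on its own to inspect the XML first.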

2.9 years ago
vkkodali ★ 2.8k

Conserved domains are annotated on the RefSeq proteins. If you are starting with NCBI GeneIDs, then you may want to first fetch the proteins annotated on that gene and then extract the CDD domains for each protein. You can use Entrez Direct for this as follows:

elink -db gene -target protein -name gene_protein_refseq -id 2475 \
  | efetch -format gpc \
  | xtract -insd Region INSDInterval_from INSDInterval_to region_name note db_xref \
  | grep 'CDD:'

NP_004949.1     363     2549    TEL1            Phosphatidylinositol kinase or protein kinase, PI-3 family [Signal transduction mechanisms]; COG5032    CDD:227365
NP_004949.1     655     681     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
NP_004949.1     691     721     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
...
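
If you want a more compact table than the raw rows above, a small awk filter can stand in for the `grep`. This is only a sketch: the function name `cdd_table` is made up for illustration, and it assumes the tab-delimited column order produced by the `xtract` call above (accession, from, to, region_name, note, db_xref). It keeps only the rows whose last column is a CDD cross-reference and adds the domain length.

```shell
# Sketch: reduce xtract rows to accession, start, end, length, name, CDD id.
# Assumes tab-delimited input with db_xref as the last column.
cdd_table() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $NF ~ /^CDD:/ {
      # length counts both the first and the last position of the region
      print $1, $2, $3, $3 - $2 + 1, $4, $NF
    }'
}
```

To use it, end the pipeline above with `| cdd_table` in place of `| grep 'CDD:'`. The coordinates are positions on the RefSeq protein, so the length is in residues.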