Question: Retrieving domain loci
jcthomas0000 wrote (7 months ago):

I've got a list of 200 Ensembl gene IDs and I want to get the start and stop nucleotides of the known conserved domains. On https://www.ncbi.nlm.nih.gov/gene/2475, the RefSeq section has a nice little summary of the conserved domains, but I can't yet figure out how to get this information programmatically with Entrez, or find another database to get it from. The full protein record lists every repeat and every interaction site as a separate Region, which isn't useful. I'm pretty sure I should be querying the "cdd" database, but I can't find any useful documentation for it.

Anyone done this before?

Cheers.


benformatics replied (7 months ago):

Do you want the relative coordinates? Or do you want them mapped to the reference genome?
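The question asks for nucleotides, but CDD regions on a RefSeq protein are given as residue positions. A residue range converts to positions on the transcript with simple codon arithmetic. This is a minimal sketch, assuming 1-based inclusive coordinates and a `cds_start` (the transcript position of the first base of the start codon) that you look up yourself from the RefSeq mRNA record; the function name is hypothetical. It yields transcript-relative positions only: mapping to the reference genome would also need the exon structure.

```python
from typing import Tuple


def protein_to_cds_nt(aa_start: int, aa_end: int, cds_start: int = 1) -> Tuple[int, int]:
    """Convert a 1-based, inclusive residue range into 1-based, inclusive
    nucleotide positions on the transcript. cds_start is assumed to be the
    position of the first base of the start codon on that transcript."""
    if aa_start < 1 or aa_end < aa_start:
        raise ValueError("expected 1 <= aa_start <= aa_end")
    nt_start = cds_start + 3 * (aa_start - 1)  # first base of the first codon
    nt_end = cds_start + 3 * aa_end - 1        # last base of the last codon
    return nt_start, nt_end


# With cds_start left at 1, positions are relative to the CDS itself.
print(protein_to_cds_nt(363, 2549))  # (1087, 7647)
```

Each residue spans three bases, so residue n starts at base 3(n - 1) + 1 of the CDS and ends at base 3n.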
jcthomas0000 wrote (7 months ago):

Figured it out! Use Ensembl BioMart. You can query its database with a list of IDs and choose which information to return (including domain positions) under "Attributes".
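The same BioMart query can also be scripted instead of run through the web form. This is a minimal sketch, assuming that Ensembl's `martservice` endpoint accepts an XML query as a form-encoded `query` field, and that the human dataset is `hsapiens_gene_ensembl` with Pfam domain attributes named `pfam`, `pfam_start` and `pfam_end`. Treat those names as assumptions and confirm them against the attribute list BioMart shows under "Attributes". The helper names are hypothetical, and only `fetch_domains` touches the network.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed endpoint and attribute names; verify against the mart's attribute list.
MART_URL = "https://www.ensembl.org/biomart/martservice"
ATTRIBUTES = ["ensembl_gene_id", "ensembl_peptide_id", "pfam", "pfam_start", "pfam_end"]
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_query(gene_ids: List[str], dataset: str = "hsapiens_gene_ensembl") -> str:
    """Build a BioMart XML query that filters on a list of Ensembl gene IDs."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="0", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    ET.SubElement(ds, "Filter", name="ensembl_gene_id", value=",".join(gene_ids))
    for attr in ATTRIBUTES:
        ET.SubElement(ds, "Attribute", name=attr)
    return XML_HEADER + ET.tostring(query, encoding="unicode")


def parse_tsv(text: str) -> List[Dict[str, str]]:
    """Turn headerless TSV rows into dicts, dropping rows with no Pfam hit."""
    rows = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != len(ATTRIBUTES):
            continue  # blank or malformed line
        row = dict(zip(ATTRIBUTES, fields))
        if row["pfam"]:  # transcripts without a domain hit have empty domain columns
            rows.append(row)
    return rows


def fetch_domains(gene_ids: List[str]) -> List[Dict[str, str]]:
    """POST the query to the mart service and parse the reply (needs network)."""
    data = urllib.parse.urlencode({"query": build_query(gene_ids)}).encode()
    with urllib.request.urlopen(MART_URL, data=data, timeout=120) as resp:
        return parse_tsv(resp.read().decode("utf-8"))
```

All 200 IDs fit in a single comma-joined filter value, so one request should cover the whole list. The start and end columns should be residue positions on the Ensembl peptide, not genomic coordinates.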

vkkodali (United States) wrote (7 months ago):

Conserved domains are annotated on the RefSeq proteins. If you are starting with NCBI GeneIDs, then you may want to first fetch the proteins annotated on that gene and then extract the CDD domains for each protein. You can use Entrez Direct for this as follows:

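# Gene 2475 -> its RefSeq proteins -> GenPept XML -> Region features -> keep rows with a CDD cross-reference
# (-id also accepts a comma-separated list of GeneIDs)
elink -db gene -target protein -name gene_protein_refseq -id 2475 \
  | efetch -format gpc \
  | xtract -insd Region INSDInterval_from INSDInterval_to region_name note db_xref \
  | grep 'CDD:'
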
NP_004949.1     363     2549    TEL1            Phosphatidylinositol kinase or protein kinase, PI-3 family [Signal transduction mechanisms]; COG5032    CDD:227365
NP_004949.1     655     681     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
NP_004949.1     691     721     HEAT repeat     HEAT repeat [structural motif]  CDD:293787
...
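As the question points out, repeats such as HEAT come back as one row per copy. One way to reduce them to a single span per domain type is to group rows by protein and CDD ID. This is a minimal sketch, assuming xtract's output has exactly six tab-separated columns per line (accession, from, to, region_name, note, db_xref) and one interval per Region. The function names are hypothetical, and this is not necessarily how the NCBI Gene page builds its summary.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int, str, str]  # accession, from, to, region_name, CDD xref


def parse_regions(text: str) -> List[Region]:
    """Parse xtract output, assumed to be six tab-separated columns:
    accession, from, to, region_name, note, db_xref."""
    rows: List[Region] = []
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) != 6:
            continue  # skip blank lines, "..." and rows with an unexpected shape
        acc, start, end, name, _note, xref = fields
        try:
            rows.append((acc, int(start), int(end), name, xref))
        except ValueError:
            continue  # non-numeric interval, e.g. a region with several intervals
    return rows


def collapse_repeats(rows: List[Region]) -> Dict[Tuple[str, str, str], Tuple[int, int, int]]:
    """Merge rows sharing (accession, CDD xref, region_name) into one span.
    Each value is (smallest from, largest to, number of merged rows)."""
    spans: Dict[Tuple[str, str, str], Tuple[int, int, int]] = {}
    for acc, start, end, name, xref in rows:
        key = (acc, xref, name)
        if key in spans:
            lo, hi, n = spans[key]
            spans[key] = (min(lo, start), max(hi, end), n + 1)
        else:
            spans[key] = (start, end, 1)
    return spans
```

Applied to the first three rows above, the two HEAT repeat rows would merge into a single 655-721 span with a count of 2, and the TEL1 row would stay as it is. The merged span also covers the gaps between repeat copies, so keep the count if that distinction matters.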