I would like to download annotated data on functional domains for the hg19 RefSeq. The information needed is the domain name and the start and stop positions of the functional domains for a given NM_* or NR_* transcript ID.
Are you sure you want the NM* accessions? Those are nucleotide accessions and are not likely to have any domain information.
With NP* accessions you will get the following (with EntrezDirect; example below truncated to save space):
$ efetch -db protein -id NP_000050.3 -format ft
>Feature ref|NP_000050.3|
1 3418 Protein
product breast cancer type 2 susceptibility protein isoform 1
product BRCA1/BRCA2-containing complex, subunit 2
product breast cancer type 2 susceptibility protein
product DNA repair-associated BRCA2
product breast cancer 2 tumor suppressor
product breast and ovarian cancer susceptibility gene, early onset
product Fanconi anemia group D1 protein
product breast and ovarian cancer susceptibility protein 2
product breast cancer 2, early onset
product mutant BRCA2
product mutant DNA repair-associated protein 2
1 40 Region
region Interaction with PALB2
note propagated from UniProtKB/Swiss-Prot (P51587.4)
37 68 Region
region Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
note propagated from UniProtKB/Swiss-Prot (P51587.4)
70 70 Site
site_type phosphorylation
note Phosphoserine. /evidence=ECO:0007744|PubMed:23186163; propagated from UniProtKB/Swiss-Prot (P51587.4)
358 381 Region
region Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
note propagated from UniProtKB/Swiss-Prot (P51587.4)
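If only the domain name and its start and stop positions are needed, the feature table can be reduced to one line per region. This is a minimal sketch, not an EntrezDirect feature: `ft_regions` is a hypothetical helper name, and it assumes the tab-delimited five-column layout that `efetch -format ft` returns (the tabs appear collapsed to spaces in the paste above). Only `Region` features are reported; `Site` features such as the phosphorylation site are skipped.

```shell
# Hypothetical helper: reduce an NCBI five-column feature table
# (efetch -format ft output, assumed tab-delimited) to
#   region_name <TAB> start <TAB> stop
# Only "Region" features with a "region" qualifier are printed.
# For multi-interval features only the first interval is kept.
ft_regions() {
  awk -F'\t' '
    # Feature lines carry start, stop and feature key in columns 1-3.
    $1 != "" && $3 != "" {
      key = $3
      start = $1; stop = $2
      gsub(/[^0-9]/, "", start)   # drop partial markers such as "<" or ">"
      gsub(/[^0-9]/, "", stop)
      next
    }
    # Qualifier lines have three empty columns, then name and value.
    key == "Region" && $4 == "region" {
      name = $5
      sub(/\. \/evidence=.*/, "", name)   # trim trailing evidence codes
      print name "\t" start "\t" stop
    }
  '
}

# Usage (requires EntrezDirect and network access):
#   efetch -db protein -id NP_000050.3 -format ft | ft_regions
```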
Thank you for the response! Unfortunately, the software only reports NM and NR accessions, not NP accessions. Is it possible to translate the NM accessions to NP accessions? The goal is to map the protein positions to the functional domains.
Here is some documentation from the Ion Reporter Software:
Example data: transcript : NM_015215.2 , gene : CAMTA1, protein : p.Cys147Trp.
Documentation on the transcript value: "NM_ or NR_ NCBI versioned transcript identifiers (as specified by the gene-model files provided by UCSC RefSeq v63)."
You could do something like this to "translate" the NM ID to NP:
$ esearch -db nuccore -query NM_015215 | elink -target protein | efetch -format ft
>Feature ref|NP_056030.1|
1 1673 Protein
product calmodulin-binding transcription activator 1 isoform a
67 183 Region
region CG-1
note CG-1 domains are highly conserved domains of about 130 amino-acid residues
db_xref CDD:198144
112 119 Region
region Nuclear localization signal. /evidence=ECO:0000255|PROSITE-ProRule:PRU00767
note propagated from UniProtKB/Swiss-Prot (Q9Y6Y1.4)
283 375 Region
region Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
note propagated from UniProtKB/Swiss-Prot (Q9Y6Y1.4)
873 952 Region
region TIG
note IPT/TIG domain
db_xref CDD:426462
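To go from the Ion Reporter output to domains, the residue number can be pulled out of the HGVS protein change and compared against each region's coordinates. This is a minimal sketch, assuming a simple substitution written as `p.<Aaa><pos><Bbb>` (e.g. `p.Cys147Trp`) and tab-delimited `efetch -format ft` output; `regions_at_variant` is a hypothetical helper name. Two caveats: NR_ transcripts are non-coding, so `elink` will find no protein for them, and the unversioned query `NM_015215` resolves to the current RefSeq record, which may not be the `.2` version used by RefSeq v63, so it is worth checking that the protein is unchanged before trusting the coordinates.

```shell
# Hypothetical helper: list the Region features of a protein feature
# table (read from stdin) that contain the residue changed by an HGVS
# protein substitution such as p.Cys147Trp.
# Output: region_name <TAB> start <TAB> stop
regions_at_variant() {
  local hgvs_p=$1
  local pos
  # Extract the residue number, e.g. 147 from p.Cys147Trp
  pos=$(printf '%s\n' "$hgvs_p" | sed -E 's/^p\.[A-Za-z]{3}([0-9]+).*/\1/')
  case $pos in
    ''|*[!0-9]*) echo "cannot parse position from: $hgvs_p" >&2; return 1 ;;
  esac
  awk -F'\t' -v pos="$pos" '
    # Feature line: remember coordinates and key of the current feature.
    $1 != "" && $3 != "" {
      key = $3
      start = $1; stop = $2
      gsub(/[^0-9]/, "", start)   # drop partial markers such as "<" or ">"
      gsub(/[^0-9]/, "", stop)
      next
    }
    # Region qualifier whose interval contains the variant position.
    key == "Region" && $4 == "region" && pos + 0 >= start + 0 && pos + 0 <= stop + 0 {
      name = $5
      sub(/\. \/evidence=.*/, "", name)   # trim trailing evidence codes
      print name "\t" start "\t" stop
    }
  '
}

# Usage (requires EntrezDirect and network access):
#   esearch -db nuccore -query NM_015215 | elink -target protein \
#     | efetch -format ft | regions_at_variant p.Cys147Trp
```

Of the regions shown in the output above, only CG-1 (67-183) spans residue 147.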
A small educational note: if an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they work. This helps future users who find this post identify the right answer.