Question

Functional domain data from NM_ transcript id

4 months ago

sofie

Hello!

I would like to download functional domain annotations for the hg19 RefSeq transcripts. For a given NM_* or NR_* transcript ID, I need the domain name and the start and stop positions of each functional domain.

I am having trouble finding this information in the right format. I tried fetching the biological_region features from NCBI's RefSeq Functional Elements (https://www.ncbi.nlm.nih.gov/refseq/functionalelements/#Gene_FTP).

These annotations, however, seem to be keyed to NC_* (genomic) accessions, which are not compatible with NM_* IDs.

Does anyone know how the functional domain annotations can be fetched for NM_* transcript IDs?

Thanks in advance

transcript functional-domain

updated 3 months ago by Ram • written 4 months ago by sofie

Are you sure you want the NM_* accessions? Those are nucleotide (mRNA) records and are unlikely to carry any domain information.

With NP_* (protein) accessions you get feature annotations like the following using EntrezDirect (output truncated to save space):

$ efetch -db protein -id NP_000050.3 -format ft
>Feature ref|NP_000050.3|
1       3418    Protein
                        product breast cancer type 2 susceptibility protein isoform 1
                        product BRCA1/BRCA2-containing complex, subunit 2
                        product breast cancer type 2 susceptibility protein
                        product DNA repair-associated BRCA2
                        product breast cancer 2 tumor suppressor
                        product breast and ovarian cancer susceptibility gene, early onset
                        product Fanconi anemia group D1 protein
                        product breast and ovarian cancer susceptibility protein 2
                        product breast cancer 2, early onset
                        product mutant BRCA2
                        product mutant DNA repair-associated protein 2
1       40      Region
                        region  Interaction with PALB2
                        note    propagated from UniProtKB/Swiss-Prot (P51587.4)
37      68      Region
                        region  Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
                        note    propagated from UniProtKB/Swiss-Prot (P51587.4)
70      70      Site
                        site_type       phosphorylation
                        note    Phosphoserine. /evidence=ECO:0007744|PubMed:23186163; propagated from UniProtKB/Swiss-Prot (P51587.4)
358     381     Region
                        region  Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
                        note    propagated from UniProtKB/Swiss-Prot (P51587.4)
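If you need just the name, start and stop of each feature, the feature table can be flattened into tab-separated columns. Below is a minimal sketch, assuming the `-format ft` layout shown above: a `>Feature` header, feature lines that begin with start and stop positions, and indented qualifier lines. The function name `ft2tsv` is hypothetical. It keeps only the `region` and `site_type` qualifiers and drops the UniProt `/evidence=` tags.

```shell
# ft2tsv (hypothetical helper): convert EntrezDirect "-format ft" output on stdin
# into TSV lines: accession, start, stop, feature type, name.
# Assumes the layout shown above; works with tab- or space-separated input.
ft2tsv() {
    awk '
    BEGIN { OFS = "\t" }
    # Header line, e.g. ">Feature ref|NP_000050.3|": keep the accession.
    /^>Feature/ { split($2, parts, "|"); acc = parts[2]; next }
    # Feature line: "start stop type"; positions may carry "<" or ">" if partial.
    /^[<>]?[0-9]+[ \t]+[<>]?[0-9]+[ \t]/ { start = $1; stop = $2; type = $3; next }
    # Qualifier line: first field is the qualifier key, the rest is its value.
    /^[ \t]/ {
        key = $1
        if (key != "region" && key != "site_type") next
        value = $0
        sub(/^[ \t]+[^ \t]+[ \t]+/, "", value)   # drop indentation and key
        sub(/ \/evidence=.*$/, "", value)         # drop UniProt evidence tags
        sub(/\.$/, "", value)                     # drop a trailing period
        print acc, start, stop, type, value
    }
    '
}
```

Usage: `efetch -db protein -id NP_000050.3 -format ft | ft2tsv`. For the BRCA2 excerpt above this yields rows such as `NP_000050.3  1  40  Region  Interaction with PALB2`.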

4 months ago by GenoMax

Thank you for the response!

Unfortunately, the software reports only NM_ and NR_ accessions, not NP_ accessions. Is it possible to translate the NM_ accessions to NP_? The goal is to map the protein positions to the functional domains.

Here is some documentation from the Ion Reporter Software:

Example data: transcript: NM_015215.2, gene: CAMTA1, protein: p.Cys147Trp.

Documentation on the transcript value: "NM_ or NR_ NCBI versioned transcript identifiers (as specified by the gene-model files provided by UCSC RefSeq v63)."

4 months ago by sofie

score 2 · Accepted Answer · 2024-01-12

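You could do something like this to "translate" an NM_ accession into its linked NP_ protein record (NR_ accessions are non-coding RNAs, so they have no linked protein):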

$ esearch -db nuccore -query NM_015215 | elink -target protein | efetch -format ft
>Feature ref|NP_056030.1|
1       1673    Protein
                        product calmodulin-binding transcription activator 1 isoform a
67      183     Region
                        region  CG-1
                        note    CG-1 domains are highly conserved domains of about 130 amino-acid residues
                        db_xref CDD:198144
112     119     Region
                        region  Nuclear localization signal. /evidence=ECO:0000255|PROSITE-ProRule:PRU00767
                        note    propagated from UniProtKB/Swiss-Prot (Q9Y6Y1.4)
283     375     Region
                        region  Disordered. /evidence=ECO:0000256|SAM:MobiDB-lite
                        note    propagated from UniProtKB/Swiss-Prot (Q9Y6Y1.4)
873     952     Region
                        region  TIG
                        note    IPT/TIG domain
                        db_xref CDD:426462
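Since the goal is to map a variant's protein position onto these features, the residue number can be pulled out of the HGVS protein change and compared with each Region range. The following is a minimal sketch, assuming three-letter HGVS substitutions such as `p.Cys147Trp` (not predicted forms like `p.(Cys147Trp)`) and the feature table layout shown above. The function name `domains_at` is hypothetical, and the `<`/`>` partial-range markers are simply stripped.

```shell
# domains_at (hypothetical helper): read "-format ft" output on stdin and print
# start, stop and name of every Region feature containing the residue changed
# by an HGVS protein substitution such as p.Cys147Trp.
domains_at() {
    # Residue number = the digits between the reference and alternate residues.
    pos=$(printf '%s\n' "$1" | sed -n 's/^p\.[A-Za-z]*\([0-9][0-9]*\).*$/\1/p')
    if [ -z "$pos" ]; then
        echo "cannot parse a protein position from: $1" >&2
        return 1
    fi
    awk -v pos="$pos" '
    BEGIN { OFS = "\t" }
    # Feature line: remember its range and type, without partial-range markers.
    /^[<>]?[0-9]+[ \t]+[<>]?[0-9]+[ \t]/ {
        start = $1; stop = $2; type = $3
        gsub(/[<>]/, "", start); gsub(/[<>]/, "", stop)
        next
    }
    # "region" qualifier: report it only if its Region range covers pos.
    /^[ \t]/ && $1 == "region" {
        if (type != "Region" || pos + 0 < start + 0 || pos + 0 > stop + 0) next
        name = $0
        sub(/^[ \t]+[^ \t]+[ \t]+/, "", name)   # drop indentation and key
        sub(/ \/evidence=.*$/, "", name)         # drop UniProt evidence tags
        sub(/\.$/, "", name)                     # drop a trailing period
        print start, stop, name
    }
    '
}
```

Usage: `esearch -db nuccore -query NM_015215 | elink -target protein | efetch -format ft | domains_at p.Cys147Trp`. With the CAMTA1 features above, residue 147 falls in the CG-1 region (67 to 183). It is worth checking that the protein version returned matches the transcript version the variant caller used (here NM_015215.2), since positions can shift between versions.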