Question

Retrieve RefSeq protein accession from transcript accession

0

5.1 years ago

speycast • 0

Hi,

Does anyone know how to retrieve a RefSeq protein accession from its mRNA transcript accession using the NCBI E-utilities?

For example, using NM_001382556.2 to get NP_001369485.1.

Thanks very much!

NCBI RefSeq Eutils • 1.9k views

ADD COMMENT • link updated 5.1 years ago by GenoMax 152k • written 5.1 years ago by speycast • 0

score 2 · Accepted Answer · 2020-06-02

2

5.1 years ago

GenoMax 152k

Using Entrez Direct (EDirect):

$ esearch -db nuccore -query "NM_001382556" | elink -target protein | efetch -format acc
NP_001369485.1
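
If a link lookup is not required, a RefSeq mRNA record in GenBank format already names its protein in the /protein_id qualifier of the CDS feature, so the accession can also be read from the record text. Below is a minimal Python sketch of that idea, not part of the original answer. The helper names `fetch_genbank` and `protein_ids` are hypothetical, and the sample text is a hand-made fragment with placeholder coordinates, not the real record.

```python
import re
import urllib.parse
import urllib.request

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fetch_genbank(accession):
    """Download a nuccore record as GenBank flat-file text (needs network access)."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    )
    with urllib.request.urlopen(f"{EFETCH_URL}?{query}") as response:
        return response.read().decode("utf-8")


def protein_ids(genbank_text):
    """Return every /protein_id value found in GenBank flat-file text, in order."""
    return re.findall(r'/protein_id="([^"]+)"', genbank_text)


# Hand-made fragment for illustration only (coordinates and gene name are placeholders).
sample = '''\
     CDS             1..30
                     /gene="EXAMPLE"
                     /protein_id="NP_001369485.1"
'''

print(protein_ids(sample))  # ['NP_001369485.1']

# With network access, the same parser can be applied to a live record:
# protein_ids(fetch_genbank("NM_001382556.2"))
```

The elink pipeline above remains the more direct route; this variant is mainly useful when the GenBank record is being downloaded anyway.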

ADD COMMENT • link 5.1 years ago by GenoMax 152k

0

Thanks so much genomax!!!

ADD REPLY • link 5.1 years ago by speycast • 0

0

genomax, would you happen to know how to retrieve the reference sequence GenBank file for the gene (RETL1) that contains the above transcript, NM_001382556.2? I want to use -format gbwithparts to get the mRNA and CDS regions. The GenBank file also lists other isoform transcripts; how can I get the gbwithparts output with a FEATURES section containing only NM_001382556.2 and its CDS?

Thanks again in advance...

ADD REPLY • link 5.1 years ago by speycast • 0

0

You should only need:

$ efetch -db nuccore -id "NM_001382556.2" -format gb

By reference GenBank file, do you mean the chromosome/genome record?

ADD REPLY • link 5.1 years ago by GenoMax 152k

0

Yes, as in chromosome/genome. efetch -db nuccore -id "NM_001382556.2" -format gb returns the mRNA record. I would like the genomic GenBank record for this gene instead. Its FEATURES section lists two isoform transcripts, and I want the full record with a truncated FEATURES section that contains only the transcript of interest, NM_001382556.2, along with its CDS.

ADD REPLY • link 5.1 years ago by speycast • 0

0

Can you tell me which specific genbank record you are looking at?

ADD REPLY • link 5.1 years ago by GenoMax 152k

0

For example, the full GenBank record (gbwithparts) for DMD on the GRCh37 assembly: https://www.ncbi.nlm.nih.gov/nuccore/NC_000023.10?report=genbank&from=31137345&to=33357726&strand=true

In the FEATURES section there are 30 mRNA transcript_ids; the first begins with mRNA join(1..351...) and is followed by its protein and CDS section. I'm interested in keeping only mRNA transcript_id NM_004006.2, its protein_id NP_003997.1, and its CDS region in the FEATURES section, with all other sections unchanged.

ADD REPLY • link 5.1 years ago by speycast • 0

1

The best I can think of is this, but it will give you all transcripts in that range.

$ efetch -db nuccore -id NC_000023.10 -seq_start 31137345 -seq_stop 33357726 -format gb -style withparts

ADD REPLY • link 5.1 years ago by GenoMax 152k

0

Yup, thanks very much genomax! This is super helpful already, greatly appreciate it. I guess in this case I will just use Python's Bio package (Biopython) to parse out the transcript of interest. :)
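
One way that filtering step could look is sketched below. It is dependency-free on purpose: it uses plain text handling rather than Biopython, and it assumes the standard GenBank flat-file layout (feature keys indented five spaces, a FEATURES header line, and a following top-level section such as ORIGIN or CONTIG). The function name `filter_features` is hypothetical, and the toy record uses made-up coordinates plus placeholder accessions (NM_000000.0, NP_000000.0) for the unwanted isoform. It drops only mRNA and CDS features that belong to other transcripts; gene, exon, and all other features and sections pass through unchanged.

```python
import re


def filter_features(genbank_text, transcript_id, protein_id):
    """Keep only the mRNA/CDS features matching the given accessions."""
    out = []
    block = []  # lines of the feature currently being collected
    in_features = False

    def flush():
        # Decide whether the collected feature is kept, then reset the buffer.
        if not block:
            return
        parts = block[0].split()
        key = parts[0] if parts else ""
        text = "".join(block)
        drop_mrna = key == "mRNA" and f'/transcript_id="{transcript_id}"' not in text
        drop_cds = key == "CDS" and f'/protein_id="{protein_id}"' not in text
        if not (drop_mrna or drop_cds):
            out.extend(block)
        block.clear()

    for line in genbank_text.splitlines(keepends=True):
        if line.startswith("FEATURES"):
            in_features = True
            out.append(line)
        elif in_features and line[:1].strip():
            # A line starting in column 1 opens the next top-level section.
            flush()
            in_features = False
            out.append(line)
        elif in_features and re.match(r" {5}\S", line):
            # A new feature key starts; finish the previous feature first.
            flush()
            block.append(line)
        elif in_features:
            block.append(line)  # location or qualifier continuation line
        else:
            out.append(line)
    flush()
    return "".join(out)


# Toy record for illustration only; coordinates and the first isoform's IDs are placeholders.
sample = '''\
LOCUS       EXAMPLE
FEATURES             Location/Qualifiers
     gene            1..60
                     /gene="DMD"
     mRNA            join(1..10,21..60)
                     /transcript_id="NM_000000.0"
     CDS             join(5..10,21..50)
                     /protein_id="NP_000000.0"
     mRNA            join(1..15,21..60)
                     /transcript_id="NM_004006.2"
     CDS             join(5..15,21..50)
                     /protein_id="NP_003997.1"
ORIGIN
//
'''

filtered = filter_features(sample, "NM_004006.2", "NP_003997.1")
print(filtered)
```

The input would be the text saved from the efetch command above, for example by redirecting its output to a file and reading that file into a string.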

ADD REPLY • link 5.1 years ago by speycast • 0