Question: Retrieve RefSeq protein accession from transcript accession
speycast wrote:

Hi,

Does anyone know how to retrieve a RefSeq protein accession from its mRNA transcript accession using the NCBI E-utilities?

For example, using NM_001382556.2 to get NP_001369485.1.

Thanks very much!

eutils refseq ncbi • 141 views
modified 9 weeks ago by genomax • written 9 weeks ago by speycast
genomax wrote:

Using Entrez Direct (EDirect):

$ esearch -db nuccore -query "NM_001382556" | elink -target protein | efetch -format acc
NP_001369485.1
written 9 weeks ago by genomax
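
For anyone who wants the same lookup without EDirect installed, here is a minimal Python sketch that calls the E-utilities endpoints directly. It assumes ELink accepts an accession.version for `id` on sequence databases and that EFetch with `rettype=acc` returns accession.version text. The function names are made up for illustration, and if ELink returns several link sets the same UID may appear more than once.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def elink_url(transcript_acc):
    """Build an ELink URL from a nuccore record to its linked protein records."""
    params = {"dbfrom": "nuccore", "db": "protein", "id": transcript_acc}
    return f"{EUTILS}/elink.fcgi?{urllib.parse.urlencode(params)}"


def linked_protein_uids(elink_xml):
    """Pull the linked protein UIDs out of an ELink XML response."""
    root = ET.fromstring(elink_xml)
    return [el.text for el in root.findall("./LinkSet/LinkSetDb/Link/Id")]


def efetch_acc_url(protein_uids):
    """Build an EFetch URL that returns accession.version for protein UIDs."""
    params = {"db": "protein", "id": ",".join(protein_uids),
              "rettype": "acc", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urllib.parse.urlencode(params)}"


def protein_accessions(transcript_acc):
    """Network call: transcript accession -> list of protein accessions."""
    with urllib.request.urlopen(elink_url(transcript_acc)) as resp:
        uids = linked_protein_uids(resp.read())
    if not uids:
        return []
    with urllib.request.urlopen(efetch_acc_url(uids)) as resp:
        return resp.read().decode().split()
```

Calling `protein_accessions("NM_001382556.2")` needs network access; the URL builders and the XML parser can be checked offline.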

Thanks so much genomax!!!

written 9 weeks ago by speycast

genomax, would you happen to know how to retrieve the reference sequence GenBank file for this gene (RETL1) containing the above transcript, NM_001382556.2? I want to use -format gbwithparts to get the mRNA and CDS regions. There are other isoform transcripts in the GenBank file; how can I get the gbwithparts output with a FEATURES section containing just NM_001382556.2 and its CDS?

Thanks again in advance...

written 9 weeks ago by speycast

You should only need:

$ efetch -db nuccore -id "NM_001382556.2" -format gb

Or by "reference GenBank file", do you mean the chromosome/genome record?

modified 9 weeks ago • written 9 weeks ago by genomax

Yes, as in chromosome/genome. efetch -db nuccore -id "NM_001382556.2" -format gb gives the GenBank record for the mRNA. I would like the genomic GenBank record for this gene instead. Since the full record lists two isoform transcripts in its FEATURES section, I only want the transcript of interest there: a truncated FEATURES section containing just the mRNA NM_001382556.2 and its CDS.

written 9 weeks ago by speycast

Can you tell me which specific GenBank record you are looking at?

written 9 weeks ago by genomax

For example, the full GenBank record (gbwithparts) for DMD on the GRCh37 assembly: https://www.ncbi.nlm.nih.gov/nuccore/NC_000023.10?report=genbank&from=31137345&to=33357726&strand=true

In the FEATURES section there are 30 mRNA transcript_ids (the first one begins with mRNA join(1..351...), each followed by its protein and CDS section. I'm interested in keeping only the mRNA with transcript_id NM_004006.2, its protein_id NP_003997.1 and its CDS region in the FEATURES section, with all other sections unchanged.

written 9 weeks ago by speycast

The best I can think of is this, but it will give you all transcripts in that range.

$ efetch -db nuccore -id NC_000023.10 -seq_start 31137345 -seq_stop 33357726 -format gb -style withparts
written 9 weeks ago by genomax
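
The same region request can also be sketched as a plain E-utilities URL built in Python. This assumes the EFetch parameters `rettype=gbwithparts`, `seq_start`, `seq_stop` and `strand` behave as documented for nuccore; `region_genbank_url` is a hypothetical helper name, and `strand` is left optional because the command above does not set it.

```python
import urllib.parse

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def region_genbank_url(accession, start, stop, strand=None):
    """Build an EFetch URL for a GenBank (with parts) slice of a nuccore record."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    if strand is not None:
        params["strand"] = strand  # E-utilities: 1 = plus, 2 = minus
    return f"{EFETCH}?{urllib.parse.urlencode(params)}"


url = region_genbank_url("NC_000023.10", 31137345, 33357726)
print(url)
```

Fetching that URL with `urllib.request.urlopen` (or a browser) should return the flat file for the region, still with every transcript in it.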

Yup, thanks very much genomax! This is super helpful already, I greatly appreciate it. I guess in this case I will just use the Biopython (Bio) package in Python to parse out the transcript of interest. :)

written 9 weeks ago by speycast
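
As a rough illustration of that filtering step, here is a standard-library sketch that trims the FEATURES table of a GenBank flat file down to one mRNA/CDS pair and leaves every other line untouched. It is not the Biopython route mentioned above, just the same idea on raw text. It assumes feature keys start at column 6, that the `/transcript_id` and `/protein_id` qualifiers are not wrapped across lines, and that both accessions are known up front; `filter_features` is a made-up name.

```python
def filter_features(gb_text, transcript_id, protein_id):
    """Keep one mRNA/CDS pair in a GenBank flat file; leave other lines as-is."""
    wanted = {"mRNA": f'/transcript_id="{transcript_id}"',
              "CDS": f'/protein_id="{protein_id}"'}
    out, feature, in_features = [], [], False

    def flush():
        # Emit the buffered feature unless it is an unwanted mRNA/CDS.
        if feature:
            key = feature[0].split()[0]
            tag = wanted.get(key)
            if tag is None or any(tag in line for line in feature):
                out.extend(feature)
            feature.clear()

    for line in gb_text.splitlines():
        if in_features and line[:1] != " ":
            # First unindented line after FEATURES (e.g. ORIGIN) ends the table.
            flush()
            in_features = False
        if not in_features:
            out.append(line)
            in_features = line.startswith("FEATURES")
        elif line[5:6].strip():
            # A feature key sits at column 6; this starts a new feature.
            flush()
            feature.append(line)
        else:
            feature.append(line)  # qualifier or location continuation
    flush()
    return "\n".join(out)
```

For the DMD record discussed above, the call would be `filter_features(text, "NM_004006.2", "NP_003997.1")`, where `text` is the downloaded gbwithparts file. Features other than mRNA and CDS (source, gene and so on) are kept as they are.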