Extract nucleotide sequence from a RefSeq Transcript ID
8 weeks ago
Vincent Laufer ★ 2.4k

Hello,

Suppose I want to get a nucleotide sequence from a specific transcript isoform of EGFR. I could do something fairly manual: navigate to https://www.ncbi.nlm.nih.gov/nuccore/NM_001346941.2, scroll down, count the nucleotides, then cut and paste.

However, I feel there must be (probably many) programmatic ways to extract, for example, nucleotides 1101 to 1217 from this transcript.

I looked around and found things like biomartr::is.genome.available(), but this appears to be for higher-level downloading, like getting all the transcripts for an organism.

I must be missing something. Is there a tool out there that, given something like download_refseq_nt_sequence(NM_001346941.2, '1101', '1217'), will return the actual sequence?

It could be R, Python, bash, or a web tool; I can use any of them.

Thank you very much!

nucleotide refseq transcript sequence entrez
8 weeks ago
GenoMax 121k

Using EntrezDirect (EDirect). Here is an example that gets nucleotides 1101 to 1120:

$ efetch -db nuccore -id NM_001346941.2 -seq_start 1101 -seq_stop 1120 -format fasta
>NM_001346941.2:1101-1120 Homo sapiens epidermal growth factor receptor (EGFR), transcript variant EGFRvIII, mRNA
GGAGTTTGTGGAGAACTCTG
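The efetch call above can be wrapped into something close to the download_refseq_nt_sequence(...) call from the question. This is a minimal sketch, assuming EDirect is installed and efetch is on PATH; the function name is hypothetical (borrowed from the question), and all it adds is dropping the FASTA header and joining the wrapped sequence lines into one string.

```shell
# Hypothetical helper named after the call sketched in the question.
# Assumes EDirect's efetch is installed and on PATH.
# Usage: download_refseq_nt_sequence ACCESSION START STOP
download_refseq_nt_sequence() {
    if [ "$#" -ne 3 ]; then
        echo "usage: download_refseq_nt_sequence ACCESSION START STOP" >&2
        return 1
    fi
    # Fetch the 1-based, inclusive range as FASTA, drop the ">" header line,
    # and join the wrapped sequence lines into a single string.
    efetch -db nuccore -id "$1" -seq_start "$2" -seq_stop "$3" -format fasta |
        grep -v '^>' |
        tr -d '\n'
    echo
}
```

With that defined, download_refseq_nt_sequence NM_001346941.2 1101 1217 should print the requested stretch on one line; with 1101 and 1120 it should print the same 20 bases as the FASTA record above.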

Thank you! This was so helpful. It makes me think we need more convenient browser-based tools for this. I really appreciate you.

Suppose I wish to start from an amino acid position instead, but still pull nucleotides (or vice versa).

Is there any way to grab variant positions neatly?
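Going between a protein position and nucleotides is codon arithmetic once you know where the CDS starts on the transcript: residue p is encoded by c.(3p-2) to c.3p, and a c. position maps to transcript position CDS_START + c - 1, which is what efetch -seq_start/-seq_stop expect. This is a minimal sketch; the function names are hypothetical, and CDS_START (the transcript position of the A of the ATG) is an input you would read from the CDS feature of the RefSeq record rather than something these helpers look up.

```shell
# Hypothetical helpers (pure POSIX shell arithmetic, no network access).
# All positions are 1-based; "c." positions count from the A of the ATG.

# aa_to_c AA_POS -> "C_START C_STOP", the codon that encodes residue AA_POS
aa_to_c() {
    echo "$(( 3 * $1 - 2 )) $(( 3 * $1 ))"
}

# c_to_aa C_POS -> "AA_POS CODON_POSITION" (codon position is 1, 2 or 3)
c_to_aa() {
    echo "$(( ($1 + 2) / 3 )) $(( ($1 - 1) % 3 + 1 ))"
}

# c_to_tx CDS_START C_POS -> transcript (mRNA) position usable with efetch.
# CDS_START is assumed to come from the record's CDS feature.
c_to_tx() {
    echo "$(( $1 + $2 - 1 ))"
}
```

For p.N550H, aa_to_c 550 prints 1648 1650, consistent with c.1648A>C being the first base of codon 550; c_to_tx then shifts those by the CDS start before passing them to efetch.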

Looks like I may want something like this (piping through efetch):

esearch -db gene -query "BRCA2 [GENE] AND human [ORGN]" | efetch -format docsum

Can you provide an example?

Hey Geno! Sure, thanks so much for following up.

Suppose what I have is:

NM_001346941             p.N550H

...but what I want is:

NM_001346941             c.1648A>C

or, better yet, NM_001346941 (some number of nucleotides before)C(some number of nucleotides after), e.g.:

NM_001346941             taaCggt

or even the rsID:

NM_001346941             rs18448194
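Once the transcript sequence is in hand, the "taaCggt"-style context is plain string slicing. This is a minimal sketch, assuming SEQ is the whole transcript as one string (for example, efetch -format fasta output with the header dropped) and POS is a position on that same sequence, so a c. position must first be shifted by the CDS start. variant_context is a hypothetical name; it checks that REF matches the sequence before putting ALT in the middle.

```shell
# Hypothetical helper: print lowercase flanks around an uppercase ALT base,
# e.g. "taaCggt". POS is 1-based on SEQ; FLANK is the number of bases per side.
# Usage: variant_context SEQ POS REF ALT FLANK
variant_context() {
    vc_seq=$1; vc_pos=$2; vc_ref=$3; vc_alt=$4; vc_flank=$5
    # Base actually present at POS, upper-cased for the comparison with REF.
    vc_obs=$(printf '%s\n' "$vc_seq" | cut -c "$vc_pos" | tr 'a-z' 'A-Z')
    if [ "$vc_obs" != "$(printf '%s' "$vc_ref" | tr 'a-z' 'A-Z')" ]; then
        echo "REF mismatch at $vc_pos: expected $vc_ref, found $vc_obs" >&2
        return 1
    fi
    # Left flank: up to FLANK bases before POS (empty at the sequence start).
    vc_left=""
    if [ "$vc_pos" -gt 1 ] && [ "$vc_flank" -gt 0 ]; then
        vc_from=$(( vc_pos - vc_flank ))
        if [ "$vc_from" -lt 1 ]; then vc_from=1; fi
        vc_left=$(printf '%s\n' "$vc_seq" | cut -c "$vc_from-$(( vc_pos - 1 ))")
    fi
    # Right flank: up to FLANK bases after POS (cut stops at the sequence end).
    vc_right=""
    if [ "$vc_flank" -gt 0 ]; then
        vc_right=$(printf '%s\n' "$vc_seq" | cut -c "$(( vc_pos + 1 ))-$(( vc_pos + vc_flank ))")
    fi
    printf '%s%s%s\n' \
        "$(printf '%s' "$vc_left" | tr 'A-Z' 'a-z')" \
        "$(printf '%s' "$vc_alt" | tr 'a-z' 'A-Z')" \
        "$(printf '%s' "$vc_right" | tr 'A-Z' 'a-z')"
}
```

For example, variant_context GTAAAGGTC 5 A C 3 prints taaCggt.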

How about this (output truncated for space)? The first column is the rsID, without the "rs" prefix.

$ esearch -db nuccore -query NM_001346941 | elink -target gene | elink -target snp | esummary | xtract -pattern DocumentSummary -element SNP_ID,DOCSUM
1491558880      HGVS=NC_000007.14:g.55157614_55157619dup,NC_000007.13:g.55225307_55225312dup,NG_007726.3:g.143583_143588dup|SEQ=[-/AAGAAA]|LEN=9|GENE=EGFR:1956
5884400 HGVS=NC_000007.14:g.55123886TG[5],NC_000007.14:g.55123886TG[6],NC_000007.14:g.55123886TG[8],NC_000007.13:g.55191579TG[5],NC_000007.13:g.55191579TG[6],NC_000007.13:g.55191579TG[8],NG_007726.3:g.109855TG[5],NG_007726.3:g.109855TG[6],NG_007726.3:g.109855TG[8]|SEQ=[TGTG/-/TG/TGTGTG]|LEN=14|GENE=EGFR:1956
34058394        HGVS=NC_000007.14:g.55116538CA[6],NC_000007.14:g.55116538CA[7],NC_000007.14:g.55116538CA[8],NC_000007.14:g.55116538CA[9],NC_000007.14:g.55116538CA[11],NC_000007.14:g.55116538CA[12],NC_000007.14:g.55116538CA[13],NC_000007.14:g.55116538CA[14],NC_000007.14:g.55116538CA[15],NC_000007.13:g.55184231CA[6],NC_000007.13:g.55184231CA[7],NC_000007.13:g.55184231CA[8],NC_000007.13:g.55184231CA[9],NC_000007.13:g.55184231CA[11],NC_000007.13:g.55184231CA[12],NC_000007.13:g.55184231CA[13],NC_000007.13:g.55184231CA[14],NC_000007.13:g.55184231CA[15],NG_007726.3:g.102507CA[6],NG_007726.3:g.102507CA[7],NG_007726.3:g.102507CA[8],NG_007726.3:g.102507CA[9],NG_007726.3:g.102507CA[11],NG_007726.3:g.102507CA[12],NG_007726.3:g.102507CA[13],NG_007726.3:g.102507CA[14],NG_007726.3:g.102507CA[15]|SEQ=[CACACACA/-/CA/CACA/CACACA/CACACACACA/CACACACACACA/CACACACACACACA/CACACACACACACACA/CACACACACACACACACA]|LEN=21|GENE=EGFR:1956
1491516373      HGVS=NC_000007.14:g.55157425del,NC_000007.14:g.55157425dup,NC_000007.13:g.55225118del,NC_000007.13:g.55225118dup,NG_007726.3:g.143394del,NG_007726.3:g.143394dup|SEQ=[G/-/GG]|LEN=6|GENE=EGFR:1956
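To go from a variant to its rsID, the output of the pipeline above can be filtered on an HGVS expression. This is a minimal sketch, assuming the SNP_ID and DOCSUM columns are tab-separated and that DOCSUM is a "|"-separated list of KEY=VALUE fields whose HGVS value is a comma-separated list, as in the lines shown. rsids_for_hgvs is a hypothetical name and matching is a plain substring test. Whether a particular variant's HGVS list also carries an NM_001346941 c. expression, and which protein notation it uses, is an assumption to check against the real output.

```shell
# Hypothetical helper: read "SNP_ID<TAB>DOCSUM" lines on stdin and print
# "rsID<TAB>HGVS" for every HGVS expression containing PATTERN as a substring.
# Usage: ... | xtract -pattern DocumentSummary -element SNP_ID,DOCSUM | rsids_for_hgvs PATTERN
rsids_for_hgvs() {
    awk -F'\t' -v pat="$1" '
    {
        # DOCSUM is a "|"-separated list of KEY=VALUE fields; HGVS is one of them.
        n = split($2, fields, "|")
        for (i = 1; i <= n; i++) {
            if (substr(fields[i], 1, 5) != "HGVS=") continue
            # HGVS value: comma-separated expressions (g., and possibly c./p.).
            m = split(substr(fields[i], 6), hgvs, ",")
            for (j = 1; j <= m; j++)
                if (index(hgvs[j], pat) > 0) print "rs" $1 "\t" hgvs[j]
        }
    }'
}
```

Applied to the last line above, rsids_for_hgvs 'g.55157425del' prints rs1491516373 followed by NC_000007.14:g.55157425del.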

I really owe you one, Geno. I'm working on a deadline here and appreciate you.

It looks like this does a lot of it?

https://github.com/zwdzwd/transvar
