Question: How do I get the RefSeq IDs from a list of gene IDs?
6 months ago, tom5 wrote:

I hope you're well. I have a list of Entrez Gene IDs (such as "426813" and "395451") and want to find the corresponding RefSeq protein IDs. My goal is to output a txt file with two columns: one for the original Entrez Gene ID and one for the corresponding RefSeq ID.

6 months ago, vkkodali (United States) wrote:

You can download and parse the gene2refseq file from the NCBI FTP site, which contains these mappings.
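As a rough sketch of the parsing step, the function below joins a list of gene IDs against an already downloaded and decompressed gene2refseq table. It assumes the table is tab-separated, with the GeneID in column 2 and the protein accession.version in column 6 ("-" when there is none); please check this against the header line of your copy. The function name and the file names are made up for illustration.

```shell
# Sketch: look up RefSeq protein accessions for a list of Entrez Gene IDs.
# Assumption: gene2refseq-style table, tab-separated, GeneID in column 2,
# protein accession.version in column 6 ("-" if absent).
map_genes_to_proteins() {
    # $1: file with one gene ID per line
    # $2: decompressed gene2refseq table
    awk 'BEGIN { FS = OFS = "\t" }
         NR == FNR { wanted[$1] = 1; next }   # first file: remember the gene IDs
         /^#/ { next }                        # skip the header line
         # print each gene ID / protein pair once, ignoring rows with no protein
         ($2 in wanted) && $6 != "-" && !seen[$2, $6]++ { print $2, $6 }' "$1" "$2"
}
```

Something like `map_genes_to_proteins gene_id_list.txt gene2refseq.tsv > gene_refseq_pairs.txt` would then write the two-column file. Pairs are de-duplicated because the same protein can appear on several rows of the table.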

If you just have a few gene IDs to work with, you can use Entrez Direct as follows:

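cat gene_id_list.txt | while read -r gid ; do
    # write the gene ID followed by a tab, without a trailing newline
    printf '%s\t' "$gid" ;
    # fetch the linked RefSeq protein accessions and join them with commas
    elink -db gene -id "$gid" -target protein -name gene_protein_refseq \
        | efetch -format acc \
        | paste -s -d ',' - ;
done > gene2proteins.tsv

The resulting gene2proteins.tsv has one line per gene ID, for example: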
396320  NP_990694.1,XP_015144082.1
395771  NP_990262.1,XP_025000385.1,XP_015133186.1,XP_015133180.1,XP_015133175.1

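To get one gene ID and RefSeq accession pair per line, split the comma-separated second column with awk:

awk 'BEGIN{FS="\t";OFS="\t"}{a=split($2,x,","); for (i=1;i<=a;++i) {print $1,x[i]}}' gene2proteins.tsv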
396320  NP_990694.1
396320  XP_015144082.1
395771  NP_990262.1
395771  XP_025000385.1
395771  XP_015133186.1
395771  XP_015133180.1
395771  XP_015133175.1