Question: How do I get the RefSeq IDs from a list of gene IDs?
tom5 wrote (6 months ago):

I hope you're well. I have a list of Entrez Gene IDs (such as "426813" and "395451") and want to find the corresponding RefSeq protein IDs. My goal is to output a text file with two columns: one for the original Entrez Gene ID and one for the corresponding RefSeq ID.

vkkodali wrote (6 months ago):

You can download and parse the gene2refseq file from the NCBI FTP site, which contains these mappings: https://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz.
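
A minimal sketch of that parsing step, assuming gene2refseq.gz has already been downloaded and that gene_id_list.txt holds one gene ID per line. The function name map_gene_to_refseq is hypothetical. The column positions (2 for GeneID, 6 for protein_accession.version) are taken from the file's header line, so it is worth checking them against your copy. Rows with no protein accession carry "-" in that column, and the same gene/protein pair can appear on several rows, so the sketch skips the former and de-duplicates the latter.

```shell
# Hypothetical helper: print "GeneID<TAB>protein accession" for every gene ID
# listed in $1, using a local copy of gene2refseq.gz given as $2.
map_gene_to_refseq() {
    ids_file=$1
    mapping_gz=$2
    gzip -dc "$mapping_gz" | awk -F'\t' -v OFS='\t' '
        NR == FNR { wanted[$1]; next }   # first input: remember each wanted gene ID
        /^#/      { next }               # skip the header line of gene2refseq
        # keep rows for wanted genes that have a protein accession, once per pair
        ($2 in wanted) && $6 != "-" && !seen[$2, $6]++ { print $2, $6 }
    ' "$ids_file" -
}
```

Called as `map_gene_to_refseq gene_id_list.txt gene2refseq.gz > gene2refseq_proteins.txt`, this would write the two-column file directly. The rows come out in the order they occur in gene2refseq, not in the order of the ID list.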

If you just have a few gene IDs to work with, you can use Entrez Direct as follows:

cat gene_id_list.txt | while read -r gid ; do
    echo -ne "$gid\t" ;
    elink -db gene -id "$gid" -target protein -name gene_protein_refseq \
        | efetch -format acc \
        | paste -s -d ',' - ;
done > gene2proteins.tsv

This writes one line per gene to gene2proteins.tsv, with that gene's protein accessions joined by commas:

396320  NP_990694.1,XP_015144082.1
395771  NP_990262.1,XP_025000385.1,XP_015133186.1,XP_015133180.1,XP_015133175.1

To get one gene/protein pair per line instead, split the comma-separated second column with awk:

awk 'BEGIN{FS="\t";OFS="\t"}{a=split($2,x,","); for (i=1;i<=a;++i) {print $1,x[i]}}' gene2proteins.tsv

396320  NP_990694.1
396320  XP_015144082.1
395771  NP_990262.1
395771  XP_025000385.1
395771  XP_015133186.1
395771  XP_015133180.1
395771  XP_015133175.1