Is it possible to convert NC_ to NP_ accessions for RefSeq?
4.2 years ago

Hello,

As the title says, is it possible to obtain all NP_ (protein) accessions for a given RefSeq NC_ accession? For example, given NC_000014.9, can I retrieve all of its NP_ accessions?

Thank you!


You can do this with Entrez Direct (EDirect):

esearch -db nuccore -query NC_000014 | elink -target protein | efetch -format acc > ids.txt

then

head ids.txt

prints:

NP_001095924.1
NP_891989.2
NP_065972.3
NP_777636.2
NP_065099.3
NP_001035365.1
NP_776249.1
NP_068814.2
NP_001428.1
NP_057270.1
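Depending on how the chromosome is annotated, the linked protein list may also contain accessions with other prefixes, such as XP_ (predicted model proteins). A minimal sketch, assuming ids.txt holds one accession per line as written by `efetch -format acc`, that keeps only the NP_ entries and removes duplicates; the function name `filter_np` and the output file `np_ids.txt` are hypothetical:

```shell
# Hypothetical helper: keep only NP_ accessions from a one-accession-per-line file,
# sorted and de-duplicated.
filter_np() {
  grep '^NP_' "$1" | sort -u
}

# Example usage (assumes ids.txt was produced by the esearch | elink | efetch pipeline):
# filter_np ids.txt > np_ids.txt
```

The pattern is anchored at the start of the line, so an accession such as XP_... is never matched just because "NP_" appears elsewhere.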

Thank you!! Is it possible to pass a text file as input? I have a very long list of accessions.


Use GNU parallel to automate tasks. For example, to save the first ten proteins to a FASTA file:

head ids.txt | parallel -j 1 efetch -db protein -format fasta -id {} > prots.fa
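With a very long list, running one efetch call per accession means one request per protein. efetch also accepts several identifiers in a single comma-separated `-id` argument, so one option is to group the accessions into batches first. A sketch under that assumption; the helper name `make_batches` and the default batch size of 200 are hypothetical choices, not EDirect settings:

```shell
# Hypothetical helper: print the accessions in FILE as comma-separated batches,
# SIZE accessions per line (default 200).
make_batches() {
  local file="$1" size="${2:-200}"
  # xargs -n groups SIZE whitespace-separated items per line; tr turns the
  # separating spaces into commas (accessions contain no spaces).
  xargs -n "$size" < "$file" | tr ' ' ','
}

# Example usage (requires EDirect, GNU parallel and network access):
# make_batches ids.txt 200 | parallel -j 1 efetch -db protein -format fasta -id {} > prots.fa
```

Keeping `-j 1` sends the batches one at a time, which is gentler on the NCBI servers than running many jobs in parallel.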

You can use epost for this. If you have a file input.txt with the NC_ accessions, one per line, you can run:

epost -db nuccore -input input.txt -format acc | elink -target protein | efetch -format acc > ids.txt
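Large hand-made accession lists often contain Windows line endings, stray spaces, blank lines or repeated entries, which can lead to lookups failing or being repeated. A minimal pre-processing sketch, assuming input.txt should contain only NC_ accessions (adjust the `grep` pattern if you also need other prefixes); the function name `clean_accessions` and the file `input.clean.txt` are hypothetical:

```shell
# Hypothetical helper: normalise an accession list before posting it.
# - removes carriage returns (Windows line endings)
# - trims leading and trailing whitespace
# - keeps only lines starting with NC_ (this also drops blank lines)
# - removes duplicates while preserving the original order
clean_accessions() {
  tr -d '\r' < "$1" \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep '^NC_' \
    | awk '!seen[$0]++'
}

# Example usage (the epost step requires EDirect and network access):
# clean_accessions input.txt > input.clean.txt
# epost -db nuccore -input input.clean.txt -format acc | elink -target protein | efetch -format acc > ids.txt
```

Note that posting all accessions at once returns a single merged protein list, so it no longer records which NC_ each NP_ came from.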

Thank you both for your answers!
