Taxid to genome RefSeq accession number
4.2 years ago

Dear all, I have a list of taxids such as: 10243 10244 10246 10247 10248 10249

I am looking for their corresponding RefSeq genome accession numbers. For example, I searched manually for taxid 10243 and found the genome RefSeq accession number NC_003663.2. Please advise. Thanks.

4.2 years ago
Sej Modha

The easiest way is to search the nuccore database and restrict the results to RefSeq records with the refseq[filter] filter.

For example,

esearch -db nuccore -query "txid10242[Organism:exp] AND refseq[filter]"|efetch -format acc
NC_037656.1
NC_031033.1
NC_031038.1
NC_003663.2
NC_006998.1
NC_027213.1
NC_008291.1
NC_004105.1
NC_003391.1
NC_003310.1
NC_001611.1

esearch -db nuccore -query "txid10243[Organism:exp] AND refseq[filter]"|efetch -format acc
NC_003663.2
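For a whole list of taxids like the one in the question, the same esearch | efetch pipeline can be run in a loop. A minimal sketch, assuming EDirect is installed and the taxids sit one per line in a file; the function names and file names are hypothetical. It prints one tab-separated "taxid accession" line per hit:

```shell
#!/usr/bin/env bash
# Sketch: map each taxid to its RefSeq nucleotide accessions with EDirect.
# Assumes esearch/efetch are on PATH; function names are illustrative only.

# Build the Entrez query used above for one taxid.
refseq_query() {
  printf 'txid%s[Organism:exp] AND refseq[filter]\n' "$1"
}

# Print "taxid<TAB>accession" for every RefSeq record under one taxid.
taxid_to_refseq() {
  local taxid=$1 acc
  # </dev/null keeps esearch from reading the taxid list that feeds the loop.
  esearch -db nuccore -query "$(refseq_query "$taxid")" < /dev/null \
    | efetch -format acc \
    | while read -r acc; do
        printf '%s\t%s\n' "$taxid" "$acc"
      done
}

# Read taxids (one per line, blank lines ignored) from the file given as $1.
map_taxid_file() {
  local taxid
  while read -r taxid || [ -n "$taxid" ]; do
    [ -n "$taxid" ] || continue
    taxid_to_refseq "$taxid"
  done < "$1"
}
```

Usage: `map_taxid_file taxids.txt > taxid_refseq.tsv`.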

My system does not support these utilities; running them gives a "command not found" error. Is there a curl/wget URL that returns the NC_XXX accessions for each taxid, or any other way around this?

You might also find this eutils tutorial helpful.
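Without EDirect, the same search can be sent straight to the E-utilities web service with curl or wget. A minimal sketch, assuming the esearch.fcgi endpoint with `idtype=acc`, which asks ESearch to return accession.version identifiers instead of GI numbers. The function names are hypothetical, and the XML is parsed with grep/sed rather than a real XML parser:

```shell
#!/usr/bin/env bash
# Sketch: taxid -> RefSeq accessions via the E-utilities REST API (no EDirect).
EUTILS=https://eutils.ncbi.nlm.nih.gov/entrez/eutils

# ESearch URL for one taxid. Square brackets are percent-encoded (%5B/%5D)
# so that curl does not treat them as a URL glob pattern.
esearch_url() {
  printf '%s/esearch.fcgi?db=nuccore&term=txid%s%%5BOrganism:exp%%5D+AND+refseq%%5Bfilter%%5D&idtype=acc&retmax=10000\n' \
    "$EUTILS" "$1"
}

# Print the contents of each <Id> element in ESearch XML read from stdin.
parse_ids() {
  grep -o '<Id>[^<]*</Id>' | sed -e 's/<Id>//' -e 's/<\/Id>//'
}

# Print the RefSeq accession.version numbers for one taxid.
refseq_accessions() {
  curl -s "$(esearch_url "$1")" | parse_ids
}
```

Usage: `refseq_accessions 10243`. With wget instead of curl, the equivalent call is `wget -qO- "$(esearch_url 10243)" | parse_ids`.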

The solution given there seems to fetch GI numbers, which is not what I need. I need the whole-genome RefSeq accession number for each taxid. Thanks anyway for your help.

Did you miss that part?

Since you want accession numbers, add a step 4a: under "Summary" on the left side of the page, choose "Format" --> "Accession list".

Yes, sure, but that is a manual method. I am looking for a script or program, because my ID list runs into lakhs (hundreds of thousands). Thanks again for your help.
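For hundreds of thousands of taxids, the per-taxid lookup has to run in a loop that stays under NCBI's E-utilities request limit (3 requests per second without an API key). A minimal sketch of such a driver; the function name, the `BATCH_DELAY` variable and its 0.4 s default are assumptions. The command passed in can be anything that takes one taxid as its argument, for example a small wrapper around the esearch | efetch pipeline shown above:

```shell
#!/usr/bin/env bash
# Sketch: run a per-taxid command over a list file, one call at a time,
# pausing between calls to stay under NCBI's request-rate limit.
# BATCH_DELAY (seconds) is a hypothetical knob; fractional values need a
# sleep that accepts them (GNU coreutils and BSD sleep do).

run_batch() {
  local list=$1 cmd=$2 taxid
  while read -r taxid || [ -n "$taxid" ]; do
    case $taxid in
      '') continue ;;                       # skip blank lines silently
      *[!0-9]*) echo "skipping non-numeric line: $taxid" >&2; continue ;;
    esac
    # </dev/null stops the command from consuming the rest of the list.
    "$cmd" "$taxid" < /dev/null
    sleep "${BATCH_DELAY:-0.4}"
  done < "$list"
}
```

Usage: `run_batch taxids.txt my_lookup > results.tsv`, where `my_lookup` is a hypothetical function of your own that prints the accessions for one taxid. Keeping the lookup separate from the loop makes it easy to swap between EDirect and plain curl.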

4.1 years ago

Thank you all for your kind help and direction.

However, I used a different approach to gather the accession numbers, because I could not install efetch and esearch (E-utilities) on my system.

Also, the manual approach was impossible for such a large dataset.

My approach is a little laborious, but it worked for me, so I am sharing it here in case anyone else needs it:

The wget command for a single taxid (10234 in this example):

wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10234&lvl=3&lin=f&keep=1&srchmode=1&unlock"


To run it over the whole list, I replaced the taxid with $i, read from the file list:

for i in $(cat list); do wget "https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=$i&lvl=3&lin=f&keep=1&srchmode=1&unlock"; done


Each taxid then gets its own output file, named like index_******_ taxid_*****.

grep -E "Scientific name|/genome/\?term=txid$i" "wwwtax.cgi?mode=Info&id=$i&lvl=3&lin=f&keep=1&srchmode=1&unlock" > "Details_$i"

This saves to Details_$i the scientific name of each taxid and, when a genome is available, its genome link. For example, taxid 10244 (https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10244&lvl=3&lin=f&keep=1&srchmode=1&unlock) has a genome link, while taxid 10234 does not.
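The same test can split the taxid list into those with and without a genome link. A minimal sketch, assuming the pages were saved under wget's default file names (as in the grep command above) and that a genome entry shows up as a `/genome/?term=txid<taxid>` link; that link format is inferred from the grep pattern in this post and the page layout may have changed since. The function and output file names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: sort taxids into with_genome.txt / without_genome.txt depending on
# whether their saved Taxonomy Browser page links to /genome/?term=txid<id>.

# wget's default file name for the Taxonomy Browser URL used above.
page_file() {
  printf 'wwwtax.cgi?mode=Info&id=%s&lvl=3&lin=f&keep=1&srchmode=1&unlock\n' "$1"
}

# True if the page links to this exact taxid; ([^0-9]|$) stops txid1024
# from also matching txid10244. A missing page counts as "no link".
has_genome_link() {
  grep -qE "/genome/\?term=txid$1([^0-9]|\$)" "$(page_file "$1")" 2>/dev/null
}

classify_taxids() {
  local taxid
  : > with_genome.txt
  : > without_genome.txt
  while read -r taxid || [ -n "$taxid" ]; do
    [ -n "$taxid" ] || continue
    if has_genome_link "$taxid"; then
      echo "$taxid" >> with_genome.txt
    else
      echo "$taxid" >> without_genome.txt
    fi
  done < "$1"
}
```

Usage: run `classify_taxids list` in the directory holding the downloaded pages.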

The Details files therefore show which taxids have a genome entry. For those taxids, the NC_ accession numbers can be read from the Genome page, e.g. for taxid 10244:

https://www.ncbi.nlm.nih.gov/genome/?term=txid10244[Organism:exp]
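Once such a page has been saved (for example with wget), the NC_ accessions can be pulled out with a regular expression instead of being read by eye. A minimal sketch, assuming the accessions appear in the page text in versioned form such as NC_003663.2; it deliberately matches only the NC_ prefix asked about here, so other RefSeq prefixes (NZ_, NW_, ...) are ignored. The function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list the unique versioned NC_ accessions found in text/HTML files,
# or in stdin when no file is given (e.g. a saved NCBI page).
extract_nc_accessions() {
  # -o prints each match on its own line; -h drops file-name prefixes.
  grep -ohE 'NC_[0-9]+\.[0-9]+' "$@" | sort -u
}
```

Usage: `extract_nc_accessions saved_page.html` (file name hypothetical), or pipe a page into it with `wget -qO- URL | extract_nc_accessions`.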

I hope this helps someone in the future; anyone is welcome to improve it or make it more organised.

Thanks once again to Biostars, especially Sej Modha and genomax, for your help and kind guidance.

Thank you

Hello ruchikabhat31,

thank you for following up with a detailed description of your final solution.

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!

Thank you, finswimmer, for your help. I shall keep that in mind next time.