makeblastdb with organism names instead of GI numbers
9.2 years ago
Karyo ▴ 10

Hi, I am running BLASTP on a local machine to search a unique protein against the NR database. However, the BLASTP tabular output reports GI numbers as the subject sequence identifiers, and I want organism (or species) names there instead.

Which makeblastdb options do I need to do that?

9.2 years ago
Ram 43k

You're better off writing a script that looks up organism or species information from the GI numbers than changing the input to makeblastdb. makeblastdb needs sequence identifiers to be unique, and because many NR sequences come from the same organism, replacing the identifiers with organism names will make makeblastdb error out. You could work around that by adding suffixes to the organism/species names, but that's a lot of effort invested in a result that won't hold up in cross-database look-ups.

A simple query to NCBI to fetch the GenBank or GenPept record for each GI number gives you access to any attribute you need, including the organism.
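A minimal Python sketch of that look-up, assuming protein subject IDs of the form `gi|<GI>|ref|<accession>|` in the tabular output and using NCBI's E-utilities `efetch` endpoint (`db=protein`, `rettype=gp`, `retmode=text`) to download the GenPept flat file, whose `ORGANISM` line holds the species name. The helper names (`gi_from_sseqid`, `efetch_url`, `organism_from_genpept`, `fetch_organism`) are hypothetical; only `fetch_organism` touches the network.

```python
"""Sketch: look up the organism for a protein GI number via NCBI efetch."""
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: NCBI E-utilities efetch endpoint, returning GenPept text.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def gi_from_sseqid(sseqid: str) -> Optional[str]:
    """Return the GI from an ID like 'gi|<GI>|ref|<acc>|', else None."""
    parts = sseqid.split("|")
    if len(parts) >= 2 and parts[0] == "gi":
        return parts[1]
    return None


def efetch_url(gi: str) -> str:
    """Build an efetch URL that returns the GenPept flat file for one GI."""
    params = {"db": "protein", "id": gi, "rettype": "gp", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def organism_from_genpept(record: str) -> Optional[str]:
    """Return the value of the first ORGANISM line in a GenPept record."""
    for line in record.splitlines():
        stripped = line.strip()
        if stripped.startswith("ORGANISM"):
            return stripped[len("ORGANISM"):].strip()
    return None


def fetch_organism(gi: str) -> Optional[str]:
    """Download the GenPept record for `gi` and extract its organism (needs network)."""
    with urlopen(efetch_url(gi)) as response:
        return organism_from_genpept(response.read().decode("utf-8"))
```

NCBI rate-limits E-utilities requests, and `efetch` also accepts a comma-separated list of IDs, so for many hits it is kinder to batch the GIs and pause between calls. NCBI has also been phasing out GI numbers in favour of accession.version identifiers, so this sketch assumes your GIs still resolve.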

9.2 years ago
Siva ★ 1.9k

Another way is to download the BLAST taxonomy database as described here. Then add the "sscinames" field to BLASTP's output format option to get the species name in the BLAST result; it will appear as an additional column. If you want the species name in the identifier itself, you can parse the BLAST output and concatenate the sequence identifier and the species name.
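A minimal Python sketch of that last step, assuming BLASTP was run with a custom tabular format such as `-outfmt "6 qseqid sseqid evalue sscinames"` after the taxonomy database files were placed where BLAST can find them. The column list `COLUMNS` and the function `label_hits` are hypothetical and must match the fields you actually request; the sketch also assumes that several names for one subject are separated by `;` and keeps only the first.

```python
"""Sketch: append the species name from an 'sscinames' column to the subject ID."""
from typing import Iterable, Iterator, List

# Assumed column order, matching e.g.:
#   blastp ... -outfmt "6 qseqid sseqid evalue sscinames"
COLUMNS = ["qseqid", "sseqid", "evalue", "sscinames"]


def label_hits(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each hit with sseqid rewritten as '<sseqid>|<Species_name>'."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(fields)}")
        hit = dict(zip(COLUMNS, fields))
        # Assumption: multiple names are ';'-separated; keep only the first.
        species = hit["sscinames"].split(";")[0].strip().replace(" ", "_")
        # Drop a trailing '|' so the joined ID stays pipe-delimited.
        hit["sseqid"] = f"{hit['sseqid'].rstrip('|')}|{species}"
        yield [hit[name] for name in COLUMNS]
```

For example, pass an open result file to `label_hits` and write each yielded row back out joined with tabs. Using `|` as the joiner keeps the new identifier in the same pipe-delimited style as the original `gi|...` IDs.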
