Local blast limit query search by GI list?
6.4 years ago
Hi All,

I am trying to blast a file that contains about 42k fasta sequences against a local blast database (nt), and I would like to restrict the search space. I read that a common way to do that is to restrict the search using "gi" (see command line below).

My question is: how would you obtain a list of GIs strictly for bacteriophage-related nucleotide sequences? What I have done before is go to the NCBI nucleotide database, search for "bacteriophage", then export the list of results to a GI file. But I am not sure this is the right way to do it, as I also get other results (other microbes).

$ blastn -db nt -gilist list.gi -query seq.fasta -out blast_results.txt

6.4 years ago GenoMax

That is the right way to do this. Getting all viral genomes and parsing out the bacteriophage GIs may be the preferred option. I see 1700+ entries for phages.

$ grep "phage" viral.1.1.genomic.fna | awk -F "|" '{print $2}' > phage_gi_list

should do it.
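A slightly more defensive variant of the grep/awk one-liner above could look like the sketch below. It assumes the legacy `>gi|NUMBER|ref|ACCESSION|` header layout that the `awk -F "|"` step relies on. The file `demo_viral.fna` and its three records are made up purely for illustration; in practice you would point the pipeline at `viral.1.1.genomic.fna`. Note that a plain substring match on "phage" will also pick up words such as "prophage", so the resulting list is worth a quick look before use.

```shell
#!/bin/sh
# Tiny stand-in for viral.1.1.genomic.fna (hypothetical headers and sequences).
cat > demo_viral.fna <<'EOF'
>gi|111|ref|NC_000001.1| Enterobacteria phage demo, complete genome
ACGTACGT
>gi|222|ref|NC_000002.1| Demo herpesvirus, complete genome
GGCCGGCC
>gi|333|ref|NC_000003.1| Bacillus Phage example, complete genome
TTAACCGG
EOF

# Keep only header lines (starting with ">") that mention "phage",
# ignoring case so that "Phage" is matched as well.
# Then print the second "|"-delimited field (the GI in this header layout)
# and drop any duplicate GIs.
grep -i '^>.*phage' demo_viral.fna | awk -F '|' '{print $2}' | sort -u > phage_gi_list

# Show the resulting GI list, one GI per line.
cat phage_gi_list
```

The resulting `phage_gi_list` can then be passed to `-gilist` in the blastn command from the question.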

You could also try the taxonomy ID route to get a more restricted set of GIs: http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=38018 I am not sure whether that option gives you all bacteriophages, though.

Thanks, the first link is indeed too restrictive, but I see what you mean. I'll explore a bit further.

Go with the viral genomes option. I will move it up in the post above.