Species identification via gilist
Penny Liu, 5.2 years ago

I want to restrict a blastn search against the nt database by using a GI list (-gilist).

I have already collected all taxids under Bacteria (taxid 2) and extracted the corresponding GIs with csvtk (please refer to this).

The next step is bacterial species identification.

When I run

blastn -query query.fasta -db /path/to/nt -gilist bacteria.taxid.gi.txt -evalue 1e-6 -outfmt 6 -out sequences.txt


the following error occurs:

BLAST Database error: Specified file is not a valid GI/TI list.

Please refer to the attached file.

bacteria.taxid.gi.txt (Number of taxids: 309,264,110)

What am I doing wrong? Thanks for the help in advance.


Hello! I see a few possible problems:

1) Your GI list file is very large (3 GB). As far as I remember, BLAST has some limits here.

2) BLAST may not be able to find the file, since you put it here: http://bioinfo.cs.ccu.edu.tw/CCU_bioinf/bacteria.taxid.gi.txt. If you run BLAST in the same directory, it's OK.

3) Your GI list has a header line, gi, which is not a GI number, right?


You're right. The word gi was the problem: it is a header, not a GI number. After I removed that line from the text file, the problem was solved. :)
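A minimal sketch of that cleanup in Python, assuming the list holds one GI per line and that any line which is not purely ASCII digits (such as the gi header or blank lines) should simply be dropped. The script and output file names are hypothetical. It streams line by line, so a 3 GB file is never loaded into memory at once.

```python
# Minimal sketch: keep only purely numeric lines of a GI list, dropping a
# header such as "gi" that BLAST rejects as "not a valid GI/TI list".
# Assumption: one GI per line. File names in the usage line are hypothetical.
import sys
from typing import Iterable, Iterator


def numeric_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield stripped lines made only of ASCII digits (candidate GI numbers)."""
    for line in lines:
        value = line.strip()
        # isascii() guards against Unicode digits that isdigit() also accepts.
        if value.isascii() and value.isdigit():
            yield value


def clean_gi_list(src_path: str, dst_path: str) -> int:
    """Stream src_path to dst_path, keeping numeric lines; return kept count."""
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for gi in numeric_lines(src):
            dst.write(gi + "\n")
            kept += 1
    return kept


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python clean_gi_list.py bacteria.taxid.gi.txt bacteria.gi.clean.txt
    count = clean_gi_list(sys.argv[1], sys.argv[2])
    print(f"kept {count} GIs", file=sys.stderr)
```

The cleaned file would then be passed to -gilist in place of the original.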


Hi Yi-Ting, can I ask where you got this bacterial GI list? I am trying to download it directly from NCBI ('save to file' -> GI List, etc.), but it fails with a timeout error. Do you have an easy way to do that? Thanks in advance.


My extraction method was the same as yours. The process can take several hours to complete. I added multiple keywords (term=whole+genome+bacteria) to narrow down the search scope.
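As a hypothetical alternative to the web download (not the method described in this thread), the same keywords could be sent to the NCBI E-utilities esearch endpoint. As I understand it, esearch on the nucleotide database returns GI-based UIDs by default. The sketch below only builds the URLs and sends no request. The retmax page size is an illustrative assumption; check the E-utilities documentation for current paging limits and rate rules before relying on it.

```python
# Minimal sketch: build NCBI E-utilities esearch URLs for the nucleotide
# database using the keywords from the post above. No request is sent here.
# Assumption: page size (retmax) of 10000 is illustrative only.
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retstart: int = 0, retmax: int = 10000) -> str:
    """Return an esearch URL for one page of results (retstart/retmax)."""
    params = {
        "db": "nucleotide",
        "term": term,
        "retstart": retstart,
        "retmax": retmax,
    }
    # urlencode uses quote_plus, so spaces become '+' (whole+genome+bacteria).
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URLs for the first three pages of results.
    for page in range(3):
        print(esearch_url("whole genome bacteria", retstart=page * 10000))
```

Fetching pages this way, rather than in one web export, may avoid a single long request timing out, but that is untested here.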