Compare my FASTA with taxonomic groups
8.4 years ago
Paul ★ 1.5k

Dear all,

I would like to compare my FASTA assembly with taxonomic groups. I have the taxonomy IDs for viruses (10239), fungi (4751), bacteria (2) and mammals (40674). Is there any way to get a percentage similarity from the command line using the BLAST tools?

Something like this:

blastp -db tax ID1, ID2, ID3... -query my.fasta -out similarity_result

Thanks for any comments and help!

Paul

taxonomics blast FASTA NCBI fasta • 2.4k views
8.4 years ago
5heikki 11k

From http://www.ncbi.nlm.nih.gov/books/NBK279680/

The BLAST taxonomy database is required in order to print the scientific name, common name, blast name, or super kingdom as part of the BLAST report or in a report with blastdbcmd. The BLAST database contains only the taxid (an integer) for each entry, and the taxonomy database allows BLAST to retrieve the scientific name etc. from a taxid. The BLAST taxonomy database consists of a pair of files (taxdb.bti and taxdb.btd) that are available as a compressed archive from the NCBI BLAST FTP site (ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz). The update_blastdb.pl script can be used to download and update this archive; it is recommended that the uncompressed contents of the archive be installed in the same directory where the BLAST databases reside. Assuming proper file permissions and that the BLASTDB environment variable contains the path to the installation directory of the BLAST databases, the following commands accomplish that:

# Download the taxdb archive
perl update_blastdb.pl taxdb
# Install it in the BLASTDB directory
gunzip -cd taxdb.tar.gz | (cd "$BLASTDB" && tar xvf - )
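
Before running a search, it may be worth confirming that both taxonomy files named above actually ended up in the database directory. The following is a minimal sketch; check_taxdb is a hypothetical helper name, not something shipped with BLAST+, and it only tests that the files exist, not that they are valid.

```shell
# Sketch: verify that taxdb.bti and taxdb.btd are present in a directory.
# check_taxdb is a hypothetical helper, not part of BLAST+.
# Usage: check_taxdb [DIR]   (DIR defaults to $BLASTDB)
check_taxdb() {
    dir="${1:-${BLASTDB:-}}"
    for f in taxdb.bti taxdb.btd; do
        if [ ! -f "$dir/$f" ]; then
            # Report the first missing file and signal failure.
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "taxdb found in $dir"
}
```

Calling check_taxdb with no argument checks the directory in BLASTDB. A non-zero exit status means at least one of the two files is missing.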
  

From http://www.ncbi.nlm.nih.gov/books/NBK52640/

Using the blast+ package installed above without configuration could be cumbersome - it requires that extraneous path be prefixed to the program call and database specification since the system does not know where to look for the installed program and the specified database. To streamline BLAST searches, two environment variables, PATH and BLASTDB, need to be modified and specified, respectively, to point to the corresponding directories.

Under bash, the following command appends the path to the new BLAST bin directory to the existing PATH setting:

$ export PATH="$PATH:$HOME/ncbi-blast-2.2.29+/bin"
  

The equivalent command under csh is:

$ setenv PATH ${PATH}:/home/tao/ncbi-blast-2.2.29+/bin
  

The modified $PATH can be examined using echo (the added portion is the last entry):

$ echo $PATH
/usr/X11R6/bin:/usr/bin:/bin:/usr/local/bin:/opt/local/bin:/home/tao/ncbi-blast-2.2.29+/bin
  

To manage available BLAST databases, a subdirectory named db should be created. For the example installation, the following command creates such a directory under the ncbi-blast-2.2.29+ directory:

$ mkdir ./ncbi-blast-2.2.29+/db
  

Similar approaches described above can be used to set the BLASTDB value under bash:

$ export BLASTDB="$HOME/ncbi-blast-2.2.29+/db"
  

Or under csh to create it anew:

$ setenv BLASTDB "$HOME/ncbi-blast-2.2.29+/db"
  

A better approach is to have the system automatically set these variables upon login, by modifying the .bash_profile or .cshrc file.

Once they are set, the system knows where to call BLAST programs, and the invoked program will know where to look for the database files. Note that with BLASTDB unspecified, blast+ programs only search the working directory, i.e. the directory where BLAST command is issued.

See blastp -help for the fields you can output. For tabular output with the standard 12 columns plus the scientific name, the option would be: -outfmt '6 std sscinames'
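
Once a search has written that 13-column table, a rough per-species summary can be pulled out with awk. This is only a sketch: summarize_hits is a hypothetical helper name, it assumes the tab-separated layout of -outfmt '6 std sscinames' (column 3 is the percent identity, column 13 the scientific name), and it averages percent identity over alignment rows, which is not the same as the fraction of the assembly that matches a group.

```shell
# Sketch: hit count and mean percent identity per subject scientific name.
# summarize_hits is a hypothetical helper; input must be the tab-separated
# table written by: -outfmt '6 std sscinames'
summarize_hits() {
    awk -F '\t' '
        # Column 3 = pident, column 13 = sscinames.
        { n[$13]++; sum[$13] += $3 }
        END {
            for (s in n)
                printf "%s\t%d\t%.2f\n", s, n[s], sum[s] / n[s]
        }
    ' "$1" | sort
}
```

Each output line holds the scientific name, the number of alignment rows and their mean percent identity, separated by tabs.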

Edit: whoops, this was supposed to be posted as a comment.

8.4 years ago
5heikki 11k

If you have configured the taxonomy database (taxdb) from the NCBI FTP site and you are blasting against an NCBI database, you can output the subject sequence tax ID, species name and "kingdom" (if I recall correctly: eukaryotes/prokaryotes/viruses/other). You can't output fungi, bacteria, mammals, etc. directly; you need to derive those from the subject sequence GI or tax ID with e.g. Entrez Direct. See blastp -help for the available output fields.
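
As an alternative to deriving the group afterwards, recent BLAST+ releases can restrict a search to given taxonomy IDs with the -taxids option, so one search per group is a possible approach. The sketch below rests on several assumptions: the query is a nucleotide assembly (hence blastn rather than blastp), a local version-5 copy of the nt database and taxdb are installed in BLASTDB, and the BLAST+ version accepts -taxids. Depending on the version, a taxid above species level may first need expanding to species-level IDs (the BLAST+ distribution ships a get_species_taxids.sh script for that). search_group, BLAST_BIN and DB_NAME are hypothetical names used only for illustration.

```shell
# Sketch: one taxonomy-restricted search per group.
# search_group, BLAST_BIN and DB_NAME are hypothetical, not BLAST+ names.
# Assumes a nucleotide query, a local v5 "nt" database and taxdb in $BLASTDB.
# Usage: search_group QUERY.fasta TAXID LABEL
search_group() {
    query="$1"; taxid="$2"; label="$3"
    "${BLAST_BIN:-blastn}" \
        -db "${DB_NAME:-nt}" \
        -query "$query" \
        -taxids "$taxid" \
        -evalue 1e-5 \
        -outfmt '6 std staxids sscinames sskingdoms' \
        -out "hits_${label}.tsv"
}

# The four groups from the question would then be searched as:
# search_group my.fasta 10239 viruses
# search_group my.fasta 4751  fungi
# search_group my.fasta 2     bacteria
# search_group my.fasta 40674 mammals
```

Setting BLAST_BIN=echo prints the arguments instead of running a search, which is a cheap way to check the command line before committing to a long run. The e-value cutoff of 1e-5 is an arbitrary illustrative choice.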


Thank you for the comment. Could you please give me a real command-line example? I would like to do it with NCBI data, but how do I configure the tax db from the NCBI FTP site? Sorry if these are silly questions, but I have no experience with BLAST.
