Question: local blastx output
Asked 10 months ago by agata88 (Poland):

Hi all!

I ran a local BLAST search on my transcript FASTA sequences with the following command:

blastx -query input/file.fasta -task blastx-fast -db blast-db-nr/nr -out output/file_blast_results.txt -evalue 0.001 -max_target_seqs 1 -num_threads 30 -outfmt '6 qaccver saccver pident length evalue qstart qend sstart send staxid ssciname scomname sblastname' > blast.log 2>&1 &

I downloaded the nr database from the NCBI FTP site and indexed it with the makeblastdb script. As a result, I get accession numbers, percent identity, alignment length, etc., but the last columns (the taxonomy descriptions) are all NA.

My questions are:

  1. How can I get the organism name, protein description, etc. from a list of accession numbers such as WP_083411507.1 and CBW15324.1?

  2. I see BLAST hits to pig proteins even though my experiment includes only bacteria. How can I restrict the search to the prokaryotic part of nr?

  3. As a next step I would like to assign GO terms to the BLAST results. Any ideas how to do that?

Many thanks for any suggestions,

Best, Agata

PS. The input includes ~3000 nucleotide sequences.


I downloaded the nr database from the NCBI FTP site and indexed it with the makeblastdb script.

Small nitpick: you either downloaded the premade indexes (in which case you don't need makeblastdb) or you downloaded the FASTA sequences for nr (in which case you do need makeblastdb).
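Since the database here was built from the nr FASTA file, a likely reason for the NA columns is that it carries no taxonomy IDs at all. A minimal sketch, assuming you separately download NCBI's prot.accession2taxid.gz mapping file (not mentioned in this thread): convert it into the "<seqid> <taxid>" pairs that makeblastdb -taxid_map expects, then rebuild with -parse_seqids. The function names are illustrative, and the mapping file is very large, so expect this to take considerable disk space and time.

```shell
#!/usr/bin/env bash
# Sketch only: rebuild nr from FASTA with taxonomy IDs attached.
# Assumes prot.accession2taxid.gz was downloaded from NCBI separately.

# prot.accession2taxid is tab-separated with one header line:
#   accession  accession.version  taxid  gi
# makeblastdb -taxid_map wants "<seqid> <taxid>" per line,
# so skip the header and keep columns 2 and 3.
accession2taxid_to_map() {
    # $1: path to prot.accession2taxid.gz
    gzip -dc "$1" | awk -F'\t' 'NR > 1 { print $2, $3 }'
}

# Build the protein database; -taxid_map is used together with -parse_seqids.
build_nr_db() {
    # $1: uncompressed nr FASTA, $2: taxid map file, $3: output db prefix
    makeblastdb -in "$1" -dbtype prot -parse_seqids \
        -taxid_map "$2" -title nr -out "$3"
}

# Example usage (hypothetical paths):
#   accession2taxid_to_map prot.accession2taxid.gz > taxid_map.txt
#   build_nr_db nr taxid_map.txt blast-db-nr/nr
```

The names shown by sscinames and friends still come from taxdb, so this only fixes the missing taxids.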

sscinames (I think you missed an "s" above) and stitle should give you the two pieces of information you are asking for. If you don't want to redo the search, you can probably get them afterwards with NCBI's Entrez Direct command-line tools (efetch, xtract):

efetch -db protein -id "WP_083411507.1,CBW15324.1" -format docsum | xtract -pattern DocumentSummary -element Caption -element Title
WP_083411507    hypothetical protein [Arthrobacter sp. UCD-GKA]
CBW15324        unnamed protein product [Haemophilus parainfluenzae T3T1]
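To run the same lookup for every hit in the results file rather than one accession at a time, a rough sketch along these lines could work. It assumes Entrez Direct (efetch, xtract) is installed and that saccver is the second column of the tabular output, as in the -outfmt from the question. The batch size of 200 and the one-second pause are arbitrary choices, not NCBI requirements.

```shell
#!/usr/bin/env bash
# Sketch: fetch titles/organisms for all subject accessions in a
# BLAST -outfmt 6 file, using Entrez Direct (efetch + xtract).

# Unique subject accessions (column 2 of the tabular output).
unique_subjects() {
    cut -f2 "$1" | sort -u
}

# Join accessions read from stdin into comma-separated batches of N
# (default 200) so that each efetch request stays reasonably small.
batch_ids() {
    xargs -n "${1:-200}" | tr ' ' ','
}

annotate_subjects() {
    # $1: BLAST tabular output file
    unique_subjects "$1" | batch_ids 200 | while read -r ids; do
        efetch -db protein -id "$ids" -format docsum |
            xtract -pattern DocumentSummary -element Caption -element Title
        sleep 1  # arbitrary pause to be gentle with NCBI servers
    done
}

# Example usage:
#   annotate_subjects output/file_blast_results.txt > subject_titles.tsv
```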

For GO terms, Blast2GO would be one possibility.

Reply by genomax, 10 months ago

Thanks. I downloaded the nr database from here: ftp://ftp.ncbi.nlm.nih.gov/blast/db//FASTA/nr.gz

Yes, that's right: I missed the "s" at the end of ssciname, scomname and sblastname.

I am trying to avoid Blast2GO since it is not open source; that is why I decided to run BLAST locally myself.

Reply by agata88, 10 months ago

How can I get organism name

Obtain NCBI Taxonomy ID from local blast output

Do not forget you also need TaxDB ( ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz )
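A small sketch for checking and installing taxdb. It assumes BLAST+ looks for taxdb.btd and taxdb.bti in the current directory or in the directory named by $BLASTDB (treated here as a single directory), and that wget is available. Note that taxdb only turns taxonomy IDs into names, so the database itself must also carry taxids (the premade nr does).

```shell
#!/usr/bin/env bash
# Sketch: check for / install NCBI's taxdb so that sscinames, scomnames
# and sblastnames are filled in instead of N/A.

# Succeeds if both taxdb files exist in the given directory
# (default: $BLASTDB if set, otherwise the current directory).
has_taxdb() {
    local dir="${1:-${BLASTDB:-.}}"
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}

# Download and unpack taxdb into a directory, e.g. the one holding nr.
install_taxdb() {
    local dir="$1"
    mkdir -p "$dir"
    wget -q -O "$dir/taxdb.tar.gz" ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz &&
        tar -xzf "$dir/taxdb.tar.gz" -C "$dir"
}

# Example usage (hypothetical path):
#   has_taxdb blast-db-nr || install_taxdb blast-db-nr
#   export BLASTDB="$PWD/blast-db-nr"
```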

how can I select only prokaryotic nr part of database?

You may use -gilist (but GI numbers have been deprecated) or -seqidlist.

 -gilist <String>
   Restrict search of database to list of GI's
    * Incompatible with:  negative_gilist, seqidlist, negative_seqidlist,
   remote, subject, subject_loc
 -seqidlist <String>
   Restrict search of database to list of SeqId's
    * Incompatible with:  gilist, negative_gilist, negative_seqidlist, remote,
   subject, subject_loc

There are several ways of getting taxon-specific accessions, for example:

Extract all protein sequences of specific taxons from the NCBI nr database
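One way to produce a prokaryote-only -seqidlist, sketched under assumptions: a "<accession.version> <taxid>" map like the one makeblastdb -taxid_map uses, and a file listing every prokaryotic taxid you want to keep (expanding Bacteria and Archaea to all their descendant taxids is a separate step not shown here). File names are hypothetical, and the database must have been built with -parse_seqids for -seqidlist to work.

```shell
#!/usr/bin/env bash
# Sketch: select accessions whose taxid is in a list of wanted taxids,
# to be passed to blastx via -seqidlist.
#   $1: taxid list, one taxid per line (hypothetical prok_taxids.txt)
#   $2: "<accession.version> <taxid>" map (hypothetical taxid_map.txt)
# Note: the NR == FNR idiom assumes the taxid list is not empty.
select_accessions() {
    awk 'NR == FNR { keep[$1]; next } ($2 in keep) { print $1 }' "$1" "$2"
}

# Example usage:
#   select_accessions prok_taxids.txt taxid_map.txt > prok.seqids
#   blastx -query input/file.fasta -db blast-db-nr/nr -seqidlist prok.seqids ...
```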

In next step I would like to assign GO numbers to blast results

Use Blast2GO, dammit (a transcriptome annotation pipeline), Trinotate...

Reply by h.mon, 10 months ago