Question: Get Blast Database Size
PoGibas (Vilnius) asked, 5.4 years ago:

I have piped sequences of unknown length into makeblastdb. Now I want to know the total length of those sequences (i.e. the BLAST database size).

Example:

 # "cat" is used as an example
 # My "real" sequences are piped from "make random length sequences" command
 cat unknown_length_sequences
     >Seq1
     AA --//-- TG
     >Seq2
     GG --//-- TA
     >Seq3
     AC --//-- CC
     ...

 cat unknown_length_sequences | 
     makeblastdb \
         -in - \
         -dbtype 'nucl' \
         -parse_seqids \
         -out random_seq \
         -title "random_seq"

  Output files look like this:  
     random_seq.nhr
     random_seq.nin
     random_seq.nog
     random_seq.nsd
     random_seq.nsi
     random_seq.nsq

My question is: how do I get the BLAST database size (the total length of all sequences)?
The result should be the same as:
grep -v '>' INPUT | tr -d '\n' | wc -c

Edit
I want to achieve this without making intermediate files.
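One way to get the total without writing an intermediate file is to put a small awk filter between the sequence generator and makeblastdb. It copies every line unchanged to standard output (so makeblastdb sees the same FASTA) and, once its input ends, reports the residue count on standard error, where it does not mix with the database input. This is a sketch of a general shell technique, not a makeblastdb option; the function name fasta_tee_count is hypothetical, and it assumes makeblastdb keeps reading FASTA from stdin via -in - as in the command above.

```shell
# Hypothetical filter: pass FASTA from stdin through to stdout unchanged and,
# at end of input, print the total residue count on stderr.
# Header lines (starting with '>') are not counted; spaces, tabs and
# carriage returns (Windows line endings) are not counted as residues.
fasta_tee_count() {
    awk '
        { print }
        !/^>/ {
            seq = $0
            gsub(/[ \t\r]/, "", seq)
            n += length(seq)
        }
        END { print (n + 0) | "cat 1>&2" }
    '
}

# Example (the count appears on stderr when the input stream ends):
#   cat unknown_length_sequences |
#       fasta_tee_count |
#       makeblastdb -in - -dbtype nucl -parse_seqids -out random_seq -title "random_seq"
```

Unlike the grep | tr | wc check, this only skips lines that start with '>', which is where FASTA headers live, and it prints a bare number rather than wc's padded output.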

Michael Dondrup (Bergen, Norway) answered, 5.4 years ago:

How about parsing the output of blastdbcmd -info and taking the total residues from the second line:

$ blastdbcmd -info -db ~/blastdb/swissprot
Database: Non-redundant UniProtKB/SwissProt sequences
    455,621 sequences; 169,969,125 total residues

Date: Jul 29, 2013  6:13 AM    Longest sequence: 41,943 residues
[...]
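The parsing step could look like this minimal sketch, assuming the summary line has the form shown above (N sequences; M total residues, with commas as thousands separators). The function name total_residues is hypothetical. For nucleotide databases such as random_seq, BLAST+ may print "total bases" instead of "total residues", so the pattern accepts any word after "total"; that is an assumption worth checking against the output of your BLAST+ version.

```shell
# Hypothetical helper: read 'blastdbcmd -info' output on stdin and print the
# total residue (or base) count as a plain integer, commas removed.
# Only the "...; N total <word>" summary line matches; other lines are dropped.
total_residues() {
    sed -n 's/.*; *\([0-9,][0-9,]*\) total [a-z][a-z]*.*/\1/p' | tr -d ','
}

# Example:
#   blastdbcmd -info -db random_seq | total_residues
```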

Michael Dondrup replied: Hm, I should update my blastdb...