Blast - Formatting Output
10.4 years ago
timjoncooper ▴ 320

Hi,

I've been using the blastn (version 2.2.28+) standalone tool against a custom formatted genome via:

blastn -db BLASTDB -word_size 7 -query input.fa -out filename -perc_identity 100 -outfmt 6 -max_target_seqs 2

This discards non-perfect hits and shows only the top 2 hits.

The output format is great, but is there a way to add an extra column containing the actual target sequence (the sequence of the matched hit), so that the fields are:

query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, sequence

Thanks!

  • TJC
blast format output • 170k views

Is there a way to see the query sequence in the alignment result?


All the valid fields are listed in the help.


I know but in:

       qseqid means Query Seq-id
          qgi means Query GI
         qacc means Query accesion
      qaccver means Query accesion.version
         qlen means Query sequence length
       sseqid means Subject Seq-id

is there one that shows the query sequence?


Obviously none of these - after all none of those descriptions indicates that it would. Keep looking.


Hi!! Do you know how to see the sequence (query) in your blast result?


Hi, is there any way to look for sequence variation using the above command?

10.4 years ago

Run blastn -help then look for the field called outfmt

*** Formatting options
 -outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1) 

   Options 6, 7, and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
           qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accesion
          qaccver means Query accesion.version
             qlen means Query sequence length
           sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
           sallgi means All subject GIs
             sacc means Subject accession
          saccver means Subject accession.version
          sallacc means All subject accessions
             slen means Subject sequence length
           qstart means Start of alignment in query
             qend means End of alignment in query
           sstart means Start of alignment in subject
             send means End of alignment in subject
             qseq means Aligned part of query sequence
             sseq means Aligned part of subject sequence
           evalue means Expect value
         bitscore means Bit score
            score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
         mismatch means Number of mismatches
         positive means Number of positive-scoring matches
          gapopen means Number of gap openings
             gaps means Total number of gaps
             ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
             btop means Blast traceback operations (BTOP)
          staxids means Subject Taxonomy ID(s), separated by a ';'
        sscinames means Subject Scientific Name(s), separated by a ';'
        scomnames means Subject Common Name(s), separated by a ';'
       sblastnames means Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
       sskingdoms means Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order) 
           stitle means Subject Title
       salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
            qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP

To clarify: "space delimited format specifiers" means you write it as -outfmt "6 qacc sacc qseq sseq..."


To add: one of the format specifiers is std, which adds the default set of columns. So -outfmt "6 std qlen" prints the standard columns followed by the query length.
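
Combining the std specifier with sseq, the column layout asked for in the question (the twelve default columns plus the matched sequence) could be requested roughly as in the sketch below. BLASTDB and input.fa are taken from the question; hits.tsv is a placeholder output name, and the guard around blastn is only there so the snippet is harmless on a machine without BLAST+ installed.

```shell
# "std" expands to the 12 default tabular columns; "sseq" appends the
# aligned part of the subject sequence as a 13th column.
outfmt="6 std sseq"

# BLASTDB and input.fa come from the question; hits.tsv is a placeholder name.
if command -v blastn >/dev/null 2>&1 && [ -f input.fa ]; then
    blastn -db BLASTDB -word_size 7 -query input.fa -out hits.tsv \
        -perc_identity 100 -max_target_seqs 2 -outfmt "$outfmt"
else
    echo "blastn or input.fa not found, skipping the search" >&2
fi
```

Swapping sseq for qseq (or listing both) should give the aligned query sequence instead.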


-outfmt 7 or 10 works perfectly.


How do I specify a mismatch parameter in blastn? I want to perform an alignment allowing 1 mismatch. I have gone through a lot of parameters but can't find this one.
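
As far as I know, blastn has no option that caps the number of mismatches directly (an assumption worth checking against blastn -help), so one workaround is to filter the tabular output afterwards. The sketch below uses a small made-up hits table in the default -outfmt 6 column order, where column 5 is mismatch and column 6 is gapopen, and keeps hits with at most one mismatch and no gaps. The file names are placeholders.

```shell
# Made-up example rows in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
printf 'q1\tchr1\t100.000\t20\t0\t0\t1\t20\t101\t120\t1e-05\t40.1\n'  > example_hits.tsv
printf 'q1\tchr2\t95.000\t20\t1\t0\t1\t20\t501\t520\t1e-03\t32.2\n'  >> example_hits.tsv
printf 'q2\tchr1\t90.000\t20\t2\t0\t1\t20\t901\t920\t0.5\t24.3\n'    >> example_hits.tsv

# Keep hits with at most 1 mismatch (column 5) and no gap openings (column 6).
awk -F'\t' '$5 <= 1 && $6 == 0' example_hits.tsv > hits.max1mm.tsv
cat hits.max1mm.tsv
```

Note that mismatch only counts positions inside the alignment, so a hit shorter than the query can hide further differences; adding qlen to the output and comparing it with length is one way to check. A -perc_identity 100 filter, as in the original command, would remove one-mismatch hits before they reach the output.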


How can I get the description (first column in the figure) when I run blastp on the command line?


The description can be added by "stitle".


Hi, Is there a way to find query strand information as well? Thanks


I believe the strand describes the orientation of the subject relative to the query; hence, if sstrand is reverse, it means that the query is the reverse complement of the reference sequence.
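
For blastn tabular output, the orientation of a hit can also be read from the default columns: to my understanding, query coordinates are reported in increasing order and a hit on the minus strand has sstart greater than send. A minimal sketch on made-up rows (columns 9 and 10 are sstart and send in the default -outfmt 6 order; file names are placeholders):

```shell
# Two made-up hits: the second one has sstart > send, i.e. a minus-strand hit.
printf 'q1\tchr1\t100.000\t20\t0\t0\t1\t20\t101\t120\t1e-05\t40.1\n'  > strand_example.tsv
printf 'q2\tchr1\t100.000\t20\t0\t0\t1\t20\t920\t901\t1e-05\t40.1\n' >> strand_example.tsv

# Print query id, subject id and the strand inferred from sstart/send.
awk -F'\t' -v OFS='\t' '{ print $1, $2, ($9 <= $10 ? "plus" : "minus") }' \
    strand_example.tsv > strands.tsv
cat strands.tsv
```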


Hi. Can you please suggest how to use output format 8? -outfmt 8 doesn't work. I am running blastp and have tried both -m 8 and -outfmt 8; neither works, though -outfmt 6 does.

6.3 years ago
sridhar.rg ▴ 40

Just so you know, I was looking for this as well. The following did the job for me:

blastn -db <db_source> -query <query_source> -out <outfile> -outfmt "6 qseqid sseqid slen qstart qend length mismatch gapopen gaps sseq"  -word_size 5 -perc_identity 80

The "sseq" specifier gives the aligned part of the subject sequence, i.e. the sequence that the query was aligned with. The "qseq" specifier gives the aligned part of the query sequence.
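
If the matched sequences are needed as FASTA rather than as a table column, the output of a command like the one above can be converted with awk. This is only a sketch on made-up rows that follow the same column order (sseq is column 10); the file names and the header naming scheme are placeholders.

```shell
# Made-up rows in the column order of the command above:
# qseqid sseqid slen qstart qend length mismatch gapopen gaps sseq
printf 'q1\tchr1\t5000\t1\t12\t12\t0\t0\t0\tACGTACGTACGT\n'  > sseq_example.tsv
printf 'q2\tchr2\t8000\t1\t12\t12\t0\t1\t1\tACGTA-GTACGT\n' >> sseq_example.tsv

# Write each aligned subject segment as a FASTA record, dropping the
# gap characters ("-") that the alignment inserted into sseq.
awk -F'\t' '{ seq = $10; gsub(/-/, "", seq); print ">" $2 "_hit_for_" $1; print seq }' \
    sseq_example.tsv > sseq_hits.fa
cat sseq_hits.fa
```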
