Question: Blast - Formatting Output
15
timjoncooper160 wrote:

Hi,

I've been using the blastn (version 2.2.28+) standalone tool against a custom formatted genome via:

blastn -db BLASTDB -word_size 7 -query input.fa -out filename -perc_identity 100 -outfmt 6 -max_target_seqs 2

This discards non-perfect hits and shows only the top 2 hits.

The output file has a great format; however, is there a way to add an extra column that contains the actual target sequence (the sequence of the matched hit), so that the fields are:

query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, sequence

Thanks!

  • TJC
format blast output • 77k views
ADD COMMENTlink modified 9 months ago by sridhar.rg0 • written 4.8 years ago by timjoncooper160

Is there a way to see the query sequence in the alignment result?

ADD REPLYlink written 2.8 years ago by midox190

All the valid fields are listed in the help.

ADD REPLYlink written 2.8 years ago by Istvan Albert ♦♦ 77k
1

I know, but among these:

 qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accesion
          qaccver means Query accesion.version
             qlen means Query sequence length
           sseqid means Subject Seq-id

is there one that shows the query sequence itself?

ADD REPLYlink written 2.8 years ago by midox190

Obviously none of these - after all none of those descriptions indicates that it would. Keep looking.

ADD REPLYlink written 2.8 years ago by Istvan Albert ♦♦ 77k

Hi!! Do you know how to see the sequence (query) in your blast result?

ADD REPLYlink written 2.3 years ago by figuerm0
33
Istvan Albert ♦♦ 77k wrote:

Run blastn -help, then look for the option called -outfmt:

*** Formatting options
 -outfmt <String>
   alignment view options:
     0 = pairwise,
     1 = query-anchored showing identities,
     2 = query-anchored no identities,
     3 = flat query-anchored, show identities,
     4 = flat query-anchored, no identities,
     5 = XML Blast output,
     6 = tabular,
     7 = tabular with comment lines,
     8 = Text ASN.1,
     9 = Binary ASN.1,
    10 = Comma-separated values,
    11 = BLAST archive format (ASN.1) 

   Options 6, 7, and 10 can be additionally configured to produce
   a custom format specified by space delimited format specifiers.
   The supported format specifiers are:
           qseqid means Query Seq-id
              qgi means Query GI
             qacc means Query accesion
          qaccver means Query accesion.version
             qlen means Query sequence length
           sseqid means Subject Seq-id
        sallseqid means All subject Seq-id(s), separated by a ';'
              sgi means Subject GI
           sallgi means All subject GIs
             sacc means Subject accession
          saccver means Subject accession.version
          sallacc means All subject accessions
             slen means Subject sequence length
           qstart means Start of alignment in query
             qend means End of alignment in query
           sstart means Start of alignment in subject
             send means End of alignment in subject
             qseq means Aligned part of query sequence
             sseq means Aligned part of subject sequence
           evalue means Expect value
         bitscore means Bit score
            score means Raw score
           length means Alignment length
           pident means Percentage of identical matches
           nident means Number of identical matches
         mismatch means Number of mismatches
         positive means Number of positive-scoring matches
          gapopen means Number of gap openings
             gaps means Total number of gaps
             ppos means Percentage of positive-scoring matches
           frames means Query and subject frames separated by a '/'
           qframe means Query frame
           sframe means Subject frame
             btop means Blast traceback operations (BTOP)
          staxids means Subject Taxonomy ID(s), separated by a ';'
        sscinames means Subject Scientific Name(s), separated by a ';'
        scomnames means Subject Common Name(s), separated by a ';'
       sblastnames means Subject Blast Name(s), separated by a ';'
                (in alphabetical order)
       sskingdoms means Subject Super Kingdom(s), separated by a ';'
                (in alphabetical order) 
           stitle means Subject Title
       salltitles means All Subject Title(s), separated by a '<>'
          sstrand means Subject Strand
            qcovs means Query Coverage Per Subject
          qcovhsp means Query Coverage Per HSP
ADD COMMENTlink written 4.8 years ago by Istvan Albert ♦♦ 77k
17

To clarify "by space delimited format specifiers", it means write it as -outfmt "6 qacc sacc qseq sseq..."
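
The quoting matters because the shell splits unquoted words into separate arguments. A minimal sketch of the difference, where count_args is a hypothetical helper (not part of BLAST) that only reports how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints the number of arguments it was given.
count_args() { echo "$#"; }

# Quoted: the format number and the specifiers travel as ONE argument after -outfmt.
quoted=$(count_args -outfmt "6 qacc sacc qseq sseq")

# Unquoted: the shell splits on spaces, so the specifiers become stray arguments.
unquoted=$(count_args -outfmt 6 qacc sacc qseq sseq)

echo "quoted: $quoted arguments, unquoted: $unquoted arguments"
```

In the quoted form blastn receives the whole specifier list as the value of -outfmt, which is what the help text describes.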

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by ostrokach270
8

To add: one of the format specifiers is std, which adds the default set of columns. For example, -outfmt "6 std qlen" prints the standard columns followed by the query length.
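
Applied to the original question, std plus sseq should give the twelve default columns followed by the aligned subject sequence. The sketch below only prints the command rather than running it; BLASTDB and input.fa come from the question, while hits.tsv is a placeholder output name assumed here for illustration:

```shell
# Columns that "std" stands for, in the default -outfmt 6 order.
std_cols="qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"

# Format string for the question: default columns plus the aligned subject sequence.
outfmt="6 std sseq"

# Tab-separated header for the resulting table (format 6 itself writes no header line).
header=$(echo "$std_cols sseq" | tr ' ' '\t')
n_cols=$(echo "$header" | awk -F '\t' '{ print NF }')

# Print the command instead of running it; hits.tsv is a placeholder file name.
echo "blastn -db BLASTDB -word_size 7 -query input.fa -out hits.tsv -perc_identity 100 -max_target_seqs 2 -outfmt \"$outfmt\""
echo "expected columns: $n_cols"
```

Swapping sseq for qseq, or listing both, would add the aligned query sequence in the same way.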

ADD REPLYlink written 2.6 years ago by kamiljaron100
2

outfmt 7 or 10 works perfectly

ADD REPLYlink written 4.8 years ago by H@rry30

Thank you! Sorted it out now.

ADD REPLYlink written 4.8 years ago by timjoncooper160

Hi!! I have the same question and I don't know how you sorted it out. Did you use outfmt 7, or did you use -outfmt "6 qlen"?

ADD REPLYlink written 2.3 years ago by figuerm0
0
sridhar.rg0 wrote:

Just so you know, I was looking for this as well. The following did the job for me:

blastn -db <db_source> -query <query_source> -out <outfile> -outfmt "6 qseqid sseqid slen qstart qend length mismatch gapopen gaps sseq"  -word_size 5 -perc_identity 80

The "sseq" specifier gives the aligned part of the subject sequence, i.e. what the query was aligned with. The "qseq" specifier gives the aligned part of the query sequence.
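
Once sseq is in the table, the hits can be pulled out with awk. The following is a rough sketch, assuming the ten-column order of the command above (sseq in column 10); the two rows are made-up sample data, and to_fasta is a hypothetical helper name:

```shell
# Two made-up rows in the order: qseqid sseqid slen qstart qend length mismatch gapopen gaps sseq
sample=$(printf 'q1\tchr1\t5000\t1\t12\t12\t0\t0\t0\tACGTACGTACGT\nq2\tchr2\t8000\t3\t15\t13\t0\t1\t1\tACG-TTGCAAGGC\n')

# to_fasta (hypothetical helper): write each hit as a FASTA record.
# Alignment gap characters ("-") are stripped from the subject sequence.
to_fasta() {
  awk -F '\t' '{ seq = $10; gsub(/-/, "", seq); print ">" $2 "_hit_for_" $1; print seq }'
}

fasta=$(printf '%s\n' "$sample" | to_fasta)
printf '%s\n' "$fasta"
```

With a real result file, something like to_fasta < outfile would be the equivalent call; the column number needs adjusting if the specifier list changes.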

ADD COMMENTlink written 9 months ago by sridhar.rg0