Question: Extract data from blast results
0
7 months ago by
Janey10
USA
Janey10 wrote:

Hi

By running these commands:

makeblastdb -in Total.assembly.fasta -parse_seqids -dbtype nucl -out my_db

blastn -db my_db -query X.fasta -out results.out

The following results were obtained:

Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

Database: Total.assembly.fasta
           87,103 sequences; 164,122,436 total letters

Query= c41837_g1_i1

Length=1353
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

c41837_g1_i1  len=1353 path=[1:0-297 299:298-304 @306@!:305-511 5...  2499    0.0


>c41837_g1_i1 len=1353 path=[1:0-297 299:298-304 @306@!:305-511 513:512-532
534:533-730 732:731-733 @735@!:734-1164 1166:1165-1223 1225:1224-1352]
Length=1353

 Score = 2499 bits (1353),  Expect = 0.0
 Identities = 1353/1353 (100%), Gaps = 0/1353 (0%)
 Strand=Plus/Plus

Query  1     CaaaaacaaaaacaaagaaaacttaagaaaaaaTGCGCGCAATCCTCGCTCTTGCATTCA  60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  1     CAAAAACAAAAACAAAGAAAACTTAAGAAAAAATGCGCGCAATCCTCGCTCTTGCATTCA  60

Query  61    TAGGCGCTGTCTTTGCTCAAACCACCGTCACTGACGTCCTTCAATCATACCGTGTCACCT  120
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  61    TAGGCGCTGTCTTTGCTCAAACCACCGTCACTGACGTCCTTCAATCATACCGTGTCACCT  120

How can I filter these data based on positive and negative numbers, or on low or high values of the Score and E-value?

Also, how can I extract the IDs from these data?

rna-seq • 340 views
ADD COMMENTlink modified 7 months ago by Pierre Lindenbaum113k • written 7 months ago by Janey10

Can you elaborate on what exactly you mean by "based on the positive and negative numbers or low or high numbers of Scores and Values"?

ADD REPLYlink written 7 months ago by lieven.sterck2.6k
4
7 months ago by
Buffo1.2k
Buffo1.2k wrote:

Rerun blastn with tabular output, which is much easier to parse:

blastn -db my_db -query X.fasta -out results.out -outfmt 6

-outfmt 6 = tabular format; the first column corresponds to the query ID, the second to the subject ID.

OUTFMT 6 HEADER:

  1.  qseqid    query (e.g., gene) sequence id
  2.  sseqid    subject (e.g., reference genome) sequence id
  3.  pident    percentage of identical matches
  4.  length    alignment length
  5.  mismatch  number of mismatches
  6.  gapopen   number of gap openings
  7.  qstart    start of alignment in query
  8.  qend      end of alignment in query
  9.  sstart    start of alignment in subject
 10.  send      end of alignment in subject
 11.  evalue    expect value
 12.  bitscore  bit score
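
With the column layout above, filtering by E-value or bit score and pulling out IDs can be sketched with awk and cut. This is only a minimal illustration: the three sample rows, the file names (sample.tsv, filtered.tsv, subject_ids.txt) and the thresholds are made up for the example; on real data the input would be the results.out written by the blastn command above.

```shell
# Hypothetical sample in outfmt 6 layout (tab-separated, 12 columns).
printf 'q1\ts1\t100.000\t1353\t0\t0\t1\t1353\t1\t1353\t0.0\t2499\n' > sample.tsv
printf 'q1\ts2\t85.500\t200\t25\t4\t10\t209\t50\t249\t1e-40\t180\n' >> sample.tsv
printf 'q2\ts3\t78.000\t60\t12\t1\t5\t64\t300\t359\t0.002\t45.5\n' >> sample.tsv

# Keep hits with E-value (column 11) <= 1e-5 and bit score (column 12) >= 100.
# Adding 0 forces a numeric comparison; the thresholds are arbitrary examples.
awk -F'\t' '($11 + 0) <= 1e-5 && ($12 + 0) >= 100' sample.tsv > filtered.tsv

# Extract the unique subject IDs (column 2) of the retained hits.
cut -f2 filtered.tsv | sort -u > subject_ids.txt
```

Swapping `cut -f2` for `cut -f1` would give the query IDs instead.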
ADD COMMENTlink modified 7 months ago • written 7 months ago by Buffo1.2k

Thank you very much, Buffo. I got the answer to my second question, but is there any answer to my first question?

ADD REPLYlink written 7 months ago by Janey10
2

Copy the results into Excel and sort by column 3, 11, or 12?

ADD REPLYlink written 7 months ago by Buffo1.2k
3

"copying to excel" ????

BLASPHEMY! :) Just use Linux sort to sort the data by the columns you want.
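
A rough sketch of what that could look like on tabular (-outfmt 6) output, where column 11 is the E-value and column 12 the bit score. The three rows and the file names (hits.tsv, by_bitscore.tsv, by_evalue.tsv) are invented for illustration, and the `-g` (general numeric) flag is assumed to be available, as it is in GNU sort.

```shell
# Hypothetical three-hit sample in outfmt 6 layout (tab-separated).
printf 'q1\ts2\t85.500\t200\t25\t4\t10\t209\t50\t249\t1e-40\t180\n' > hits.tsv
printf 'q2\ts3\t78.000\t60\t12\t1\t5\t64\t300\t359\t0.002\t45.5\n' >> hits.tsv
printf 'q1\ts1\t100.000\t1353\t0\t0\t1\t1353\t1\t1353\t0.0\t2499\n' >> hits.tsv

# A literal tab character to use as the field separator.
TAB="$(printf '\t')"

# Highest bit score first (column 12, descending).
sort -t "$TAB" -k12,12gr hits.tsv > by_bitscore.tsv

# Lowest E-value first (column 11, ascending); -g understands values like 1e-40.
sort -t "$TAB" -k11,11g hits.tsv > by_evalue.tsv
```

Plain `-n` would misread exponent notation such as 1e-40, which is why `-g` is used here.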

ADD REPLYlink written 7 months ago by lieven.sterck2.6k
2

You probably could, but if Janey doesn't know how to parse a BLAST output, I think sorting columns on the command line would be an even more complicated issue. By the way, I'm an enthusiastic reader of Biostars because it has been helpful for my bioinformatics problems, but do you really consider it necessary to waste time writing answers like that? Do you really consider it helpful? What is your suggestion for Janey? Blasphemy is criticizing without demonstrating any ability.

ADD REPLYlink written 7 months ago by Buffo1.2k
1

It was just a joke - he even used :).

I personally prefer :-), as I have a rather beautiful nose, but I guess ugly-nosed people will go for :), or :(.

ADD REPLYlink written 7 months ago by h.mon20k
1

To me, parsing BLAST output is a more complicated task than sorting columns.

I do agree that any approach that helps to resolve the issue is a good answer, so to some extent I can follow your reasoning. On the other hand, we're here to help (and to teach!) others, so in that context I feel it's common sense to at least provide the (better) alternatives in your answers.

There is no better help than learning to do things on the command line! ;-)

ADD REPLYlink modified 6 months ago • written 6 months ago by lieven.sterck2.6k

Thanks to all the friends for their suggestions, especially dear Buffo

ADD REPLYlink written 6 months ago by Janey10