Question: About a BLASTX-generated output file
Kurban (Xinjiang Academy of Animal Sciences, Urumqi, China) wrote, 5.1 years ago:

Hello,

I used blastx to search my query file against a protein database file.

My query file contains more than 140,000 sequences, and I only want to see the queries that aligned. However, the result lists every query, followed either by its alignments or by "No hits found", which makes selecting the aligned query sequences from the blastx result file a tremendous amount of work. If I could extract only the aligned query sequences and their alignment information (E-value, score, and aligned sequence), it would simplify my job a lot.

This is the blastx output file:

Query= comp936_c0_seq10 len=156 path=[335:0-24 360:25-155]

Length=156

***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 21583458

Query= comp1863_c0_seq1 len=2184 path=[0:0-1278 1279:1279-1279
1280:1280-1303 1304:1304-2183]

Length=2184
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  FBpp0075807 FBgn0000404 symbol:CycA family:Transcription Cofact...   287    8e-89

> FBpp0075807 FBgn0000404 symbol:CycA family:Transcription Cofactors
species:Drosophila melanogaster
Length=491

 Score =  287 bits (735),  Expect = 8e-89, Method: Compositional matrix adjust.
Identities = 184/468 (39%), Positives = 261/468 (56%), Gaps = 50/468 (11%)
Frame = -3

Query  1810  MATINIHPDQENRV-PELRqkqannamaaqqKRTGLGLIDHN----KANKAVPKGKQ--P  1652
             MA+  IH D  N+  P ++               G G  + N    +AN AV  G    P
Sbjct  1     MASFQIHQDMSNKENPGIKIPAGVKNTKQPLAVIG-GKAEKNALAPRANFAVLNGNNNVP  59

Query  1651  LKESNLSNAR-VENIHVKEN------RKNVVVPVAQFEAFTVYED--DEQRARIDQKL-R  1502
                  +   R V N++V EN      + NVV  V QF+ F+VYED  D Q A   + L 
Sbjct  60    RPAGKVQVFRDVRNLNVDENVEYGAKKSNVVPVVEQFKTFSVYEDNNDTQVAPSGKSLAS  119

Query  1501  LISKSN--VYKGTAEDRFITKTELAEIERkkqlqklKELAEIPAVIEPKCENDPCTPMSI  1328
             L+ K N  V  G  +                     KEL +      P    D  +PMS+
Sbjct  120   LVDKENHDVKFGAGQ---------------------KELVDYDLDSTPMSVTDVQSPMSV  158

Query  1327  EK-LNDENAENDSSQLAEEVIRKNSNVKDL--------FFEMEEYRDDIYAYLREHELRH  1175
             ++ +      +D S   E  +     VK+L        F E+ +Y+ DI  Y RE E +H
Sbjct  159   DRSILGVIQSSDISVGTETGVSPTGRVKELPPRNDRQRFLEVVQYQMDILEYFRESEKKH  218

Query  1174  RPKPGYIVKQPDVTENMRAVLVDWLVEVTEEYKMQTETLYLAVNFIDRFLSYMSVVRAKL  995
             RPKP Y+ +Q D++ NMR++L+DWLVEV+EEYK+ TETLYL+V ++DRFLS M+VVR+KL
Sbjct  219   RPKPLYMRRQKDISHNMRSILIDWLVEVSEEYKLDTETLYLSVFYLDRFLSQMAVVRSKL  278

Query  994   QLVGTAAMFIASKYEEIFPPDVSEFVYITDDTYDKHQVIRMEHLILRVLGFDLSVPTPLT  815
             QLVGTAAM+IA+KYEEI+PP+V EFV++TDD+Y K QV+RME +IL++L FDL  PT  
Sbjct  279   QLVGTAAMYIAAKYEEIYPPEVGEFVFLTDDSYTKAQVLRMEQVILKILSFDLCTPTAYV  338

Query  814   FINATCISAGLTEKTMYLAMYLSEIALLEVEPYLQFLPSVIASSAIALARHTLGEEAWND  635
             FIN   +   + EK  Y+ +Y+SE++L+E E YLQ+LPS+++S+++ALARH LG E W 
Sbjct  339   FINTYAVLCDMPEKLKYMTLYISELSLMEGETYLQYLPSLMSSASVALARHILGMEMWTP  398

Query  634   SLYKHTGYTLKQLQLCICFLYDMFVKAPNHPQHAIQDKYRSRKYMQVS  491
              L + T Y L+ L+  +  L      A      A+++KY    Y +V+
Sbjct  399   RLEEITTYKLEDLKTVVLHLCHTHKTAKELNTQAMREKYNRDTYKKVA  446

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 472565421

Query= comp1199_c0_seq1 len=1877 path=[19533:0-169 21522:170-173
19704:174-982 21495:983-986 20513:987-1876]

Length=1877

***** No hits found *****

Lambda      K        H        a         alpha
   0.318    0.134    0.401    0.792     4.96

Gapped
Lambda      K        H        a         alpha    sigma
   0.267   0.0410    0.140     1.90     42.6     43.6

Effective search space used: 397904649

 

RamRS (Houston, TX) wrote, 5.1 years ago:

You can use a simple Biopython script to convert your plain-text result to tabular format, which will be easier to filter and process. Take a look here: http://biopython.org/DIST/docs/api/Bio.SearchIO.BlastIO-module.html
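To give a rough idea of what "plain text to tabular" means for a report like the one in the question, here is a minimal sketch that uses only the Python standard library. It is not the Bio.SearchIO API from the link above; the function name `blast_text_to_rows` and the regular expression are my own assumptions, based purely on the layout of the report shown in the question, so other BLAST versions may need adjustments.

```python
import re

# Matches the per-alignment statistics line, e.g.
# " Score =  287 bits (735),  Expect = 8e-89, Method: ..."
# (pattern is an assumption derived from the report in the question)
SCORE_RE = re.compile(
    r"Score\s*=\s*([\d.eE+]+)\s+bits.*?Expect(?:\(\d+\))?\s*=\s*([^,\s]+)"
)


def blast_text_to_rows(lines):
    """Yield (query_id, subject_id, bit_score, e_value) per alignment."""
    query = subject = None
    for line in lines:
        if line.startswith("Query= "):
            parts = line.split()
            query = parts[1] if len(parts) > 1 else None
            subject = None  # a new query starts; forget the previous subject
        elif line.startswith(">"):
            parts = line[1:].split()
            subject = parts[0] if parts else None
        else:
            match = SCORE_RE.search(line)
            if match and query and subject:
                yield (query, subject, match.group(1), match.group(2))


# Shortened excerpt of the report from the question, used as demo input.
SAMPLE = """\
Query= comp936_c0_seq10 len=156 path=[335:0-24 360:25-155]

***** No hits found *****

Query= comp1863_c0_seq1 len=2184 path=[0:0-1278 1279:1279-1279

> FBpp0075807 FBgn0000404 symbol:CycA family:Transcription Cofactors
Length=491

 Score =  287 bits (735),  Expect = 8e-89, Method: Compositional matrix adjust.
"""

for row in blast_text_to_rows(SAMPLE.splitlines()):
    print("\t".join(row))
```

Queries without hits never reach a "Score =" line, so they simply produce no rows. For a real file you would pass an open file handle instead of `SAMPLE.splitlines()`.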

 

Alternatively, you can edit my script (which currently just filters out the no-hit queries) so that it reads plain text and writes tabular output. It is BioPerl, though: https://github.com/RamRS/myPerlScripts/blob/master/filterBlastReport.pl
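For readers who prefer Python, the "filter out no-hits" idea can be sketched as below. This is not a port of filterBlastReport.pl and does not use BioPerl or Biopython; it assumes, as in the report from the question, that each query section starts with a line beginning with "Query= " and that unmatched queries contain the "***** No hits found *****" line. The function names are hypothetical.

```python
NO_HITS_MARKER = "***** No hits found *****"


def split_query_blocks(text):
    """Split a report into chunks, starting a new chunk at each 'Query= ' line.

    Anything before the first 'Query= ' line (the program banner, for
    example) ends up in its own leading chunk.
    """
    blocks, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("Query= ") and current:
            blocks.append("".join(current))
            current = []
        current.append(line)
    if current:
        blocks.append("".join(current))
    return blocks


def drop_no_hit_queries(text):
    """Return the report with every no-hit query chunk removed."""
    return "".join(
        block for block in split_query_blocks(text) if NO_HITS_MARKER not in block
    )


# Shortened stand-in for the report in the question.
REPORT = """\
Query= comp936_c0_seq10 len=156

***** No hits found *****

Query= comp1863_c0_seq1 len=2184

> FBpp0075807 FBgn0000404 symbol:CycA
 Score =  287 bits (735),  Expect = 8e-89

Query= comp1199_c0_seq1 len=1877

***** No hits found *****
"""

print(drop_no_hit_queries(REPORT))
```

This version holds the whole report in memory, which keeps the sketch short; for a report with 140,000 queries it may be worth processing the file chunk by chunk instead.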


Hello Mr. RamRS,

I tried your script. Before any change, it showed this result:

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl tf.blast
Getopt::ArgParse: Option in is required
kurban@kurban-X550VC:~/Desktop/tf$

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl --in tf.blast
filterBlastReport.pl: remove entries with no hits from BLAST output file
usage: filterBlastReport.pl [--help|-h] --in|-i

This script reads a BLAST results file as input\ and filters out query
sequences with no hits to the database. \ The results are written in plain text
format to output file.

optional arguments:
    --help, -h     ? show this help message and exit
    --in, -i IN      input BLAST results file


Then my friend changed the script a little (I believe he changed line 21, "if(scalar(@ARGV) != 2)"), and now it gives this:

kurban@kurban-X550VC:~/Desktop/tf$ perl changed.pl --in tf.blast
2Getopt::ArgParse::Namespace=HASH(0x2adfe78)
unknown option: fasta at changed.pl line 30.

I have not been able to find where the problem is.

 

Reply written 5.1 years ago by Kurban

Check the usage line, it needs the -i flag before the input file name :)

Run the script (the version before your friend changed it) like so:

perl filterBlastReport.pl -i tf.blast

You'd have to change the argparse code if you wanna use input files without the flag.

EDIT: The change your friend made just bypasses the line of code trying to warn you of an imminent failure - it does nothing to address the cause whatsoever :)
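The script itself is Perl and uses Getopt::ArgParse, so the following is only an analogy in Python's standard argparse module, not the script's actual code. It sketches the difference between an input file that must follow a flag (-i / --in) and one that is accepted as a bare positional argument; the parser and program names are made up for illustration.

```python
import argparse


def build_flag_parser():
    """Input file must be given after -i / --in, as in the usage line above."""
    parser = argparse.ArgumentParser(prog="filter_blast_report")
    parser.add_argument(
        "-i", "--in", dest="infile", required=True,
        help="input BLAST results file",
    )
    return parser


def build_positional_parser():
    """Input file is given on its own, without any flag."""
    parser = argparse.ArgumentParser(prog="filter_blast_report")
    parser.add_argument("infile", help="input BLAST results file")
    return parser


# Flag style: corresponds to "-i tf.blast" on the command line.
print(build_flag_parser().parse_args(["-i", "tf.blast"]).infile)

# Positional style: corresponds to a bare "tf.blast" on the command line.
print(build_positional_parser().parse_args(["tf.blast"]).infile)
```

Passing a bare file name to the flag-style parser makes argparse print an error and exit, which is the same kind of complaint as "Option in is required" in the first run above.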

Reply written 5.1 years ago by RamRS
Kurban wrote, 5.1 years ago:

I also tried that command line several times before making any change to the script, and got the same result:

kurban@kurban-X550VC:~/Desktop/tf$ perl filterBlastReport.pl -i tf.blast
filterBlastReport.pl: remove entries with no hits from BLAST output file
usage: filterBlastReport.pl [--help|-h] --in|-i

This script reads a BLAST results file as input\ and filters out query
sequences with no hits to the database. \ The results are written in plain text
format to output file.

optional arguments:
    --help, -h     ? show this help message and exit
    --in, -i IN      input BLAST results file
kurban@kurban-X550VC:~/Desktop/tf$

 


I just fixed it - it should work fine now. Sorry for the inconvenience.

Reply written 5.1 years ago by RamRS

Yes, Sir, it runs perfectly now.

No, no, there has not been any inconvenience at all. Your suggestions and scripts have been a great help. Thank you for your time and patience.

Reply written 5.1 years ago by Kurban

You're very welcome! Glad I could be of help, and thank you for finding the bug in my code.

Reply written 5.1 years ago by RamRS

That's weird. I guess the script is a bit buggy. I'll work on it and let you know once it is tweaked. It should not take me more than a couple of hours.

Reply written 5.1 years ago by RamRS