Question: How to create an output file using Perl
4.0 years ago by
vinij8030
United States
vinij8030 wrote:

I have two files. One contains the BLAST results for a set of contigs, and the other is the original FASTA file with all the contigs that were used as BLAST queries. I want to write the sequences that gave no hits in the BLAST results to a text file in a separate folder. I have no idea how to do this. Can you please help me write a Perl script for this problem?
 

blast sequence • 1.2k views
ADD COMMENTlink modified 4.0 years ago by iraun3.6k • written 4.0 years ago by vinij8030

Can you provide a short sample for each of the files?

ADD REPLYlink written 4.0 years ago by roy.granit800

BLAST RESULT FILE

BLASTN 2.2.31+


Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.

 

Database: Arabidopsis_thaliana.mRNA.EST.fasta
           1,529,700 sequences; 400,627,814 total letters

 

Query= Contig1

Length=941
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  gi|61656031|gb|DN604734.1|DN604734 EST JCAt1g22640 Arabidopsis ...  219     4e-055
 

> gi|61656031|gb|DN604734.1|DN604734 EST JCAt1g22640 Arabidopsis 
Gateway cDNA Library Arabidopsis thaliana cDNA 3', mRNA sequence.
Length=740

 Score = 219 bits (118),  Expect = 4e-055
 Identities = 263/334 (79%), Gaps = 6/334 (2%)
 Strand=Plus/Minus

Query  60   AAAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATAT--TAAAACTCACG  117
            ||||||||||||||||||||||||||| | |  ||| ||  ||| ||    |||  ||||
Sbjct  734  AAAGGAGCTTGGACTAAAGAAGAAGATCAGCTTCTTGTTGATTACATCCGTAAA--CACG  677

Query  118  GCGAAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTT  177
            | |||||||||||| |||| || |||   || || ||| | |   |||| ||||| ||||
Sbjct  676  GTGAAGGTTGCTGGCGATCTCTCCCTCGCGCCGCTGGATTACAAAGATGTGGTAAGAGTT  617

Query  178  GCCGTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAG  237
            |  |  |  ||||||| |||||  | ||||| || || ||| | || |||||||||||||
Sbjct  616  GTAGATTGAGATGGATGAATTATCTAAGACCAGATCTCAAAAGAGGCAATTTTACTGAAG  557

Query  238  AAGAAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAG  297
            |||||||||||||||| ||||| ||||||||| | || ||||||||||||||  | ||||
Sbjct  556  AAGAAGATGAACTCATCATCAAGCTCCATAGCTTGCTCGGTAACAAATGGTCTTTAATAG  497

Query  298  CCGGAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACAT-A  356
            | || ||||||||||||||||||||||| ||||| || || || ||||| || || || |
Sbjct  496  CTGGGAGATTACCAGGAAGAACAGATAACGAGATCAAGAACTATTGGAACACTCATATCA  437

Query  357  AGAAGGAAGCTTTTGAGTCGGGGCATTGATCCAA  390
            || ||||||||| | || || || ||||||||||
Sbjct  436  AG-AGGAAGCTTCTCAGCCGTGGGATTGATCCAA  404

Lambda      K        H
    1.33    0.621     1.12 

Gapped
Lambda      K        H
    1.28    0.460    0.850 

Effective search space used: 326667943382


Query= Contig2

Length=1009


***** No hits found *****

 

Lambda      K        H
    1.33    0.621     1.12 

Gapped
Lambda      K        H
    1.28    0.460    0.850 

Effective search space used: 350998085934


Query= Contig3

Length=529


***** No hits found *****

 

Lambda      K        H
    1.33    0.621     1.12 

Gapped
Lambda      K        H
    1.28    0.460    0.850 

Effective search space used: 180381608828


Query= Contig4

Length=896
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  gi|152034016|gb|AU239642.2|AU239642 EST AU239642 RAFL21 Arabido...  167     1e-039

> gi|152034016|gb|AU239642.2|AU239642 EST AU239642 RAFL21 Arabidopsis 
thaliana cDNA clone RAFL21-11-C04 5', mRNA sequence.
Length=632

 Score = 167 bits (90),  Expect = 1e-039
 Identities = 260/345 (75%), Gaps = 2/345 (1%)
 Strand=Plus/Plus

Query  232  ATTGCCAATACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCA  291
            ||||| || ||||||||||||||||| | || |||    || ||  ||||| | ||||| 
Sbjct  286  ATTGCGAACACTAAGTCTTGGTTCCAGTACTTTGGTAGTGGGTTCGCTATTAGGGTTCCT  345

Query  292  CCGGAATTTCAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGA  351
            || || ||| | ||| ||| |||||| ||||||||   ||| ||  | || || ||||| 
Sbjct  346  CCTGACTTTGAAGACGTCAATGAGCCTGAGGATTACTCTGCGGGATTGTCTCTCTATGGT  405

Query  352  GATAAGGCTAAGCCCAAAAAATTT-GCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGT  410
            || ||||| ||||| | ||| ||| || || || ||     || |||||||||| |||||
Sbjct  406  GACAAGGCAAAGCC-ACAAACTTTCGCCGCCCGGTTCCAAACTCCTGATGGATCAGAAGT  464

Query  411  TTTAAGTGTCATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGA  470
            ||| |||||  | |||||||| || ||||| || ||||||||||||||||||||||||||
Sbjct  465  TTTGAGTGTAGTCATTCGTCCTTCAAATCAACTTAAGATCACTTTCTTAGAGGCTAAAGA  524

Query  471  TATTACTGATTTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAAC  530
            |||  ||||||| || ||| | ||||  || |||| | | |||||||||| ||   ||||
Sbjct  525  TATATCTGATTTGGGATCATTGAAGGCAGCTGCAAGACTTTTTGTTCCAGGTGCNGCAAC  584

Query  531  ACTATATTCTGTCCGCACAATAAAAATTAAAGAAGATGAGGGTTT  575
            | | || ||||  || ||||| ||  | || ||||| || |||||
Sbjct  585  AATTTACTCTGCTCGTACAATCAAGGTAAAGGAAGAAGAAGGTTT  629

Lambda      K        H
    1.33    0.621     1.12 

Gapped
Lambda      K        H
    1.28    0.460    0.850 

Effective search space used: 310567113752

 

 

CONTIG FILE

>Contig1
GAGCTAAATAATTTGAATCAATGGGAAGATCACCGTGTTGTGAAAAAGCACATACAAATA
AAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATATTAAAACTCACGGCG
AAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTTGCC
GTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAGAAG
AAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAGCCG
GAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACATAAGAA
GGAAGCTTTTGAGTCGGGGCATTGATCCAACGACACACAGGCCTGTTAACGAGCCTGGTA
CAACGCAAAAAGTCACAACAATTTCATTTGCAGGTGGAGATCATAAAACTAAAGATATTG
AAGAAGATCATAATAAGATGATAAATGTCAAAGCTGAATCTGGGTTGAGTCAATTAGAAG
ATGAAATTATTAGTAGCAGTCCATTTCGAGAACAGTGTCCTGATTTAAATCTTGAGCTCA
GAATTAGCCCTCCTTCTCTACAAAATTACCAACATAGCCCCTCAAGGTGTTTTGCATGCA
GTTTGGGTATACAAAATAGTAAAGATTGCAATTGCAGTAAAAATAATATTGCAAGTTATA
ACTTTTTAGGATTAAAGAGTAATGGTGTTTTGGACTATAGAACTTTAGAAACTAAGTGAA
TTTTTATTATAAATCTTTTTTTCCCTCGTGTATTTGGGTTAAAAAAACAAGAAGAGAGAA
TCGAGAAAGATATTCCTATTAGTTTAAGTTCTTTCGAATTTTCTCTTATTTGTAAAATTT
CAAGTATTACTATATACGATATATTATATTAAGTTGAAAAG
>Contig2
GCTCTTCCAACAACAACAACAATGCCTCATCAAAAGCCTCTTTCTCTCATTCTTCTATCT
ACACTCCCACTTCTTTTCATTCTCACACAAGCTCAATCACCAACAGCACCAGCACCAGCA
CCCTCAGGACCAATAGACATCTTTGCAATCCTCAAAAAAGAAGGACAATACAACACATTC
ATCAAGTTCCTAAATGAATCACAAGTTGGTAACCAAATCAACAACCAAGTAAACAACTCC
AACCAAGGCATGACAGTTTTGGCACCATCAGACAATGCATTTAACAACCTCCCAAGTGGT
ACACTCAACCAACTAAATGACCAACAAAAAGTACAACTCATTTTGAACCATGTCATACCA
AAGTTCTACACATTTGATGACTTACAAACAGTAAGCAACCCTGTTAGAACACAAGCAACA
GGGCCTAAAGGTGAGCCTTTTGGACTTAACTTTACTGGAAGTAACAATCAAGTGAATGTC
TCATCTGGTTCTGTTGTTACAAACATTTATAATGCTATTAGAAAAGACCCCCCATTGGCT
GTTTTTCAATTAGACAAAGTTTTAGTACCTTCTCAGTTTACTGATCCATCTAGTGATGAT
GATGCCCCTGCACCTACTAAACCCAAGAATGGTACTAGTAATGATAAAACAACAGCTGAT
GAGCCATCACCAGCAAGTAACACTAAGCCAAATGATGCTAAAAGGATCAGTGGTGGGATT
CTTGGATTGGTTTGTGGTGTTTTCTTGATGGCAACACTATCTTGAAGGGGGCTACAGAGT
TGTTAACTTTATGATCTTTTGCTTATACTAAGCCATTTTGTATTACATTGTTTTCTTCAA
GATTGATTGTTTTTGTTCAAAAAAGAAGGGGGGGGGGGAAAAAAAAACCCCCCTGCGGAA
AAGAGCGGGGAAAGCACCAAAAAGCCACCGACCAAAAGCACCAACTCACAAAAGGTGCGC
AGACGCGGAAAGGGGAAAAGGAAAAAATGTGAAAGCTTGTTATAGTTTG
>Contig3
AAACTGTAATTAGACTTCTCTGCTAAGTTTCTGCTGTATTTGGATTCTCCGGCGAACATT
AATATCTAACCATGACCGGCGGTGGAGGCGATGCCGCATCGCCGCCTCTATCCTCACAGT
CAACTCCATCCAACGGTGGGGAATTCCTTCTTCAATTGCTTCAGAATCATCCGCATCAAC
TTCACTCTCAGCCTCAACCGCCACTGCGGCCGGAGTTGCAGAATCTGCCGCATGATCCAG
CAGTTGCAGCAGTAGGTCCTAGTATGCCCTACCCGCCATTGTTCCATACTCCTACAAACC
CTTCTGTTTTGCCCTATTCTCACTCTCCTCCTCTGTTTGTACCTCATAACTTCTTCATTC
GAGGGTTTCTCCAAAACCCTAATTCTGGCCATACCACTAACCCCAATTACTCATCTCCGC
CTGCCCCAAGTGGGTTCAGTCAATATCACCATGCGAGTCCACTTGGATTTGGATCAGTCG
GAGAAAACATGGGCAATTTGGGGATTTTCGGTGCCAATGCTAAGGCGAG
>Contig4
CATGTAATAGCATAGCATCCCCAATTTCACCCTCTCATGGCCATGTCCACGCTCCTCTCC
CTGTCCGTGTCTATCCACCCACCAAAACCTTTGCAAAAACCCAATTCAATGTGTACCCAA
CCTAACTCTATTTCGAGAAGACAAGTGTTTTTCACTGGTTCTAATTTATTGCTCTCTCAA
TTAATTCCAAAATCCGACGCCCAAACCAATTCCAATAGTTTTCTTTCAGGTATTGCCAAT
ACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCACCGGAATTT
CAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGAGATAAGGCT
AAGCCCAAAAAATTTGCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGTTTTAAGTGTC
ATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGATATTACTGAT
TTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAACACTATATTCT
GTCCGCACAATAAAAATTAAAGAAGATGAGGGTTTCAGGACATACTATTTTTATGAATTT
GTGAGAAATGAGCAACACGTTGCATTAGTGGCTGGTGTTAACAGTGGAAAGGCCGTCATT
GCTGGTGCCACGGCCCCCGAAAGCAAATGGGCCGAGGATGGTTTGAAGCTCCGATCTGCT
GCAGTATCAATGACAATTCTATAAGCAGAATGTGAGTATATATATAGGTTCTATTTCAAT
GATGATGAATTTATATACAAATATTGAGGATCAAAGTTTTCTTATTATCATCTAATCTCA
GCCAAGGATTAACAATCTCCATCATCCATTCAATAGCAATGTTTCTGCTGTTTTGC

 

ADD REPLYlink written 4.0 years ago by vinij8030

If the blast output is in tabular format, only contigs with at least one hit appear in it. You can save the query IDs (first column) with cut -f1 blast.tab | sort -u > hits.txt, extract the header names from your contig fasta file (without the leading ">") into headers.txt, and then simply use grep -v -x -F -f hits.txt headers.txt to list all contigs that are not present in the blast file. But, as roy.granit has pointed out, it would be helpful to provide an example of the input and desired output files. If I understand the problem correctly, it's not necessary to write a perl script, but of course it can also be solved with perl.
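
For the tabular case, a minimal sketch in shell. It assumes the report was produced with -outfmt 6 (query ID in the first column) and that each FASTA header starts with the contig name. The function name nohit_ids_from_tabular and the file names in the usage comment are made up for illustration:

```shell
# Hypothetical helper: list FASTA record names that never appear as a
# query (column 1) in a tabular BLAST report.
nohit_ids_from_tabular() {
    blast_tab=$1   # tabular BLAST output, one hit per line
    fasta=$2       # FASTA file that was used as the query
    awk '
        # First file: remember every query that has at least one hit
        # (blank lines and "#" comment lines are skipped).
        FILENAME == ARGV[1] { if ($0 !~ /^#/ && NF) hit[$1] = 1; next }
        # Second file: look at header lines only.
        /^>/ {
            name = substr($1, 2)            # drop ">" and keep the first word
            if (!(name in hit)) print name  # no row in the report: no hit
        }
    ' "$blast_tab" "$fasta"
}

# Example call (file names are placeholders):
#   nohit_ids_from_tabular blast.tab contigs.fa > ids
```

Since tabular output only contains queries with hits, the contigs of interest are simply the FASTA names that never show up in column 1.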

ADD REPLYlink written 4.0 years ago by iraun3.6k

The blast result file and the contig file are shown above.

ADD REPLYlink written 4.0 years ago by vinij8030

Contig2 and Contig3 show no blast hits, so I need to extract these two contig sequences into a separate text file.

ADD REPLYlink written 4.0 years ago by vinij8030
4.0 years ago by
iraun3.6k
Norway
iraun3.6k wrote:

1) Extract the names of the contigs without a hit in blast (-F makes grep treat the asterisks literally):

grep -F -B5 "***** No hits found" blast.txt | grep "^Query= " | sed 's/^Query= //' > ids

2) Extract the fasta records of those contigs without a hit:

cat ids | xargs -n 1 samtools faidx contigs.fa > contigs_nohit.fa
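
If samtools is not available, the same two steps can be sketched with awk alone. This assumes the default pairwise report shown in the question (a "Query= name" line per contig, followed by "***** No hits found *****" when nothing matched) and that each FASTA header starts with the contig name. The function names are hypothetical:

```shell
# Hypothetical helper: print the names of queries reported with "No hits found".
nohit_ids_from_report() {
    awk '
        /^Query= /      { query = $2 }    # remember the current query name
        /No hits found/ { print query }   # BLAST found nothing for it
    ' "$1"
}

# Hypothetical helper: print the FASTA records whose names are listed,
# one per line, in an ID file.
extract_fasta_records() {
    ids=$1      # file with one contig name per line
    fasta=$2    # FASTA file with all contigs
    awk '
        # First file: load the wanted names, ignoring empty lines.
        FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
        # Second file: each header switches printing on or off.
        /^>/ { keep = (substr($1, 2) in want) }
        keep    # print header and sequence lines of wanted records
    ' "$ids" "$fasta"
}

# Example calls (file and folder names are placeholders):
#   nohit_ids_from_report blast.txt > ids
#   mkdir -p nohit
#   extract_fasta_records ids contigs.fa > nohit/contigs_nohit.fa
```

Tracking the most recent "Query=" line avoids depending on how many lines separate it from the "No hits found" message.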
ADD COMMENTlink modified 4.0 years ago • written 4.0 years ago by iraun3.6k

I am a beginner in Perl. I do not understand what grep is. Can you explain how to run this script?

Can you write a Perl script for this?

ADD REPLYlink modified 4.0 years ago • written 4.0 years ago by vinij8030

This is not perl, it is bash. You can open a linux terminal, type these two lines, and you will have the desired output in the contigs_nohit.fa file. I would recommend that you read a little about basic bash commands: http://ss64.com/bash/.

ADD REPLYlink written 4.0 years ago by iraun3.6k

The first command works, but when I run the second command it shows:

[_razf_open] fail to open contigs.fa

[fai_build] fail to open the fasta file contigs.fa

[fai_load] fail to open FASTA index

Please give me a solution for this problem.

ADD REPLYlink written 4.0 years ago by vinij8030

What's the name of your file containing the contigs? I named it 'contigs.fa' as an example, but you should replace this name with your file name.

ADD REPLYlink written 4.0 years ago by iraun3.6k

I tried that, but it shows:

[fai_load] build FASTA index

not found in FASTA file returning empty sequence

xargs: samtools: terminated by signal 11

ADD REPLYlink written 4.0 years ago by vinij8030

Thank you, Airan, for your help.

ADD REPLYlink written 4.0 years ago by vinij8030