How to create an output file using Perl
9.9 years ago
vinij80 ▴ 30

I have two files. One contains the BLAST results for a set of contigs, and the other is the original FASTA file with all the contigs that were used as queries. I want to write the sequences of the contigs that gave no hits in the BLAST result to a text file in a separate folder. I have no idea how to do this. Can you please help me write a Perl script for this problem?

blast sequence • 3.1k views

Can you provide a short sample for each of the files?


BLAST RESULT FILE

BLASTN 2.2.31+
Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
Miller (2000), "A greedy algorithm for aligning DNA sequences", J
Comput Biol 2000; 7(1-2):203-14.
Database: Arabidopsis_thaliana.mRNA.EST.fasta
1,529,700 sequences; 400,627,814 total letters
Query= Contig1
Length=941
Score E
Sequences producing significant alignments: (Bits) Value
gi|61656031|gb|DN604734.1|DN604734 EST JCAt1g22640 Arabidopsis ... 219 4e-055
> gi|61656031|gb|DN604734.1|DN604734 EST JCAt1g22640 Arabidopsis
Gateway cDNA Library Arabidopsis thaliana cDNA 3', mRNA sequence.
Length=740
Score = 219 bits (118), Expect = 4e-055
Identities = 263/334 (79%), Gaps = 6/334 (2%)
Strand=Plus/Minus
Query 60 AAAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATAT--TAAAACTCACG 117
||||||||||||||||||||||||||| | | ||| || ||| || ||| ||||
Sbjct 734 AAAGGAGCTTGGACTAAAGAAGAAGATCAGCTTCTTGTTGATTACATCCGTAAA--CACG 677
Query 118 GCGAAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTT 177
| |||||||||||| |||| || ||| || || ||| | | |||| ||||| ||||
Sbjct 676 GTGAAGGTTGCTGGCGATCTCTCCCTCGCGCCGCTGGATTACAAAGATGTGGTAAGAGTT 617
Query 178 GCCGTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAG 237
| | | ||||||| ||||| | ||||| || || ||| | || |||||||||||||
Sbjct 616 GTAGATTGAGATGGATGAATTATCTAAGACCAGATCTCAAAAGAGGCAATTTTACTGAAG 557
Query 238 AAGAAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAG 297
|||||||||||||||| ||||| ||||||||| | || |||||||||||||| | ||||
Sbjct 556 AAGAAGATGAACTCATCATCAAGCTCCATAGCTTGCTCGGTAACAAATGGTCTTTAATAG 497
Query 298 CCGGAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACAT-A 356
| || ||||||||||||||||||||||| ||||| || || || ||||| || || || |
Sbjct 496 CTGGGAGATTACCAGGAAGAACAGATAACGAGATCAAGAACTATTGGAACACTCATATCA 437
Query 357 AGAAGGAAGCTTTTGAGTCGGGGCATTGATCCAA 390
|| ||||||||| | || || || ||||||||||
Sbjct 436 AG-AGGAAGCTTCTCAGCCGTGGGATTGATCCAA 404
Lambda K H
1.33 0.621 1.12
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 326667943382
Query= Contig2
Length=1009
***** No hits found *****
Lambda K H
1.33 0.621 1.12
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 350998085934
Query= Contig3
Length=529
***** No hits found *****
Lambda K H
1.33 0.621 1.12
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 180381608828
Query= Contig4
Length=896
Score E
Sequences producing significant alignments: (Bits) Value
gi|152034016|gb|AU239642.2|AU239642 EST AU239642 RAFL21 Arabido... 167 1e-039
> gi|152034016|gb|AU239642.2|AU239642 EST AU239642 RAFL21 Arabidopsis
thaliana cDNA clone RAFL21-11-C04 5', mRNA sequence.
Length=632
Score = 167 bits (90), Expect = 1e-039
Identities = 260/345 (75%), Gaps = 2/345 (1%)
Strand=Plus/Plus
Query 232 ATTGCCAATACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCA 291
||||| || ||||||||||||||||| | || ||| || || ||||| | |||||
Sbjct 286 ATTGCGAACACTAAGTCTTGGTTCCAGTACTTTGGTAGTGGGTTCGCTATTAGGGTTCCT 345
Query 292 CCGGAATTTCAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGA 351
|| || ||| | ||| ||| |||||| |||||||| ||| || | || || |||||
Sbjct 346 CCTGACTTTGAAGACGTCAATGAGCCTGAGGATTACTCTGCGGGATTGTCTCTCTATGGT 405
Query 352 GATAAGGCTAAGCCCAAAAAATTT-GCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGT 410
|| ||||| ||||| | ||| ||| || || || || || |||||||||| |||||
Sbjct 406 GACAAGGCAAAGCC-ACAAACTTTCGCCGCCCGGTTCCAAACTCCTGATGGATCAGAAGT 464
Query 411 TTTAAGTGTCATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGA 470
||| ||||| | |||||||| || ||||| || ||||||||||||||||||||||||||
Sbjct 465 TTTGAGTGTAGTCATTCGTCCTTCAAATCAACTTAAGATCACTTTCTTAGAGGCTAAAGA 524
Query 471 TATTACTGATTTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAAC 530
||| ||||||| || ||| | |||| || |||| | | |||||||||| || ||||
Sbjct 525 TATATCTGATTTGGGATCATTGAAGGCAGCTGCAAGACTTTTTGTTCCAGGTGCNGCAAC 584
Query 531 ACTATATTCTGTCCGCACAATAAAAATTAAAGAAGATGAGGGTTT 575
| | || |||| || ||||| || | || ||||| || |||||
Sbjct 585 AATTTACTCTGCTCGTACAATCAAGGTAAAGGAAGAAGAAGGTTT 629
Lambda K H
1.33 0.621 1.12
Gapped
Lambda K H
1.28 0.460 0.850
Effective search space used: 310567113752

CONTIG FILE

>Contig1
GAGCTAAATAATTTGAATCAATGGGAAGATCACCGTGTTGTGAAAAAGCACATACAAATA
AAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATATTAAAACTCACGGCG
AAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTTGCC
GTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAGAAG
AAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAGCCG
GAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACATAAGAA
GGAAGCTTTTGAGTCGGGGCATTGATCCAACGACACACAGGCCTGTTAACGAGCCTGGTA
CAACGCAAAAAGTCACAACAATTTCATTTGCAGGTGGAGATCATAAAACTAAAGATATTG
AAGAAGATCATAATAAGATGATAAATGTCAAAGCTGAATCTGGGTTGAGTCAATTAGAAG
ATGAAATTATTAGTAGCAGTCCATTTCGAGAACAGTGTCCTGATTTAAATCTTGAGCTCA
GAATTAGCCCTCCTTCTCTACAAAATTACCAACATAGCCCCTCAAGGTGTTTTGCATGCA
GTTTGGGTATACAAAATAGTAAAGATTGCAATTGCAGTAAAAATAATATTGCAAGTTATA
ACTTTTTAGGATTAAAGAGTAATGGTGTTTTGGACTATAGAACTTTAGAAACTAAGTGAA
TTTTTATTATAAATCTTTTTTTCCCTCGTGTATTTGGGTTAAAAAAACAAGAAGAGAGAA
TCGAGAAAGATATTCCTATTAGTTTAAGTTCTTTCGAATTTTCTCTTATTTGTAAAATTT
CAAGTATTACTATATACGATATATTATATTAAGTTGAAAAG
>Contig2
GCTCTTCCAACAACAACAACAATGCCTCATCAAAAGCCTCTTTCTCTCATTCTTCTATCT
ACACTCCCACTTCTTTTCATTCTCACACAAGCTCAATCACCAACAGCACCAGCACCAGCA
CCCTCAGGACCAATAGACATCTTTGCAATCCTCAAAAAAGAAGGACAATACAACACATTC
ATCAAGTTCCTAAATGAATCACAAGTTGGTAACCAAATCAACAACCAAGTAAACAACTCC
AACCAAGGCATGACAGTTTTGGCACCATCAGACAATGCATTTAACAACCTCCCAAGTGGT
ACACTCAACCAACTAAATGACCAACAAAAAGTACAACTCATTTTGAACCATGTCATACCA
AAGTTCTACACATTTGATGACTTACAAACAGTAAGCAACCCTGTTAGAACACAAGCAACA
GGGCCTAAAGGTGAGCCTTTTGGACTTAACTTTACTGGAAGTAACAATCAAGTGAATGTC
TCATCTGGTTCTGTTGTTACAAACATTTATAATGCTATTAGAAAAGACCCCCCATTGGCT
GTTTTTCAATTAGACAAAGTTTTAGTACCTTCTCAGTTTACTGATCCATCTAGTGATGAT
GATGCCCCTGCACCTACTAAACCCAAGAATGGTACTAGTAATGATAAAACAACAGCTGAT
GAGCCATCACCAGCAAGTAACACTAAGCCAAATGATGCTAAAAGGATCAGTGGTGGGATT
CTTGGATTGGTTTGTGGTGTTTTCTTGATGGCAACACTATCTTGAAGGGGGCTACAGAGT
TGTTAACTTTATGATCTTTTGCTTATACTAAGCCATTTTGTATTACATTGTTTTCTTCAA
GATTGATTGTTTTTGTTCAAAAAAGAAGGGGGGGGGGGAAAAAAAAACCCCCCTGCGGAA
AAGAGCGGGGAAAGCACCAAAAAGCCACCGACCAAAAGCACCAACTCACAAAAGGTGCGC
AGACGCGGAAAGGGGAAAAGGAAAAAATGTGAAAGCTTGTTATAGTTTG
>Contig3
AAACTGTAATTAGACTTCTCTGCTAAGTTTCTGCTGTATTTGGATTCTCCGGCGAACATT
AATATCTAACCATGACCGGCGGTGGAGGCGATGCCGCATCGCCGCCTCTATCCTCACAGT
CAACTCCATCCAACGGTGGGGAATTCCTTCTTCAATTGCTTCAGAATCATCCGCATCAAC
TTCACTCTCAGCCTCAACCGCCACTGCGGCCGGAGTTGCAGAATCTGCCGCATGATCCAG
CAGTTGCAGCAGTAGGTCCTAGTATGCCCTACCCGCCATTGTTCCATACTCCTACAAACC
CTTCTGTTTTGCCCTATTCTCACTCTCCTCCTCTGTTTGTACCTCATAACTTCTTCATTC
GAGGGTTTCTCCAAAACCCTAATTCTGGCCATACCACTAACCCCAATTACTCATCTCCGC
CTGCCCCAAGTGGGTTCAGTCAATATCACCATGCGAGTCCACTTGGATTTGGATCAGTCG
GAGAAAACATGGGCAATTTGGGGATTTTCGGTGCCAATGCTAAGGCGAG
>Contig4
CATGTAATAGCATAGCATCCCCAATTTCACCCTCTCATGGCCATGTCCACGCTCCTCTCC
CTGTCCGTGTCTATCCACCCACCAAAACCTTTGCAAAAACCCAATTCAATGTGTACCCAA
CCTAACTCTATTTCGAGAAGACAAGTGTTTTTCACTGGTTCTAATTTATTGCTCTCTCAA
TTAATTCCAAAATCCGACGCCCAAACCAATTCCAATAGTTTTCTTTCAGGTATTGCCAAT
ACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCACCGGAATTT
CAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGAGATAAGGCT
AAGCCCAAAAAATTTGCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGTTTTAAGTGTC
ATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGATATTACTGAT
TTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAACACTATATTCT
GTCCGCACAATAAAAATTAAAGAAGATGAGGGTTTCAGGACATACTATTTTTATGAATTT
GTGAGAAATGAGCAACACGTTGCATTAGTGGCTGGTGTTAACAGTGGAAAGGCCGTCATT
GCTGGTGCCACGGCCCCCGAAAGCAAATGGGCCGAGGATGGTTTGAAGCTCCGATCTGCT
GCAGTATCAATGACAATTCTATAAGCAGAATGTGAGTATATATATAGGTTCTATTTCAAT
GATGATGAATTTATATACAAATATTGAGGATCAAAGTTTTCTTATTATCATCTAATCTCA
GCCAAGGATTAACAATCTCCATCATCCATTCAATAGCAATGTTTCTGCTGTTTTGC

If the BLAST output is in tabular format, you can extract the headers from your contig FASTA file into a headers.txt file, save the query IDs from the first column of the BLAST table into a second file (here called hits.txt), and then simply run grep -v -w -F -f hits.txt headers.txt to list all contigs that are not present in the BLAST file. But, as roy.granit has pointed out, it would be helpful to provide an example of the input and the desired output files. If I understand the problem correctly, it's not necessary to write a Perl script, but of course it can also be solved with Perl.
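For the tabular case, the idea can be sketched in a single awk call. This is a minimal sketch under assumptions, not something posted in the thread: it assumes tab-separated BLAST output with the query ID in the first column (as produced by -outfmt 6), where queries without hits have no rows at all, and the function name list_nohit_tab and the file names are placeholders.

```shell
# Hypothetical helper (not from the thread): list contig names from a FASTA
# file that never appear as a query (column 1) in a tabular BLAST file.
# Usage: list_nohit_tab blast.tab contigs.fa > ids
list_nohit_tab() {
    awk -F '\t' '
        # First file (BLAST table): remember every query ID that has a hit row.
        # Lines starting with "#" (comment lines) are skipped.
        FILENAME == ARGV[1] { if ($0 !~ /^#/) hit[$1] = 1; next }
        # Second file (FASTA): print header names never seen as a query.
        /^>/ {
            split(substr($0, 2), words, /[ \t]/)   # first word of the header
            if (!(words[1] in hit)) print words[1]
        }
    ' "$1" "$2"
}
```

The comparison uses only the first word of each FASTA header, which is how BLAST normally reports the query ID.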


The BLAST result file and the contig file are shown above.


Contig2 and Contig3 show no BLAST hits, so I need to extract these two contig sequences into a separate text file.

9.9 years ago
iraun 6.2k

1) Extract the names of the contigs without a hit in BLAST:

grep -F -B5 "***** No hits found" blast.txt | grep "^Query=" | sed 's/^Query= //' > ids

2) Extract the FASTA records of those contigs without a hit:

cat ids | xargs -n 1 samtools faidx contigs.fa > contigs_nohit.fa
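If samtools is not available, both steps can also be sketched with awk alone. This is a minimal sketch, not the answerer's method: the function name extract_nohit and the file names in the usage line are placeholders. It assumes the default pairwise BLAST text report, where each record starts with a "Query= " line and records without hits contain the "No hits found" line, and it assumes that the first word of each FASTA header is the contig name.

```shell
# Hypothetical helper (not from the thread): print the FASTA records of the
# queries that the BLAST text report marks as "No hits found".
# Usage: extract_nohit blast.txt contigs.fa > contigs_nohit.fa
extract_nohit() {
    awk '
        # First file: the BLAST text report.
        FILENAME == ARGV[1] {
            sub(/\r$/, "")                          # tolerate Windows line endings
            if ($1 == "Query=") query = $2          # remember the current query name
            else if (index($0, "No hits found")) nohit[query] = 1
            next
        }
        # Second file: the FASTA file. Decide at each header whether to keep
        # the record; sequence lines inherit the decision of their header.
        /^>/ {
            name = substr($1, 2)                    # header word without ">"
            sub(/\r$/, "", name)
            keep = (name in nohit)
        }
        keep                                        # print the line if kept
    ' "$1" "$2"
}
```

To write the result into a separate folder, the output can be redirected there, for example extract_nohit blast.txt contigs.fa > nohit/contigs_nohit.fa after creating the folder with mkdir -p nohit (the folder name is only an example).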

I am a beginner in Perl. I do not understand what grep is. Can you explain how to run this script?

Can you write a Perl script for this?


This is not Perl, it is bash. You can open a Linux terminal, type these two lines, and you will have the desired output in the contigs_nohit.fa file. I would recommend reading a little about basic bash commands: http://ss64.com/bash/.


The first command works, but when I run the second command it shows:

[_razf_open] fail to open contigs.fa
[fai_build] fail to open the fasta file contigs.fa
[fai_load] fail to open FASTA index

Please give me a solution for this problem.


What's the name of your file containing the contigs? I named it 'contigs.fa' as an example, but you should replace this name with your file name.


I tried that, but it shows:

[fai_load] build FASTA index
not found in FASTA file returning empty sequence
xargs: samtools: terminated by signal 11
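One way to narrow down an error like this is to check that every name in the ids file matches a header in the FASTA file before calling samtools. The sketch below is only a diagnostic under assumptions and does not explain the crash itself: the function name check_ids is a placeholder, names are compared against the first word of each FASTA header, and trailing carriage returns (as left by files edited on Windows) are ignored.

```shell
# Hypothetical diagnostic (not from the thread): report IDs that are empty or
# have no matching header in the FASTA file. Prints nothing if all IDs match.
# Usage: check_ids contigs.fa ids
check_ids() {
    awk '
        # First file (FASTA): collect the first word of every header line.
        FILENAME == ARGV[1] {
            if (/^>/) { name = substr($1, 2); sub(/\r$/, "", name); seen[name] = 1 }
            next
        }
        # Second file (IDs): one name per line.
        {
            id = $0
            sub(/\r$/, "", id)
            if (id == "")
                printf "line %d: empty ID\n", FNR
            else if (!(id in seen))
                printf "line %d: \"%s\" not found in FASTA headers\n", FNR, id
        }
    ' "$1" "$2"
}
```

Any line it reports points at an entry that would need fixing in the ids file (or at a header that differs from the name BLAST printed) before rerunning the extraction.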

Thank you, Airan, for your help.
