How to create an output file using Perl
8.8 years ago
vinij80 ▴ 30

I have two files. One contains the BLAST results for a set of contigs, and the other is the original FASTA file with all the contigs that were used as BLAST queries. I want to write the sequences that gave no hits in the BLAST results to a text file in a separate folder. I have no idea how to do this. Can you please help me write a Perl script for this problem?

blast sequence • 2.5k views

Can you provide a short sample for each of the files?


BLAST RESULT FILE

CONTIG FILE

>Contig1
GAGCTAAATAATTTGAATCAATGGGAAGATCACCGTGTTGTGAAAAAGCACATACAAATA
AAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATATTAAAACTCACGGCG
AAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTTGCC
GTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAGAAG
AAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAGCCG
GAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACATAAGAA
GGAAGCTTTTGAGTCGGGGCATTGATCCAACGACACACAGGCCTGTTAACGAGCCTGGTA
CAACGCAAAAAGTCACAACAATTTCATTTGCAGGTGGAGATCATAAAACTAAAGATATTG
AAGAAGATCATAATAAGATGATAAATGTCAAAGCTGAATCTGGGTTGAGTCAATTAGAAG
ATGAAATTATTAGTAGCAGTCCATTTCGAGAACAGTGTCCTGATTTAAATCTTGAGCTCA
GAATTAGCCCTCCTTCTCTACAAAATTACCAACATAGCCCCTCAAGGTGTTTTGCATGCA
GTTTGGGTATACAAAATAGTAAAGATTGCAATTGCAGTAAAAATAATATTGCAAGTTATA
ACTTTTTAGGATTAAAGAGTAATGGTGTTTTGGACTATAGAACTTTAGAAACTAAGTGAA
TTTTTATTATAAATCTTTTTTTCCCTCGTGTATTTGGGTTAAAAAAACAAGAAGAGAGAA
TCGAGAAAGATATTCCTATTAGTTTAAGTTCTTTCGAATTTTCTCTTATTTGTAAAATTT
CAAGTATTACTATATACGATATATTATATTAAGTTGAAAAG
>Contig2
GCTCTTCCAACAACAACAACAATGCCTCATCAAAAGCCTCTTTCTCTCATTCTTCTATCT
ACACTCCCACTTCTTTTCATTCTCACACAAGCTCAATCACCAACAGCACCAGCACCAGCA
CCCTCAGGACCAATAGACATCTTTGCAATCCTCAAAAAAGAAGGACAATACAACACATTC
ATCAAGTTCCTAAATGAATCACAAGTTGGTAACCAAATCAACAACCAAGTAAACAACTCC
AACCAAGGCATGACAGTTTTGGCACCATCAGACAATGCATTTAACAACCTCCCAAGTGGT
ACACTCAACCAACTAAATGACCAACAAAAAGTACAACTCATTTTGAACCATGTCATACCA
AAGTTCTACACATTTGATGACTTACAAACAGTAAGCAACCCTGTTAGAACACAAGCAACA
GGGCCTAAAGGTGAGCCTTTTGGACTTAACTTTACTGGAAGTAACAATCAAGTGAATGTC
TCATCTGGTTCTGTTGTTACAAACATTTATAATGCTATTAGAAAAGACCCCCCATTGGCT
GTTTTTCAATTAGACAAAGTTTTAGTACCTTCTCAGTTTACTGATCCATCTAGTGATGAT
GATGCCCCTGCACCTACTAAACCCAAGAATGGTACTAGTAATGATAAAACAACAGCTGAT
GAGCCATCACCAGCAAGTAACACTAAGCCAAATGATGCTAAAAGGATCAGTGGTGGGATT
CTTGGATTGGTTTGTGGTGTTTTCTTGATGGCAACACTATCTTGAAGGGGGCTACAGAGT
TGTTAACTTTATGATCTTTTGCTTATACTAAGCCATTTTGTATTACATTGTTTTCTTCAA
GATTGATTGTTTTTGTTCAAAAAAGAAGGGGGGGGGGGAAAAAAAAACCCCCCTGCGGAA
AAGAGCGGGGAAAGCACCAAAAAGCCACCGACCAAAAGCACCAACTCACAAAAGGTGCGC
AGACGCGGAAAGGGGAAAAGGAAAAAATGTGAAAGCTTGTTATAGTTTG
>Contig3
AAACTGTAATTAGACTTCTCTGCTAAGTTTCTGCTGTATTTGGATTCTCCGGCGAACATT
AATATCTAACCATGACCGGCGGTGGAGGCGATGCCGCATCGCCGCCTCTATCCTCACAGT
CAACTCCATCCAACGGTGGGGAATTCCTTCTTCAATTGCTTCAGAATCATCCGCATCAAC
TTCACTCTCAGCCTCAACCGCCACTGCGGCCGGAGTTGCAGAATCTGCCGCATGATCCAG
CAGTTGCAGCAGTAGGTCCTAGTATGCCCTACCCGCCATTGTTCCATACTCCTACAAACC
CTTCTGTTTTGCCCTATTCTCACTCTCCTCCTCTGTTTGTACCTCATAACTTCTTCATTC
GAGGGTTTCTCCAAAACCCTAATTCTGGCCATACCACTAACCCCAATTACTCATCTCCGC
CTGCCCCAAGTGGGTTCAGTCAATATCACCATGCGAGTCCACTTGGATTTGGATCAGTCG
GAGAAAACATGGGCAATTTGGGGATTTTCGGTGCCAATGCTAAGGCGAG
>Contig4
CATGTAATAGCATAGCATCCCCAATTTCACCCTCTCATGGCCATGTCCACGCTCCTCTCC
CTGTCCGTGTCTATCCACCCACCAAAACCTTTGCAAAAACCCAATTCAATGTGTACCCAA
CCTAACTCTATTTCGAGAAGACAAGTGTTTTTCACTGGTTCTAATTTATTGCTCTCTCAA
TTAATTCCAAAATCCGACGCCCAAACCAATTCCAATAGTTTTCTTTCAGGTATTGCCAAT
ACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCACCGGAATTT
CAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGAGATAAGGCT
AAGCCCAAAAAATTTGCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGTTTTAAGTGTC
ATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGATATTACTGAT
TTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAACACTATATTCT
GTCCGCACAATAAAAATTAAAGAAGATGAGGGTTTCAGGACATACTATTTTTATGAATTT
GTGAGAAATGAGCAACACGTTGCATTAGTGGCTGGTGTTAACAGTGGAAAGGCCGTCATT
GCTGGTGCCACGGCCCCCGAAAGCAAATGGGCCGAGGATGGTTTGAAGCTCCGATCTGCT
GCAGTATCAATGACAATTCTATAAGCAGAATGTGAGTATATATATAGGTTCTATTTCAAT
GATGATGAATTTATATACAAATATTGAGGATCAAAGTTTTCTTATTATCATCTAATCTCA
GCCAAGGATTAACAATCTCCATCATCCATTCAATAGCAATGTTTCTGCTGTTTTGC

If the BLAST output is in tabular format, you can extract the headers from your contig FASTA file into headers.txt, collect the query IDs from the first column of blast.tab into hits.txt, and then run grep -v -w -F -f hits.txt headers.txt to list all contigs that are not present in the BLAST file. But, as roy.granit has pointed out, it would help to see an example of the input and the desired output files. If I understand the problem correctly, it's not necessary to write a Perl script, although of course it can also be solved with Perl.
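The tabular approach can be sketched as follows. This is only a minimal sketch, assuming the BLAST table is tab-separated with the query ID in the first column. The demo inputs and the file name hits.txt are hypothetical, and headers.txt here holds the contig names with the leading '>' removed so that whole-line matching can be used.

```shell
#!/bin/sh
# Hypothetical demo inputs: four contigs, of which only Contig1 and
# Contig4 appear in the tabular BLAST file.
printf '>Contig1\nGAGC\n>Contig2\nGCTC\n>Contig3\nAAAC\n>Contig4\nCATG\n' > contigs.fa
printf 'Contig1\tsubjA\t98.5\nContig4\tsubjB\t91.0\n' > blast.tab

# 1) Query IDs with at least one hit (column 1 of the table, deduplicated).
cut -f1 blast.tab | sort -u > hits.txt

# 2) Contig names from the FASTA headers: drop '>' and any description.
grep '^>' contigs.fa | sed 's/^>//; s/[[:space:]].*$//' > headers.txt

# 3) Names with no hit: lines of headers.txt that equal no line of hits.txt.
#    -x forces whole-line matches, so Contig1 does not also hide Contig10.
grep -v -x -F -f hits.txt headers.txt > nohit_ids.txt

cat nohit_ids.txt
```

On the demo data, nohit_ids.txt lists Contig2 and Contig3. The resulting names can then be used to pull the matching records out of the FASTA file.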


The BLAST result file and the contig file are shown above.


Contig2 and Contig3 show no BLAST hits, so I need to extract these two contig sequences into a separate text file.

8.8 years ago
iraun 6.2k

1) Extract the names of the contigs without a hit in the BLAST output:

grep -B5 "***** No hits found" blast.txt | grep Query | sed 's/Query= //g' > ids

2) Extract the FASTA records of those contigs without a hit:

cat ids | xargs -n 1 samtools faidx contigs.fa > contigs_nohit.fa
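For readers without samtools, the same two steps can be sketched with awk alone. This swaps in a different technique from the commands above: instead of relying on the marker sitting exactly five lines below the "Query=" line, it remembers the last query name seen. It assumes a plain-text BLAST report in which each query starts with "Query= " and hitless queries carry the "No hits found" marker. The demo inputs and the output folder name nohit are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo inputs: a cut-down BLAST text report and a small FASTA file.
cat > blast.txt <<'EOF'
Query= Contig1

Length=941

Sequences producing significant alignments:
subjA  some protein  250  1e-70

Query= Contig2

Length=889


***** No hits found *****

Query= Contig3

Length=529


***** No hits found *****
EOF

cat > contigs.fa <<'EOF'
>Contig1
GAGCTAAA
TAATTTGA
>Contig2
GCTCTTCC
AACAACAA
>Contig3
AAACTGTA
EOF

# 1) Remember the most recent "Query=" name and print it whenever the
#    "No hits found" marker follows, however many lines lie in between.
awk '/^Query= / { id = $2 } /No hits found/ { print id }' blast.txt > ids

# 2) Load the wanted names from ids, then print every FASTA record whose
#    header name (first word after ">") is in that list.
mkdir -p nohit
awk 'FILENAME == ARGV[1] { want[$1] = 1; next }
     /^>/ { keep = (substr($1, 2) in want) }
     keep' ids contigs.fa > nohit/contigs_nohit.fa

cat nohit/contigs_nohit.fa
```

On the demo data this writes the Contig2 and Contig3 records, including wrapped sequence lines, to nohit/contigs_nohit.fa, which also covers the request for output in a separate folder.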

I am a beginner in Perl. I do not understand what grep is. Can you explain how to run this script?

Can you write a Perl script for this?


This is not Perl, it is bash. Open a Linux terminal, type these two lines, and you will get the desired output in the contigs_nohit.fa file. I would recommend reading a little about basic bash commands: http://ss64.com/bash/.


The first command works, but when I run the second command it shows:

[_razf_open] fail to open contigs.fa
[fai_build] fail to open the fasta file contigs.fa
[fai_load] fail to open FASTA index

Please give me a solution for this problem.


What's the name of your file containing the contigs? I named it 'contigs.fa' as an example, but you should replace this name with your file name.


I tried that, but it shows:

[fai_load] build FASTA index
not found in FASTA file returning empty sequence
xargs: samtools: terminated by signal 11
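The thread does not record what resolved this error. One possible cause, which is only an assumption, is that the ids file contains blank lines, Windows line endings, or names that do not match the FASTA headers exactly. A minimal sketch for checking the list before it is passed to samtools faidx might look like this. The demo inputs and the file names ids.clean, ids.ok, ids.missing and fasta_names.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical demo inputs: an ID list with a Windows line ending, a blank
# line, trailing spaces and one name that is absent from the FASTA file.
printf 'Contig2\r\n\nContig3  \nContig9\n' > ids
printf '>Contig1\nGAGC\n>Contig2\nGCTC\n>Contig3\nAAAC\n' > contigs.fa

# Strip carriage returns and surrounding whitespace, then drop empty lines.
tr -d '\r' < ids | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' | grep -v '^$' > ids.clean

# Names as they appear in the FASTA headers (first word after '>').
grep '^>' contigs.fa | sed 's/^>//; s/[[:space:]].*$//' > fasta_names.txt

# IDs that have no matching header in contigs.fa.
grep -v -x -F -f fasta_names.txt ids.clean > ids.missing

# IDs that do have a matching header and can be passed on.
grep -x -F -f fasta_names.txt ids.clean > ids.ok

echo "IDs without a matching FASTA header:"
cat ids.missing
```

If ids.missing is not empty, the names in it are worth comparing by eye with the headers in the contig file before rerunning the extraction with ids.ok in place of ids.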

Thank you, Airan, for your help.
