Question: How to create an output file using Perl
vinij80 (United States) wrote 5.3 years ago:

I have two files. One contains the BLAST results for a set of contigs, and the other is the FASTA file with the contigs that were used for the BLAST search. From the original FASTA file with all contigs, I want to extract the sequences that gave no hits in the BLAST result and write them to a text file in a separate folder. I have no idea how to do this. Can you please help me write a Perl script for this problem?
 

Tags: blast, sequence

Can you provide a short sample for each of the files?

written 5.3 years ago by roy.granit

BLAST RESULT FILE

CONTIG FILE

>Contig1
GAGCTAAATAATTTGAATCAATGGGAAGATCACCGTGTTGTGAAAAAGCACATACAAATA
AAGGAGCTTGGACTAAAGAAGAAGATGAACGACTTATTTCTTATATTAAAACTCACGGCG
AAGGTTGCTGGAGATCCCTTCCTAAAGCTGCCGGACTTCTCCGATGCGGTAAAAGTTGCC
GTCTCCGATGGATTAATTACTTGAGACCGGACCTTAAACGCGGTAATTTTACTGAAGAAG
AAGATGAACTCATTATCAAACTCCATAGCCTCCTTGGTAACAAATGGTCACTTATAGCCG
GAAGATTACCAGGAAGAACAGATAATGAGATAAAAAATTACTGGAATACGCACATAAGAA
GGAAGCTTTTGAGTCGGGGCATTGATCCAACGACACACAGGCCTGTTAACGAGCCTGGTA
CAACGCAAAAAGTCACAACAATTTCATTTGCAGGTGGAGATCATAAAACTAAAGATATTG
AAGAAGATCATAATAAGATGATAAATGTCAAAGCTGAATCTGGGTTGAGTCAATTAGAAG
ATGAAATTATTAGTAGCAGTCCATTTCGAGAACAGTGTCCTGATTTAAATCTTGAGCTCA
GAATTAGCCCTCCTTCTCTACAAAATTACCAACATAGCCCCTCAAGGTGTTTTGCATGCA
GTTTGGGTATACAAAATAGTAAAGATTGCAATTGCAGTAAAAATAATATTGCAAGTTATA
ACTTTTTAGGATTAAAGAGTAATGGTGTTTTGGACTATAGAACTTTAGAAACTAAGTGAA
TTTTTATTATAAATCTTTTTTTCCCTCGTGTATTTGGGTTAAAAAAACAAGAAGAGAGAA
TCGAGAAAGATATTCCTATTAGTTTAAGTTCTTTCGAATTTTCTCTTATTTGTAAAATTT
CAAGTATTACTATATACGATATATTATATTAAGTTGAAAAG
>Contig2
GCTCTTCCAACAACAACAACAATGCCTCATCAAAAGCCTCTTTCTCTCATTCTTCTATCT
ACACTCCCACTTCTTTTCATTCTCACACAAGCTCAATCACCAACAGCACCAGCACCAGCA
CCCTCAGGACCAATAGACATCTTTGCAATCCTCAAAAAAGAAGGACAATACAACACATTC
ATCAAGTTCCTAAATGAATCACAAGTTGGTAACCAAATCAACAACCAAGTAAACAACTCC
AACCAAGGCATGACAGTTTTGGCACCATCAGACAATGCATTTAACAACCTCCCAAGTGGT
ACACTCAACCAACTAAATGACCAACAAAAAGTACAACTCATTTTGAACCATGTCATACCA
AAGTTCTACACATTTGATGACTTACAAACAGTAAGCAACCCTGTTAGAACACAAGCAACA
GGGCCTAAAGGTGAGCCTTTTGGACTTAACTTTACTGGAAGTAACAATCAAGTGAATGTC
TCATCTGGTTCTGTTGTTACAAACATTTATAATGCTATTAGAAAAGACCCCCCATTGGCT
GTTTTTCAATTAGACAAAGTTTTAGTACCTTCTCAGTTTACTGATCCATCTAGTGATGAT
GATGCCCCTGCACCTACTAAACCCAAGAATGGTACTAGTAATGATAAAACAACAGCTGAT
GAGCCATCACCAGCAAGTAACACTAAGCCAAATGATGCTAAAAGGATCAGTGGTGGGATT
CTTGGATTGGTTTGTGGTGTTTTCTTGATGGCAACACTATCTTGAAGGGGGCTACAGAGT
TGTTAACTTTATGATCTTTTGCTTATACTAAGCCATTTTGTATTACATTGTTTTCTTCAA
GATTGATTGTTTTTGTTCAAAAAAGAAGGGGGGGGGGGAAAAAAAAACCCCCCTGCGGAA
AAGAGCGGGGAAAGCACCAAAAAGCCACCGACCAAAAGCACCAACTCACAAAAGGTGCGC
AGACGCGGAAAGGGGAAAAGGAAAAAATGTGAAAGCTTGTTATAGTTTG
>Contig3
AAACTGTAATTAGACTTCTCTGCTAAGTTTCTGCTGTATTTGGATTCTCCGGCGAACATT
AATATCTAACCATGACCGGCGGTGGAGGCGATGCCGCATCGCCGCCTCTATCCTCACAGT
CAACTCCATCCAACGGTGGGGAATTCCTTCTTCAATTGCTTCAGAATCATCCGCATCAAC
TTCACTCTCAGCCTCAACCGCCACTGCGGCCGGAGTTGCAGAATCTGCCGCATGATCCAG
CAGTTGCAGCAGTAGGTCCTAGTATGCCCTACCCGCCATTGTTCCATACTCCTACAAACC
CTTCTGTTTTGCCCTATTCTCACTCTCCTCCTCTGTTTGTACCTCATAACTTCTTCATTC
GAGGGTTTCTCCAAAACCCTAATTCTGGCCATACCACTAACCCCAATTACTCATCTCCGC
CTGCCCCAAGTGGGTTCAGTCAATATCACCATGCGAGTCCACTTGGATTTGGATCAGTCG
GAGAAAACATGGGCAATTTGGGGATTTTCGGTGCCAATGCTAAGGCGAG
>Contig4
CATGTAATAGCATAGCATCCCCAATTTCACCCTCTCATGGCCATGTCCACGCTCCTCTCC
CTGTCCGTGTCTATCCACCCACCAAAACCTTTGCAAAAACCCAATTCAATGTGTACCCAA
CCTAACTCTATTTCGAGAAGACAAGTGTTTTTCACTGGTTCTAATTTATTGCTCTCTCAA
TTAATTCCAAAATCCGACGCCCAAACCAATTCCAATAGTTTTCTTTCAGGTATTGCCAAT
ACTAAGTCTTGGTTCCAATTCTATGGCGACGGCTTTTCTATTCGTGTTCCACCGGAATTT
CAGGACCTCACTGAGCCGGAGGATTATAATGCTGGCCTATCACTATATGGAGATAAGGCT
AAGCCCAAAAAATTTGCAGCACGTTTTGCTTCTTCTGATGGATCCGAAGTTTTAAGTGTC
ATAATTCGTCCATCCAATCAGCTGAAGATCACTTTCTTAGAGGCTAAAGATATTACTGAT
TTAGGTTCACTTAAGGAGGCAGCAAAAATATTTGTTCCAGCTGGCTCAACACTATATTCT
GTCCGCACAATAAAAATTAAAGAAGATGAGGGTTTCAGGACATACTATTTTTATGAATTT
GTGAGAAATGAGCAACACGTTGCATTAGTGGCTGGTGTTAACAGTGGAAAGGCCGTCATT
GCTGGTGCCACGGCCCCCGAAAGCAAATGGGCCGAGGATGGTTTGAAGCTCCGATCTGCT
GCAGTATCAATGACAATTCTATAAGCAGAATGTGAGTATATATATAGGTTCTATTTCAAT
GATGATGAATTTATATACAAATATTGAGGATCAAAGTTTTCTTATTATCATCTAATCTCA
GCCAAGGATTAACAATCTCCATCATCCATTCAATAGCAATGTTTCTGCTGTTTTGC
modified 11 months ago by RamRS • written 5.3 years ago by vinij80

If the BLAST output is in tabular format, you can extract the headers from your contig FASTA file into a headers.txt file, collect the query IDs from the first column of blast.tab into a hits.txt file, and simply use grep -v -x -F -f hits.txt headers.txt to list all contigs that are not present in the BLAST file. But, as roy.granit has pointed out, it would be helpful to provide an example of the input and desired output files. If I understand the problem correctly, it's not necessary to write a Perl script, but of course it can also be solved with Perl.
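The tabular approach could be sketched roughly as follows. This is only a minimal sketch: it assumes the BLAST report is tab-separated with the query ID in the first column (as in -outfmt 6), that the FASTA ID is the text between '>' and the first whitespace, and that GNU grep is available. The function name nohit_ids_tabular is made up for illustration.

```shell
# Hypothetical helper (name is illustrative, not from the thread):
# print the IDs of FASTA records that never appear as a query
# (column 1) in a tab-separated BLAST report.
nohit_ids_tabular() {
    fasta=$1
    blast_tab=$2
    hits=$(mktemp)

    # Unique query IDs that have at least one hit.
    cut -f1 "$blast_tab" | sort -u > "$hits"

    # Reduce each FASTA header to its ID, then drop every ID that is
    # listed in the hits file. -x matches whole lines only, so
    # "Contig1" does not accidentally match "Contig10".
    grep '^>' "$fasta" | sed 's/^>//; s/[[:space:]].*$//' |
        grep -v -x -F -f "$hits"

    rm -f "$hits"
}
```

For example, nohit_ids_tabular contigs.fa blast.tab > ids would write the no-hit IDs to a file called ids; the file names here are placeholders for your own.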

written 5.3 years ago by iraun

The BLAST result file and the contig file are shown above.

written 5.3 years ago by vinij80

Contig2 and Contig3 show no BLAST hits, so I need to extract these two contig sequences into a separate text file.

written 5.3 years ago by vinij80
iraun (Norway) wrote 5.3 years ago:

1) Extract the names of the contigs without a hit in the BLAST output:

grep -F -B5 '***** No hits found' blast.txt | grep 'Query=' | sed 's/Query= //' > ids

2) Extract the FASTA records of those contigs without a hit:

cat ids | xargs -n 1 samtools faidx contigs.fa > contigs_nohit.fa
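If samtools is not available, step 2 could also be done with plain awk. The following is a rough sketch under some assumptions: the ids file holds one contig ID per line, and the ID in the FASTA file is the text between '>' and the first whitespace. The function name extract_fasta_by_id is hypothetical.

```shell
# Hypothetical helper (name is illustrative, not from the thread):
# print every FASTA record whose ID is listed in the ids file.
extract_fasta_by_id() {
    ids=$1
    fasta=$2
    awk '
        # First file: remember each non-empty ID (first field of the line).
        NR == FNR { if ($1 != "") want[$1] = 1; next }

        # Header line: take the ID after ">" and decide whether to keep
        # this record and the sequence lines that follow it.
        /^>/ { id = substr($1, 2); keep = (id in want) }

        # Print header and sequence lines of wanted records.
        keep
    ' "$ids" "$fasta"
}
```

Since the question asks for the output in a separate folder, one way to use it would be mkdir -p nohit followed by extract_fasta_by_id ids contigs.fa > nohit/contigs_nohit.fa, where the folder name nohit is just an example.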
modified 5.3 years ago • written 5.3 years ago by iraun

I am a beginner in Perl. I do not understand what grep is. Can you explain how to run this script?

Can you write a Perl script for this?

modified 11 months ago by RamRS • written 5.3 years ago by vinij80

This is not Perl, it is bash. You can open a Linux terminal, type these two lines, and you will have the desired output in the contigs_nohit.fa file. I would recommend reading a little about basic bash commands: http://ss64.com/bash/.

written 5.3 years ago by iraun

The first command works, but when I run the second command it shows:

[_razf_open] fail to open contigs.fa
[fai_build] fail to open the fasta file contigs.fa
[fai_load] fail to open FASTA index

Please give me a solution for this problem.

modified 11 months ago by RamRS • written 5.3 years ago by vinij80

What's the name of your file containing the contigs? I named it 'contigs.fa' as an example, but you should replace this name with your file name.

written 5.3 years ago by iraun

I tried that, but it shows:

[fai_load] build FASTA index
not found in FASTA file returning empty sequence
xargs: samtools: terminated by signal 11
modified 11 months ago by RamRS • written 5.3 years ago by vinij80

Thank you, Airan, for your help.

written 5.3 years ago by vinij80