Question: fastq to fasta perl
Asked 3.8 years ago by cabraham03 (Mexico):

Hi, I have the following code to convert FASTQ to FASTA, but there is a problem I need to fix.

I want to convert this:

@YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
+
929994988483448887444665///,/222*//0//,//,/8,//754444,4:492/;<;11,1<<?29449144;:44444+4;-4249;;-424//7775/-,0,,/65684444489;A@B>884444244448994444442449999=967:?A<@BBBBCA?@A11,1-+-,2,,*/8885;;C>B>=8884424333
@YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
+
;;111,/18137>>AA?:?9959959>?AA@99959A?F?@AACCCC3AA=A>@=@B;;8<?@AAA@@<@>AAAA>>9;599B;<<BCA<<;=@<9<=@>>A;;<@@BFCCEEC;<;>A@>@@@BC>CBCD=?@8;:/8=?<;??2<<;??:?BB>;=@=:000>;<:A00,0;;;AC>CBB>@?;;6;>;B5::>??????A@888<;;9=@@@BB>A
@YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
+
//*/8851:>DDCBBBBCBBBCDDBCC@CACCDC@B?CC99;BB>@A993319849999-9>;><@BBCFFCCDA@B8::BBBCBBB??9;;BBC@CC?>=@B?AA@CBAB@?;4429@BB:@@8<<8<:>DCBBBBBBCCBCBH?:<<B==BF133@@>>B<@>5;89DBB:=D=CACCCCBA@ACB@@<9999/9@<A@BBBB9BDCBBB2CBCCBBB
@YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
+
88//13<B@D?B?BKK11,/,1,/777:AA@C?AB?@?:@>@@@@@BB>C<99919A699C>?<?<9969?@9999/95991<AAA?CBB@=@8;<CCC@C@DC;<<AAB@;;;;<;<<BCC>BAA?AB<998959@B:AABB=<<<:99AB1111111,1:<<@9959999?99909@@6969>>?>>>99599>99919@BCDCCC@EAB<99919@<??<9969999969@AD
@YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
+
:99499C@@@A>A64/<8,,,*,,656>?BB?ABCC@A=??9::?:==?@:::CC@CCCACCB499;998?@@@CDBBCC<==ADBC@CBB;;9>>>>:>5<<CCDD@CICCD<<<ABBC8::B>A=@>><9948?>>8888.8B699:::8:;@B>@>@@@;;;@A=??@B>AAA4;;80::.00>:<9>.70;:0./=:==B?;:7+,,**0*,,./5765:::<633@7:76,/
@YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA
+
999A999@899;?:99:@9944444992444-497733184737177777777=8=489<>88133788.333178,674:001//6;,.333,377>???BB::;<<

 

into this:

>YOX24:00004:00021
AATCAGATAGTCAGCGAAGCGATTCGCCGCGCCATGACCATTGAATGGGCTTTTATCTTGATCGAAGCACCTGTTTGTAACAAAAAGGGTTGGCCCTTCGAGTTTGATCTCTTTCGATGAGAACAAGTCGACGTTGCTGGATCATCTGGCGTTAGCAATGTGGTGATGATCGCGTTCTTCAAGTTGTTTTGTGGTTCGGTCGGAGAA
>YOX24:00004:00026
CCTAGGCATTACTCACCCGTCCGCCGCTCGACGCCGTTATCGTCCCCCGAAGGTTCAGTTAACTCGTTTCCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCAATCTGAGCCATGATCAAACTCTTCAATTTAAGATTTTGTTCGGCTCAATGAATACTGAACATTACATAAAGTAATGTTTGAATTGACTGTGCTGAGTCCGAAGACTCAAT
>YOX24:00004:00027
TTTGAAATCGATGACATCGAGTACTCGGTTCGATTGGTTTACGAAGCACGTTACCAAAAAGAAGGCGACATGAGCCTTGTGCTGCACAGCGCTGAAGACGGCAACTTCTACACACTTCGTTTACCGTTGGTCATGTAGACACTGGGCGCTGCATTATGATAGGTGGCCTACAAGGCCCTCGAAGCAGTGAAGAAAACAACGCAAAATCAAAAACTGACTC
>YOX24:00004:00031
AGCAAACTTCAAGAAAATTCCTTCTTCCTCCAAGATGGGAACTCGACTTGGCTTTGTTGCGTAATTCGGTCAGAAAACCAAACTGCCTGCAATTGAGTAATTCTTATACAACACACTGCGTTTCAGGCATACCAAGCCCTGTAAGTTTGTTCAGCGCTTTAATCATGGCGTAAGTTTCACCAACCTGAGCATTGTAGTTTCTCAGACTTAATCGCCCACCTAACAACTGCTTCACT
>YOX24:00004:00032
TCAGCTCAGTCCTTACGTCGCCGTCCTAGCGGTGTCCTTATCATCCTGATAGCTAACATTCCCCGTTAGCGCACATCACTTGTTCCTTGAGCGTGTCCCTTGCATCTTCCTGATGCTGTCCATAACCTCAATCCTATGAGGGGTCCATTGTCTATCCTTAGTCATCAACATCCTATTGATGATTGCTTCCTTACAATGTCCTGCGCTCCATGCTTGATCAATCATCCTGATTAACCA
>YOX24:00004:00033
AGCACAGCCACAAGCTTCACATACTTGGCTTTCTTCTAATTCGACACTCATAACGAATCCTCTTGTAAAACTAAGGGTATTCTAGCAGGTATTTGATCTGTATCTACA

However, some quality lines still appear in the output, for example: >CB>CBC@@@@@@9999/9@A6<9@248988-?

 

#!/usr/bin/perl -w
use strict;
use Getopt::Long;
use Term::ANSIColor; 

my ($imput, $output, $line, $usage);

GetOptions (
            'i=s' => \$imput,
            'o=s' => \$output,
            );

$usage = (qq(
          Error: 
             Wrong Arguments
             
             Usage:
             fastxQA -i infile.fastq -o outfile.fasta

));

if (!$imput or !$output) {
    print color("red"), "$usage", color("reset"),"\n\n";
    exit;
}

open FASTQIN, '<', "$imput" or die (color("red"), "\nCan't open $imput file", color("reset"),"\n\n"); 
open FASTAOUT, '>', "$output" or die (color("red"), "Can't generate $output file", color("reset"),"\n\n");

 

while ($line= <FASTQIN>) {
    chomp $line;
    if ($line=~ s/^@/>/g) {
        my $id= $line;
        print FASTAOUT "$id\n";
        
    }
    elsif ($line=~ s/^[+]//g){
       next;
    }
    elsif ($line=~ s/[^a+|^c+|^g+|^t+|^n+]//gi){    # Here is the problem. I also tried: [\d*|\*|@*|?*|;*|<*|>*|,*]
        next;                                                          # How can I modify this to fix it?
    }                                                                      
    else {
        
         my $fastaseq = $line;
         chomp $fastaseq;
         print FASTAOUT "$fastaseq\n";
    }
    
    
}
close FASTQIN;
close FASTAOUT;
exit; 

 

I think the problem occurs when a quality line starts with the @ symbol, like: @A<@BB8<>AA?A>;

I would be very grateful if you could help me (sorry, I have only just started learning Perl on my own)!

 

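The guess above is right. '@' is ASCII 64, which in the usual Phred+33 encoding means a quality of 31, so a quality line can legitimately begin with it, and the first `if` branch then turns that line into a '>' header. The `elsif` test does not help either: `[^a+|^c+|^g+|^t+|^n+]` is a single negated character class, not a list of alternatives. Line content alone cannot reliably tell a header from a quality string, but the position inside the record can. Below is a minimal sketch of that fix, assuming every record is exactly four lines (no wrapped sequences); the sub name `fastq_to_fasta` and the two-argument command line are illustrative, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: converts 4-line FASTQ records read from $in into
# FASTA written to $out, choosing lines by position instead of by content.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    my $n = 0;                          # lines read so far from $in
    while (my $line = <$in>) {
        $n++;
        my $pos = $n % 4;               # 1 = header, 2 = sequence, 3 = '+', 0 = quality
        if ($pos == 1) {
            $line =~ s/^@/>/;           # only the header gets '@' -> '>'
            print {$out} $line;
        }
        elsif ($pos == 2) {
            print {$out} $line;         # sequence is copied unchanged
        }
        # '+' lines and quality lines are skipped, whatever characters they hold
    }
}

# Command-line use: perl fq2fa.pl infile.fastq outfile.fasta
if (!caller && @ARGV == 2) {
    open my $in,  '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    open my $out, '>', $ARGV[1] or die "Can't create $ARGV[1]: $!\n";
    fastq_to_fasta($in, $out);
    close $out or die "Can't write $ARGV[1]: $!\n";
}
```

Because the decision depends only on the line count, a quality string starting with '@' or '+' can no longer leak into the output.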

You could try seqtk. FASTQ-to-FASTA conversion is the first example:

seqtk seq -a in.fq > out.fa

 

(comment by matted, 3.8 years ago)

Following the seqtk advice, which I definitely recommend: if you want to work a bit more with the FASTQ lines and still want to do it in Perl, you can always use readfq, the multi-language FASTQ parser from the author of seqtk, which has a Perl version. It only requires embedding a small subroutine of about 40 lines in your code, and you'll be able to handle FASTQ files quickly and easily.

(comment by Jorge Amigo, 3.8 years ago)
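A readfq-style parser avoids the '@' ambiguity by relying on the fact that the quality string must be as long as the sequence: after the '+' line it keeps reading quality lines until the lengths match, instead of looking for the next '@'. That also lets it handle FASTQ files whose sequence and quality are wrapped over several lines. The following is a sketch of that idea written for illustration, not the readfq code itself; `next_fastq` is a hypothetical name, and it returns the whole header line (name plus any description) as the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical readfq-style reader (written for illustration, not readfq itself).
# Returns (name, seq, qual) for the next FASTQ record in $fh, or an empty list
# at end of input. Sequence and quality may each span several lines.
sub next_fastq {
    my ($fh) = @_;
    my $head;
    while (defined($head = <$fh>)) {    # skip anything before the next header
        last if $head =~ /^@/;
    }
    return unless defined $head;        # end of input
    chomp $head;
    my $name = substr($head, 1);        # drop the leading '@'

    my ($seq, $line) = ('', undef);
    while (defined($line = <$fh>)) {    # sequence lines up to the '+' separator
        chomp $line;
        last if $line =~ /^\+/;
        $seq .= $line;
    }
    die "Record '$name' has no '+' line\n" unless defined $line;

    # Read quality lines until they are as long as the sequence, so a quality
    # line that begins with '@' is never mistaken for the next header.
    my $qual = '';
    while (length($qual) < length($seq) && defined($line = <$fh>)) {
        chomp $line;
        $qual .= $line;
    }
    die "Record '$name': quality length differs from sequence length\n"
        if length($qual) != length($seq);
    return ($name, $seq, $qual);
}

# Command-line use: perl readfq_sketch.pl infile.fastq > outfile.fasta
if (!caller && @ARGV == 1) {
    open my $in, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    while (my ($name, $seq) = next_fastq($in)) {
        print ">$name\n$seq\n";
    }
}
```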

Thanks so much, all of you, it was really helpful!

It works well with the modifications suggested by thackl.

Thanks so much to all!

(comment by cabraham03, 3.8 years ago)

Answer by thackl (MIT), 3.8 years ago:

An easy way to read FASTQ is to read 4 lines at a time. It is faster and you don't have to worry about regexes. (This assumes every record is exactly four lines, i.e. sequence and quality are not wrapped.)

while(
defined(my $shead = <FASTQIN>) &&
defined(my $sseq = <FASTQIN>) &&
defined(my $qhead = <FASTQIN>) &&
defined(my $qseq = <FASTQIN>)
){
  substr($shead, 0, 1, '>');
  print $shead, $sseq;
}

And if you want your FASTA sequences wrapped at a fixed line width, use this in place of the print statement inside the loop:

my $line_width = 80;
print $shead;
chomp($sseq);
print $_,"\n" for unpack "(A$line_width)*", $sseq;
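Put together, the two snippets above give the following self-contained sketch. The sub name `fastq_to_wrapped_fasta`, the default width of 80 and the command-line wrapper are illustrative; like the snippets, it assumes every record is exactly four lines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads 4-line FASTQ records from $in and writes FASTA to
# $out, wrapping each sequence at $width characters (default 80).
sub fastq_to_wrapped_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 80;
    while (
        defined(my $shead = <$in>) &&
        defined(my $sseq  = <$in>) &&
        defined(my $qhead = <$in>) &&   # '+' line, read only to skip it
        defined(my $qseq  = <$in>)      # quality line, read only to skip it
    ) {
        substr($shead, 0, 1, '>');      # replace the leading '@' with '>'
        print {$out} $shead;
        chomp $sseq;
        # "(A$width)*" cuts the sequence into chunks of at most $width characters
        print {$out} "$_\n" for unpack "(A$width)*", $sseq;
    }
}

# Command-line use: perl fq2fa_wrap.pl infile.fastq outfile.fasta [width]
if (!caller && (@ARGV == 2 || @ARGV == 3)) {
    open my $in,  '<', $ARGV[0] or die "Can't open $ARGV[0]: $!\n";
    open my $out, '>', $ARGV[1] or die "Can't create $ARGV[1]: $!\n";
    fastq_to_wrapped_fasta($in, $out, $ARGV[2]);
    close $out or die "Can't write $ARGV[1]: $!\n";
}
```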
Answer by David Langenberger (Germany), 3.8 years ago:

Or just do it on the command line:

zcat file.fq.gz | paste - - - - | perl -F'\t' -ane '$F[0]=~s/^@/>/;print "$F[0]\n$F[1]\n";' | gzip -c > file.fa.gz

Answer by venu (Germany), 3.8 years ago:

And if you need this only in Perl, check this


I've always found readfq to be the fastest and simplest Perl implementation for handling FASTQ files.

(comment by Jorge Amigo, 3.8 years ago)