Question: How To Remove The Identifier And Quality From A Fastq File
1
4.5 years ago by
tenisha.phipps10 wrote:

I have a fastq file and need to remove the identifier, quality, and white space. So far my script looks something like this:

#!/usr/bin/perl -w

use strict;

my $inputFileName = shift;
my $outputFileName = shift;

open INPUTFILE, "<$inputFileName" or die "poop";
open OUTPUT, ">$outputFileName" or die "poop";

my @bases = ('A', 'G', 'T', 'C');
my $line;

while ($line = <INPUTFILE>) {
  chomp $line;
  if ($line =~ /^\s*$/)
  elsif ($line =~ /^\s*@/)
  elsif ($line =~ /^+/)   
  else {print OUTPUT $line, "\n";
}
}

However, I keep getting an empty output file. I'm very new to Perl, so be gentle.

Thanks!!

perl fastq • 1.7k views
ADD COMMENTlink modified 4.5 years ago by JC6.1k • written 4.5 years ago by tenisha.phipps10

Your problem is that the FASTQ format doesn't contain blank lines; to get only the sequences, it's better to count lines, as in the solutions proposed below. Also, your if and elsif conditions have no instructions after them; each one needs a block, and you should use next.

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by JC6.1k
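
As a rough sketch of the comment above, here is the loop from the question with a block after every condition and the + escaped in the regex. The sub name filter_by_prefix and the two-record sample are made up for illustration. Note that even once it runs, matching on line prefixes alone still lets the quality lines through, which is one reason counting lines is the safer approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the loop from the question, with a block after each condition.
sub filter_by_prefix {
    my ($in) = @_;
    my @kept;
    while (my $line = <$in>) {
        chomp $line;
        if    ($line =~ /^\s*$/) { next; }              # blank line
        elsif ($line =~ /^\s*@/) { next; }              # identifier line
        elsif ($line =~ /^\+/)   { next; }              # separator line; '+' must be escaped
        else                     { push @kept, $line; } # everything else is kept
    }
    return @kept;
}

# Made-up two-record sample, read through an in-memory filehandle.
my $sample = "\@read1\nACGT\n+\nIIII\n\@read2\nGGCC\n+\nHHHH\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my @kept = filter_by_prefix($in);
close $in;

# The quality lines (IIII, HHHH) are kept along with the sequences.
print "$_\n" for @kept;
```

On this sample the loop keeps ACGT, IIII, GGCC and HHHH, so the quality strings would still need to be removed some other way.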
3
4.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum94k wrote:

If your FASTQ file uses the standard layout (4 lines per record), you could just count the lines and keep those with the correct line number modulo 4.

For example, using awk:

 awk '(NR%4==2)'  < file.fastq
ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Pierre Lindenbaum94k
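
The modulo trick only holds if the file really has 4 lines per record. A minimal Perl sketch of how that assumption could be checked while extracting is given here; the sub name sequences_from_4line_fastq, the error messages and the sample data are all invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the sequence lines, or dies if the input
# does not look like strict 4-line FASTQ.
sub sequences_from_4line_fastq {
    my ($in) = @_;
    my @seqs;
    my $n = 0;                       # our own line counter
    while (my $line = <$in>) {
        chomp $line;
        $n++;
        my $pos = $n % 4;            # 1 = identifier, 2 = sequence, 3 = separator, 0 = quality
        if ($pos == 1) {
            die "Line ${n}: expected an identifier starting with \@\n" unless $line =~ /^@/;
        }
        elsif ($pos == 2) {
            push @seqs, $line;       # the line we actually want
        }
        elsif ($pos == 3) {
            die "Line ${n}: expected a separator starting with +\n" unless $line =~ /^\+/;
        }
        # $pos == 0 is the quality line; it may start with any character, so no check
    }
    die "Truncated input: ${n} lines is not a multiple of 4\n" if $n % 4;
    return @seqs;
}

# Made-up sample; the first quality line deliberately starts with '@'.
my $sample = "\@read1\nACGT\n+\n\@III\n\@read2\nGGCC\n+\nHHHH\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
my @seqs = sequences_from_4line_fastq($in);
close $in;

print "$_\n" for @seqs;
```

Because the position in the record decides what a line is, a quality string that happens to begin with @ causes no trouble here.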
1
4.5 years ago by
Amsterdam
Irsan6.2k wrote:
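#!/usr/bin/env perl
use strict;
use warnings;

my $lines = 4; # number of lines per FASTQ record
my $input = shift @ARGV or die "Usage: $0 input.fastq\n";

# three-argument open with a lexical filehandle, and fail loudly if the file is missing
open(my $in, '<', $input) or die "Cannot open $input: $!\n";
while (<$in>) {
    if ($. % $lines == 2) { # the % character is the modulo operator Pierre was talking about
        print;
    }
}
close($in);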

Put that in a file called extract_reads_from_fastq.pl and do the trick with:

perl extract_reads_from_fastq.pl input.fastq
ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Irsan6.2k

And boom goes the dynamite!! Thanks, this worked perfectly!

ADD REPLYlink written 4.4 years ago by tenisha.phipps10
1
4.5 years ago by
Rochester, NY USA
Alex Paciorkowski3.2k wrote:

You could also do something like this:

#!/usr/bin/perl
use strict;
use warnings;

my $file = shift or die "Usage: $0 input.fastq\n";
open my $F, '<', $file or die "Cannot open $file: $!\n";
LINE: while (my $line = <$F>) {
    chomp $line;
    next LINE if $line =~ /^@/;  # gets rid of line 1 of fastq
    next LINE if $line =~ /^\+/; # gets rid of line 3
    next LINE if $line =~ /^!/;  # gets rid of line 4 if it begins with a ! -- check your file's format

    print "$line\n"; # put back the newline removed by chomp
}
close $F;
print STDERR "Done.\n";

Of course there's probably a nice one-liner too for this...

ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Alex Paciorkowski3.2k

I used to parse FASTQ files with regexes like those, but Illumina 1.8+ (Phred+33) breaks that, because quality lines can then begin with characters such as @ or +.

ADD REPLYlink written 4.5 years ago by JC6.1k
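
A small sketch of the problem described in the comment above: with Phred+33 encoding, @ (Q31) and + (Q10) are ordinary quality characters, so a quality line can look like an identifier or a separator. The sample records below are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample with two records.
my $sample = join "\n",
    '@read1', 'ACGT', '+', '@III',   # quality line starts with '@' (Q31 in Phred+33)
    '@read2', 'GGCC', '+', '+HHH',   # quality line starts with '+' (Q10 in Phred+33)
    '';
my @lines = split /\n/, $sample;

my $records              = @lines / 4;              # true record count (4 lines each)
my $looks_like_header    = grep { /^@/ }  @lines;   # lines a regex would call identifiers
my $looks_like_separator = grep { /^\+/ } @lines;   # lines a regex would call separators

print "records: $records\n";
print "lines starting with \@: $looks_like_header\n";
print "lines starting with +: $looks_like_separator\n";
```

Here the regexes find three "identifier" lines and three "separator" lines in a file that holds only two records, whereas counting lines is unaffected by what the quality strings contain.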
1
4.5 years ago by
Mexico
JC6.1k wrote:

Perl one liner:

perl -ne 'print if (++$n % 4 == 2)' < file.fq > output
ADD COMMENTlink written 4.5 years ago by JC6.1k