Hi, I have a script that extracts sequences and, at the same time, removes all gaps (-), spaces, tabs and line breaks within each sequence, for example:
From this:
>ID-Name
CCGCG CTG--GATGCGGAC
ACCGA AGCAA-CCGCCAATA
to this:
>ID-Name
CCGCGCTGGATGCGGACACCGAAGCAACCGCCAATA
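The cleanup itself can be done with one substitution per line. Below is a minimal sketch, assuming the sequence lines are processed one at a time; the helper name clean_seq is hypothetical. It deletes every hyphen and whitespace character and then joins the cleaned lines. It also shows a Perl detail that matters here: tr/-// without the /d flag only counts the hyphens and leaves the string unchanged, while tr/-//d actually deletes them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove alignment gaps (-) and every whitespace
# character (spaces, tabs, CR, LF) from a piece of sequence text.
sub clean_seq {
    my ($s) = @_;
    $s =~ s/[\s-]+//g;
    return $s;
}

# The two sequence lines from the example, as they would be read from a file.
my @lines  = ("CCGCG CTG--GATGCGGAC\n", "ACCGA AGCAA-CCGCCAATA\n");
my $joined = join '', map { clean_seq($_) } @lines;
print "$joined\n";    # CCGCGCTGGATGCGGACACCGAAGCAACCGCCAATA

# tr/-// without /d only counts the hyphens (so it is true for any line
# that contains a gap); tr/-//d is the form that deletes them.
my $gapped = "CTG--GAT";
my $count  = ($gapped =~ tr/-//);       # 2; $gapped is unchanged
(my $ungapped = $gapped) =~ tr/-//d;    # "CTGGAT"
print "$count $ungapped\n";             # 2 CTGGAT
```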
I have this code:
#!/usr/bin/perl
use strict;
use warnings;

# Stop here if the arguments are wrong (the original only printed a warning).
if (@ARGV != 2) {
    print "\n ** Wrong Arguments **\n\n";
    print " - USE: fasta_remove.pl InFile.fasta OutFile.fasta\n";
    exit 1;
}
my ($infile, $outfile) = @ARGV;
# Three-argument open with parentheses, so that "or die" applies to open itself.
open(INFILE, '<', $infile) or die "Can't open $infile: $!\n";
open(OUTFILE, '>', $outfile) or die "Can't write $outfile: $!\n";
my $sequence = '';
my $line;
my $idseq;
while ($line = <INFILE>) {
    chomp $line;
    if ($line =~ /^\s*$/) {
        next;    # skip blank lines
    }
    elsif ($line =~ /^\s*#/) {
        next;    # skip comment lines
    }
    elsif ($line =~ /^>/) {
        $idseq = $line;
        # I know the problem is here, with the "\n" printed before the header, but I don't know how to fix it!
        print OUTFILE "\n $idseq\n";
        next;
    }
    else {
        $sequence = $line;
    }
    $sequence =~ s/[\s-]//g;    # remove gaps (-), spaces, tabs and CRs
    print OUTFILE "$sequence";
}
The problem is that the output always starts with an empty line before the first sequence. I want to avoid that line; if somebody can help me with that, I would be very grateful.
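One way to avoid the blank line, sketched below under a few assumptions, is to buffer each record and print it whole when the next header (or the end of the file) is reached. Because a newline is only ever written after a header or a sequence, nothing is printed before the first header. This also drops the leading space that "\n $idseq\n" puts before each header, and it gives the last sequence a terminating newline. The sub name clean_fasta is hypothetical, lines before the first header are ignored, and the command-line part only runs when arguments are given so the sub can also be called on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sub: read FASTA from $in and write it to $out with each
# sequence on a single line, gaps (-) and whitespace removed. Blank lines
# and lines starting with '#' are skipped; sequence lines that appear
# before the first header are ignored.
sub clean_fasta {
    my ($in, $out) = @_;
    my ($header, $seq);

    # Print the buffered record, if any. Newlines are only written after
    # a header or a sequence, so nothing comes before the first header.
    my $flush = sub {
        print {$out} "$header\n$seq\n" if defined $header;
    };

    while (my $line = <$in>) {
        $line =~ s/\s+\z//;                   # strip newline, CR, trailing blanks
        next if $line =~ /^\s*$/;             # skip blank lines
        next if $line =~ /^\s*#/;             # skip comment lines
        if ($line =~ /^>/) {
            $flush->();                       # finish the previous record
            ($header, $seq) = ($line, '');    # start a new one
        }
        else {
            $line =~ s/[\s-]+//g;             # remove gaps and inner whitespace
            $seq .= $line if defined $header;
        }
    }
    $flush->();                               # write the last record
    return;
}

# Command-line use: fasta_remove.pl InFile.fasta OutFile.fasta
if (@ARGV) {
    die "Usage: fasta_remove.pl InFile.fasta OutFile.fasta\n" unless @ARGV == 2;
    my ($infile, $outfile) = @ARGV;
    open(my $in,  '<', $infile)  or die "Can't open $infile: $!\n";
    open(my $out, '>', $outfile) or die "Can't write $outfile: $!\n";
    clean_fasta($in, $out);
    close($out) or die "Can't close $outfile: $!\n";
}
```

A smaller change to the original loop would also work: keep a flag, print the "\n" before a header only when an earlier header has already been written, and print one final "\n" after the loop.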
Or you could use a simple one-liner instead, if you are really keen on using Perl:
test.txt is your input file and out is your output.