Question: Removing small contigs from fasta files
Nyksubuz wrote (12 months ago):
## removesmalls.pl
#!/usr/bin/perl
use strict;
use warnings;

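# Minimum sequence length to keep (first command-line argument).
my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
    # Read one FASTA record per iteration by using ">" as the record separator.
    local $/ = ">";
    while (<>) {
        chomp;                  # remove the trailing ">" separator
        next unless /\w/;       # skip the empty chunk before the first header
        my @chunk = split /\n/;
        shift @chunk;           # discard the header line; the rest is sequence
        my $seqlen = length join "", @chunk;
        print ">$_" if $seqlen >= $minlen;
    }
}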

Executing the script as follows:

perl removesmalls.pl 1000 contigs.fasta > contigs-1000.fasta

The script above works for me, but there is a problem: I have 109 FASTA files with different file names. I can run the script on each file individually, but I want to run it on all files at once and get a separate result file for each.

The file names are like SRR8224532.fasta, SRR8224533.fasta, SRR8224534.fasta, and so on. After removing the short contigs (for me, those shorter than 1000), I want result files named something like SRR8224532-out.fasta, SRR8224533-out.fasta, and so on.

Any help or suggestion would be helpful.

Tags: fasta, contigs, assembly, perl
Comment (written 12 months ago by rajpal2228850):

Write a loop function!
liorglic wrote (12 months ago):

You have two options:
1. Change the script so it can loop over your list of files; this shouldn't be too hard.
2. Use some bash tricks to run your script as is on all your files. This assumes you're on some kind of Unix OS (Linux, macOS, etc.). Then you can do something like this from your command line:

files=("file1.fasta" "file2.fasta" "file3.fasta")
for f in "${files[@]}"; do
    echo "$f"
    perl removesmalls.pl 1000 "$f" > "${f}gt1000"
done

Reply by Nyksubuz (12 months ago, modified by h.mon):

for i in *.fasta; do
    perl removesmalls.pl 1000 "$i" > "${i%.fasta}-out.fasta"
done

I think even this will work.
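
One caveat with looping over `*.fasta`: the result files also end in `.fasta`, so running the loop a second time in the same directory would pick up `SRR8224532-out.fasta` as input and write `SRR8224532-out-out.fasta`. Below is a minimal sketch that skips results from earlier runs. It assumes `removesmalls.pl` sits in the current directory; the function name `filter_all` and the `FILTER` variable are hypothetical names introduced here, not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: batch-filter every *.fasta file in the current directory.
# FILTER is a hypothetical override hook; by default it runs the
# removesmalls.pl script from the question.
FILTER=${FILTER:-"perl removesmalls.pl"}

filter_all() {
    local minlen=$1 f out
    for f in *.fasta; do
        [ -e "$f" ] || continue                    # the glob matched nothing
        case $f in *-out.fasta) continue ;; esac   # skip results of earlier runs
        out=${f%.fasta}-out.fasta                  # SRR8224532.fasta -> SRR8224532-out.fasta
        # $FILTER is intentionally unquoted so "perl removesmalls.pl" splits into words
        $FILTER "$minlen" "$f" > "$out"
    done
}
```

After sourcing this, `filter_all 1000` would process all input files in the current directory and leave existing `-out.fasta` files out of the input list.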
Bioinfo (Morocco) wrote (8 months ago):

Hello. One option is to use reformat.sh from the BBMap package:

reformat.sh in=contigs.fasta out=filtered.fasta minlength=200

Good luck!
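
For the 109 files in the question, the same command could be wrapped in a loop. This is only a sketch: it assumes `reformat.sh` is on your `PATH`, and uses the 1000 threshold from the question rather than the 200 shown above. The function name `reformat_all` and the `REFORMAT` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run BBMap's reformat.sh on every *.fasta file in the current directory.
# REFORMAT is a hypothetical override hook; setting REFORMAT=echo prints the
# arguments of each call instead of running the tool.
REFORMAT=${REFORMAT:-reformat.sh}

reformat_all() {
    local minlen=$1 f
    for f in *.fasta; do
        [ -e "$f" ] || continue                    # the glob matched nothing
        case $f in *-out.fasta) continue ;; esac   # skip results of earlier runs
        "$REFORMAT" in="$f" out="${f%.fasta}-out.fasta" minlength="$minlen"
    done
}
```

Calling `reformat_all 1000` would then write one `-out.fasta` file per input file.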
