Question: Removing small contigs from fasta files
12 months ago by
Nyksubuz10 wrote:
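use strict;
use warnings;

# Minimum contig length is the first command-line argument.
my $minlen = shift or die "Error: `minlen` parameter not provided\n";
{
    # Read one FASTA record at a time by using ">" as the input record separator.
    local $/ = ">";
    while (<>) {
        chomp;                  # strip the trailing ">" separator
        next unless /\w/;       # skip the empty chunk before the first header
        my @chunk  = split /\n/;
        my $header = shift @chunk;
        my $seqlen = length join "", @chunk;
        print ">$_" if $seqlen >= $minlen;
    }
}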

Executing the script as follows (script.pl stands in for whatever file name the script is saved under):

perl script.pl 1000 contigs.fasta > contigs-1000.fasta

The above script works for me, but there is a problem: I have 109 different fasta files with different file names. I can run the script on each file individually, but I want to run it on all files at once, with a separate result file for each input.

The file names are like SRR8224532.fasta, SRR8224533.fasta, SRR8224534.fasta, and so on. After removing the short contigs (for me, those shorter than 1000), I want the result files to be named something like SRR8224532-out.fasta, SRR8224533-out.fasta, and so on.

Any help or suggestion would be helpful.


Write a loop function!

written 12 months ago by rajpal2228850
12 months ago by
liorglic340 wrote:

You have two options:
1. Change the script so it can loop over your list of files - shouldn't be too hard.
2. Use some bash tricks to run your script as is on all your files. This assumes you're on some kind of Unix OS (Linux, macOS, etc.). Then you can do something like this from your command line (script.pl stands in for your script's file name):

files=("file1.fasta" "file2.fasta" "file3.fasta")
for f in "${files[@]}"; do echo "$f"; perl script.pl 1000 "$f" > "$f""gt1000"; done
modified 12 months ago • written 12 months ago by liorglic340
for i in *.fasta; do
    perl script.pl 1000 "$i" > "${i%.fasta}-out.fasta"
done

I think even this will work (script.pl stands in for the script's file name).

modified 12 months ago by h.mon31k • written 12 months ago by Nyksubuz10
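
One thing to watch with a `*.fasta` glob: if the loop is run a second time in the same directory, the glob also matches the `-out.fasta` files from the first run. Below is a minimal sketch of a loop that skips those. So that it runs on its own, it uses a small awk filter in place of the Perl script; `filter_fasta` and `filter_all` are hypothetical names made up for this example, not part of the original script.

```shell
# Sketch only: filter_fasta is an awk stand-in for the Perl script above,
# and both function names are invented for this example.

filter_fasta() {
    # $1 = minimum length, $2 = input FASTA; kept records go to stdout
    awk -v minlen="$1" '
        function emit() {
            # print the buffered record if its sequence is long enough
            if (header != "" && length(seq) >= minlen) {
                printf "%s\n%s", header, body
            }
        }
        /^>/ { emit(); header = $0; seq = ""; body = ""; next }
        { seq = seq $0; body = body $0 "\n" }
        END { emit() }
    ' "$2"
}

filter_all() {
    # $1 = minimum length; processes every *.fasta in the current directory
    for f in *.fasta; do
        [ -e "$f" ] || continue                      # glob matched nothing
        case "$f" in *-out.fasta) continue ;; esac   # skip earlier outputs
        filter_fasta "$1" "$f" > "${f%.fasta}-out.fasta"
    done
}
```

Calling `filter_all 1000` in the directory holding the SRR files should then write one `-out.fasta` file per input. To keep using the Perl script instead, the `filter_fasta` line inside the loop can be swapped for the `perl` call from the comment above.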
8 months ago by
Bioinfo20 wrote:

Hello. One option is to use a tool from the bbmap package, with arguments like: in=contigs.fasta out=filtered.fasta minlength=200

Good luck!
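
Whichever tool does the filtering, it may be worth checking that no short contigs slipped through. A rough sketch, assuming plain (possibly multi-line) FASTA input: the hypothetical helper `fasta_stats` below prints the file name, the number of records, and the length of the shortest record.

```shell
# Sketch only: fasta_stats is a name invented for this example.

fasta_stats() {
    # $1 = FASTA file; prints "<file> <record count> <shortest record length>"
    awk -v name="$1" '
        function close_record() {
            # fold the record that just ended into the running minimum
            if (n > 0 && (min == "" || len < min)) min = len
        }
        /^>/ { close_record(); n++; len = 0; next }
        { len += length($0) }
        END { close_record(); print name, n + 0, (min == "" ? 0 : min) }
    ' "$1"
}
```

Something like `for f in *-out.fasta; do fasta_stats "$f"; done` would list one line per filtered file; the last column should never be below the chosen cutoff.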
