                    10.6 years ago
Can anyone please share a script to check the size of each contig in a file? The contigs are in multi-FASTA format.
Here is a Perl solution (usage: perl length.pl fastafile):
#!/usr/bin/perl
use strict;
use warnings;
open my $fasta_file, '<', $ARGV[0] or die $!;
my ($id, $seq);
while (<$fasta_file>) {
    chomp;
    if (/^>(\S+).*/) {
        print "$id\t", length($seq), "\n" if defined $id;
        $id = $1;
        $seq = '';
    } else {
        $seq .= $_;
    }
}
close $fasta_file;
# Print the last record; skip it if the file contained no FASTA headers.
print "$id\t", length($seq), "\n" if defined $id;
                    
                
From the bioawk tutorial (https://github.com/vsbuffalo/bioawk-tutorial):
bioawk -cfastx '{print $name, length($seq)}' test.fasta
                    
                
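Hi, try this awk solution:
for i in *.fa
do
  # Put each record on one line: ">header<TAB>sequence"
  awk 'BEGIN{RS=">"} NR>1 {sub("\n","\t"); gsub("\n",""); print RS $0}' "$i" > "${i%.fa}.column.csv"
  # Report the sequence length (field 2 only), sorted by ascending length
  awk -F'\t' 'length($2) {print $1 "\t" length($2)}' "${i%.fa}.column.csv" | sort -t "$(printf '\t')" -k2,2n > "${i%.fa}.csv"
done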
Alternatively, pipe the first awk command directly into the second to avoid creating the intermediate *.column.csv file.
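The piped variant might look roughly like the sketch below. The function name `fasta_lengths` is hypothetical and not from the thread. The sketch assumes Unix line endings and that ">" appears only at the start of header lines. Unlike the loop above, it leaves the ">" off the printed names.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption, not from the thread):
# print "header<TAB>length" for every record of a multi-FASTA file,
# sorted by ascending sequence length, without a temporary file.
fasta_lengths() {
    # Stage 1: split records on ">", join each record into "header<TAB>sequence".
    awk 'BEGIN { RS = ">" } NR > 1 { sub("\n", "\t"); gsub("\n", ""); print $0 }' "$1" |
        # Stage 2: measure only the sequence field; records with no sequence are skipped.
        awk -F '\t' 'length($2) { print $1 "\t" length($2) }' |
        # Stage 3: numeric sort on the length column, using tab as the separator.
        sort -t "$(printf '\t')" -k2,2n
}
```

For example, `fasta_lengths contigs.fa > contigs.csv` would write the table for a single file (`contigs.fa` is a placeholder name). Wrapping the call in the same `for i in *.fa` loop gives one table per input file.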
This looks like a homework question. Don't give up easily and try some more :-). I assure you that you will enjoy the whole learning process. You can search existing posts here on Biostar for help, for example with the query "length fasta sequences".
+1 A. Pandey. See also Code Golf: Mean Length Of Fasta Sequences