10.6 years ago
div
Can anyone please share a script to check the size of each contig in a file? The contigs are in multi-FASTA format.
Here is a Perl solution (usage: perl length.pl fastafile). It prints each sequence ID and its length, separated by a tab.
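#!/usr/bin/perl
use strict;
use warnings;

open my $fasta_file, '<', $ARGV[0] or die $!;
my ($id, $seq);
while (<$fasta_file>) {
    chomp;
    if (/^>(\S+)/) {
        # New header: report the previous record, then start a new one.
        print "$id\t", length($seq), "\n" if defined $id;
        $id  = $1;
        $seq = '';
    } else {
        # Sequence line: append to the current record.
        $seq .= $_;
    }
}
close $fasta_file;
# Report the last record (skipped if the file contained no headers).
print "$id\t", length($seq), "\n" if defined $id;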
From the bioawk tutorial (https://github.com/vsbuffalo/bioawk-tutorial):
bioawk -cfastx '{print $name, length($seq)}' test.fasta
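Hi, try this awk solution:
for i in *.fa
do
    # Put each record on one line: header, a tab, then the joined sequence.
    awk 'BEGIN{RS=">"} NR>1 {sub("\n","\t"); gsub("\n",""); print RS$0}' "$i" > "${i%.fa}.column.csv"
    # Print header and sequence length, sorted by length.
    awk -F'\t' 'length($2) {print $1 "\t" length($2)}' "${i%.fa}.column.csv" | sort -t "$(printf '\t')" -k2,2n > "${i%.fa}.csv"
done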
Alternatively, you can pipe the first awk command directly into the second one to avoid creating the intermediate *.column.csv file.
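For illustration, the piped variant could look roughly like the sketch below. The function name fasta_lengths is made up for this example, and the sketch assumes Unix line endings and headers that contain no tab characters. It prints the full header line (without the leading ">") and the sequence length, sorted by length.

```shell
# Hypothetical helper (name chosen for this example only):
# print "header<TAB>length" for every record in a multi-FASTA file,
# sorted by length, without writing an intermediate file.
fasta_lengths() {
    awk 'BEGIN { RS = ">" }
         NR > 1 {
             sub("\n", "\t")   # first newline separates header from sequence
             gsub("\n", "")    # join wrapped sequence lines
             print
         }' "$1" |
    awk -F'\t' 'length($2) { print $1 "\t" length($2) }' |
    sort -t "$(printf '\t')" -k2,2n
}
```

Usage would be something like: fasta_lengths contigs.fa > contigs.lengths.tsv (both file names are placeholders).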
This looks like a HW question. Don't give up easily and try some more :-). I assure you that you would enjoy the whole learning process. You can search for a few posts here on Biostar for help. Search for "length fasta sequences".
+1 A. Pandey. See also Code Golf: Mean Length Of Fasta Sequences