Count the number of sequences in a FASTA file using Perl
4.4 years ago

I want to write a script that takes an input file from the user on the command line and reports the number of sequences in that file, the total length of all sequences, the maximum sequence length, the minimum sequence length and the average sequence length.

Hello koushikayaluri,

What have you done so far? Where did you run into trouble and need our help?

fin swimmer


4.4 years ago
ivan.antonov ▴ 90

You can do many of these things on the UNIX command line using Perl, grep and wc.

Counting the number of sequences:

grep '^>' INPUT.fasta | wc -l


Counting the total length of all sequences:

grep -v '^>' INPUT.fasta | perl -pe 's/[\n\r]//g' | wc -c


List of all sequence lengths:

cat INPUT.fasta |\
perl -pe '/^>/ ? s/.*// : s/[\n\r]//g' |\
tail -n +2 |\
perl -pe 's/.*/length($&)/e'