Count the number of sequences in a FASTA file using Perl
4.4 years ago

I want to write a Perl script that takes a FASTA file name from the user on the command line and reports the number of sequences in the file, the total length of all sequences, the maximum sequence length, the minimum sequence length, and the average sequence length.

perl bioperl bioinformatics fasta • 3.0k views

Hello koushikayaluri,

What have you done so far? Where did you run into trouble, and what do you need our help with?

fin swimmer


Hello koushikayaluri!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/52378534

This is typically not recommended as it runs the risk of annoying people in both communities.

4.4 years ago
ivan.antonov ▴ 90

You can do most of this on the UNIX command line using Perl together with "grep" and "wc".

Counting the number of sequences:

grep -c '^>' INPUT.fasta


Counting the total length of all sequences:

grep -v '>' INPUT.fasta | perl -pe 's/[\n\r]//g' | wc -c


List of all sequence lengths:

cat INPUT.fasta |\
perl -pe '/^>/ ? s/.*// : s/[\n\r]//g' |\
tail -n +2 |\
perl -pe 's/.*/length($&)/e'