Count the number of sequences in a FASTA file using Perl
3.2 years ago

I want to write code that takes an input file given by the user on the command line and reports the number of sequences in the file, the total length of all sequences, the maximum sequence length, the minimum sequence length and the average sequence length.

perl bioperl bioinformatics fasta

Hello koushikayaluri,

What have you done so far, and where did you run into trouble and need our help?

fin swimmer


Hello koushikayaluri!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/52378534

This is typically not recommended as it runs the risk of annoying people in both communities.

3.2 years ago
ivan.antonov

You can do most of this on the UNIX command line by combining Perl one-liners with standard tools such as grep, tail and "wc".

Counting the number of sequences:

grep -c '^>' INPUT.fasta
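Since the question asks for Perl specifically, here is a minimal pure-Perl sketch of the same count. It assumes standard FASTA, where every record starts with a header line beginning with ">"; the subroutine name count_sequences is hypothetical, chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count FASTA records by counting header lines that start with '>'.
# (count_sequences is a hypothetical helper name, not a library function.)
sub count_sequences {
    my ($fh) = @_;
    my $n = 0;
    while (my $line = <$fh>) {
        $n++ if $line =~ /^>/;
    }
    return $n;
}

# Print "<file>\t<count>" for every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    print "$file\t", count_sequences($fh), "\n";
    close $fh;
}
```

Saved as, say, count_seqs.pl (a hypothetical name), it runs as perl count_seqs.pl INPUT.fasta.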

Counting the total length of all sequences:

grep -v '^>' INPUT.fasta | perl -pe 's/[\n\r]//g' | wc -c
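The total length can also be computed in Perl alone. This sketch assumes that everything on a non-header line except whitespace is a residue; it strips all whitespace (not just \n and \r), so stray spaces are not counted. The subroutine name total_length is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sum the residues on all non-header lines. Whitespace, including the
# \r of Windows line endings, is removed before measuring each line.
# (total_length is a hypothetical helper name.)
sub total_length {
    my ($fh) = @_;
    my $total = 0;
    while (my $line = <$fh>) {
        next if $line =~ /^>/;    # skip header lines
        $line =~ s/\s+//g;        # drop newline, carriage return, spaces
        $total += length $line;
    }
    return $total;
}

# Print "<file>\t<total length>" for every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    print "$file\t", total_length($fh), "\n";
    close $fh;
}
```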

List of all sequence lengths:

perl -pe '/^>/ ? s/.*// : s/[\n\r]//g' INPUT.fasta |\
tail -n +2 |\
perl -pe 's/.*/length($&)/e'
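To get everything the question asks for (number of sequences, total, maximum, minimum and average length) in one Perl script, the per-sequence lengths can be collected first and then summarised. This is a sketch under a few assumptions: standard FASTA input, lines before the first header are ignored, and whitespace inside sequence lines is not counted. The subroutine names sequence_lengths and fasta_stats are hypothetical; sum, max and min come from the core List::Util module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum max min);

# Return one length per FASTA record (hypothetical helper name).
sub sequence_lengths {
    my ($fh) = @_;
    my @lengths;
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            push @lengths, 0;    # a header starts a new record
            next;
        }
        next unless @lengths;    # ignore anything before the first header
        $line =~ s/\s+//g;       # drop newline, carriage return, spaces
        $lengths[-1] += length $line;
    }
    return @lengths;
}

# Summarise a list of lengths; returns an empty list if there are none.
sub fasta_stats {
    my @lengths = @_;
    return unless @lengths;
    my $total = sum(@lengths);
    return (
        count   => scalar @lengths,
        total   => $total,
        max     => max(@lengths),
        min     => min(@lengths),
        average => $total / @lengths,
    );
}

# Print a summary for every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open '$file': $!\n";
    my %stats = fasta_stats(sequence_lengths($fh));
    close $fh;
    unless (%stats) {
        print "$file: no sequences found\n";
        next;
    }
    printf "%s\n  Number of sequences: %d\n  Total length: %d\n"
         . "  Max length: %d\n  Min length: %d\n  Average length: %.2f\n",
        $file, @stats{qw(count total max min average)};
}
```

For example, perl fasta_stats.pl INPUT.fasta (a hypothetical script name) prints the five values for that file, and passing several files prints one summary per file. Only one integer per record is kept in memory, so large sequences are never stored in full.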