Question: Count the number of sequences in a FASTA file using Perl
koushikayaluri wrote, 10 months ago:

I want to write code that takes an input file name from the user on the command line and reports the number of sequences in the file, the total length of all sequences, the maximum sequence length, the minimum sequence length and the average sequence length.


Hello koushikayaluri,

what have you done so far, and where did you run into trouble and need our help?

fin swimmer

Reply by finswimmer, 10 months ago

Hello koushikayaluri!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/52378534

This is typically not recommended as it runs the risk of annoying people in both communities.

Reply by Pierre Lindenbaum, 10 months ago
Answer by ivan.antonov (Moscow, Russia), 10 months ago:

You can do most of this on the UNIX command line by combining grep, Perl one-liners and wc.

Counting the number of sequences (each record has exactly one header line starting with '>'):

grep -c '^>' INPUT.fasta

Counting the total length of all sequences (header lines excluded, line breaks and other whitespace removed before counting):

grep -v '^>' INPUT.fasta | perl -pe 's/\s//g' | wc -c

List of all sequence lengths, one per line, in file order:

perl -pe '/^>/ ? s/.*// : s/\s//g' INPUT.fasta |
tail -n +2 |
perl -lne 'print length'