Question: Count the number of sequences in a FASTA file using Perl
5 months ago, koushikayaluri wrote:

I want to write code that takes an input file from the user on the command line and reports the number of sequences in the file, the total length of all sequences, the maximum sequence length, the minimum sequence length, and the average sequence length.


Hello koushikayaluri,

What have you done so far? Where did you run into trouble and need our help?

fin swimmer

written 5 months ago by finswimmer

Hello koushikayaluri!

It appears that your post has been cross-posted to another site: https://stackoverflow.com/questions/52378534

This is typically not recommended as it runs the risk of annoying people in both communities.

written 5 months ago by Pierre Lindenbaum
5 months ago, ivan.antonov (Moscow, Russia) wrote:

You can do much of this on the UNIX command line using grep, Perl, and "wc".

Counting the number of sequences:

grep -c '^>' INPUT.fasta

Counting the total length of all sequences:

grep -v '^>' INPUT.fasta | perl -pe 's/[\n\r]//g' | wc -c

List of all sequence lengths (one per line, in file order):

perl -pe '/^>/ ? s/.*// : s/[\n\r]//g' INPUT.fasta |\
tail -n +2 |\
perl -lpe '$_ = length'