How to find the shortest or longest sequence in a FASTA file
23 months ago
Neel ▴ 20

Hi, I am trying to find the shortest and longest amino acid sequences in a FASTA file. If anyone knows how to do this, please let me know.

GNNVVIGAGAKVLGSFKVGDNVKIGAGSVVNKVVPSDTTVVGIPGRIVLHNGVPIKDPDL
RHDELPDPVNEMLKCLMQRVEYLEERLNDEESRYNVFEIIQHHDTSKRGVPPKGERESCD
VRVRSNDL
>tr|A0A3D3WF91|A0A3D3WF91_9PROT
MRGFIDELRSIRERDPAAGGLLGLMFLTPSVHVMLAYRLGHRLWRWRMRFLARFTMQLAR
WFTGIEIHPAARIGKRFFIDHGMGVVIGETAVIGDDVTFYHGVTLGGLLPSVDSDGQRQK
KRHPTIRNNVIVGAGAQILGAITVNECARVGANSVVVKDVPQGTTVTGIPARPVSARRVT
DETTFQPYGTPSDLSSDARDKAIKGLLREVDILHNKLKMLEDLQGEAKQSESSFQPEATS
LSASDRN
>tr|Q2JWT3|Q2JWT3_SYNJA
MLTESSRVSGTKSLPSAVESPETGGSDSPSSAKPEPKLGFWQQFWEDIDCVFERDPAARN
RWEVLLTYPGVHALFLHRIAHWLWKRRCFFLARLLSFISRSFTLIEIHPAARIGRRFFID
HGCGVVIGETAEIGDDVTLYHGVTLGGTSWTKGKRHPTLEDGVIVGTGAKILGPVRIGAR
ARIGANAVVIQDVAPGMTVVGIPGRAVIPPHQRRIPAHGIDLDHHLMPDPVGRAIEQLLH
RIQELEAQVARLNQEREQDRQDQVRCE
>sp|P0A9D5|CYSE_ECOL6
MSCEELEIVWNNIKAEARTLADCEPMLASFYHATLLKHENLGSALSYMLANKLSSPIMPA
IAIREVVEEAYAADPEMIASAACDIQAVRTRDPAVDKYSTPLLYLKGFHALQAYRIGHWL
WNQGRRALAIFLQNQVSVTFQVDIHPAAKIGRGIMLDHATGIVVGETAVIENDVSILQSV
TLGGTGKSGGDRHPKIREGVMIGAGAKILGNIEVGRGAKIGAGSVVLQPVPPHTTAAGVP
ARIVGKPDSDKPSMDMDQHFNGINHTFEYGDGI

Thank you!

fasta • 1.5k views

this looks suspiciously like homework ...


Use seqkit to sort the sequences by length:

$ seqkit -w 0 --quiet sort -l test.fa

Hi, I used the script below, which prints the length of every sequence (output shown underneath), but I want to find only the shortest and the longest sequence in the FASTA file.

$ awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' file.fasta

>tr|A0A0B4WZN6|A0A0B4WZN6_9HYPH Serine acetyltransferase OS=Rhizobium gallicum bv. gallicum R602sp OX=1041138 GN=cysE PE=3 SV=1
276
>tr|A0A2A5GE89|A0A2A5GE89_9GAMM Serine acetyltransferase OS=Porticoccaceae bacterium OX=2026782 GN=cysE PE=3 SV=1
280
>tr|A0A5F0DFU6|A0A5F0DFU6_9MICO Serine acetyltransferase OS=Cryobacterium sp. HLT2-28 OX=1259146 GN=cysE PE=3 SV=1
192

Thank you!
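Going from per-record lengths to the shortest and longest records only requires keeping each header together with its sequence and comparing lengths. The following is a minimal Python sketch of that idea, not taken from any of the tools mentioned in this thread. The function names `read_fasta` and `shortest_and_longest` are illustrative, and the sketch assumes a plain-text FASTA file small enough to hold in memory. Sequence lines that appear before the first `>` header are skipped, because they cannot be attributed to a record.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
        # lines before the first header are skipped (no record to attach to)
    if header is not None:
        yield header, "".join(chunks)


def shortest_and_longest(lines):
    """Return the (header, sequence) pairs with minimum and maximum length."""
    records = list(read_fasta(lines))
    if not records:
        raise ValueError("no FASTA records found")
    shortest = min(records, key=lambda rec: len(rec[1]))
    longest = max(records, key=lambda rec: len(rec[1]))
    return shortest, longest


# Small in-memory demo; for a real file, pass an open file handle instead,
# e.g. with open("file.fasta") as handle: shortest_and_longest(handle)
demo = [">a", "MKV", "LL", ">b", "MK", ">c", "MKVLLAA"]
short_rec, long_rec = shortest_and_longest(demo)
print("shortest:", short_rec[0], len(short_rec[1]))  # shortest: b 2
print("longest:", long_rec[0], len(long_rec[1]))     # longest: c 7
```

If several records tie for the minimum or maximum length, `min` and `max` return the first one encountered in file order.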


Thank you so much to all for your replies.

23 months ago
Jesse ▴ 740

seqmagick has a feature for summarizing sequence files that gives minimum and maximum length:

$ seqmagick info file.fasta 
name       alignment    min_len   max_len   avg_len  num_seqs
file.fasta FALSE            128       273    228.75         4
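If installing a tool is not an option, the same kind of length summary can be approximated in a few lines of Python. This is only a sketch of the idea, not how seqmagick itself computes its table. The function name `length_summary` and the dictionary keys (chosen to mirror the column names above) are assumptions made for illustration.

```python
def length_summary(lines):
    """Summarize sequence lengths from an iterable of FASTA lines."""
    lengths = []
    current = None  # running length of the record being read, None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # add this line's residues to the open record
    if current is not None:
        lengths.append(current)
    if not lengths:
        raise ValueError("no FASTA records found")
    return {
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": sum(lengths) / len(lengths),
        "num_seqs": len(lengths),
    }


# For a real file: with open("file.fasta") as handle: print(length_summary(handle))
print(length_summary([">a", "MKV", "LL", ">b", "MK"]))
```

Only the lengths are kept, so memory use stays small even for large files.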