Question: Add sequence lengths to headers in a fasta file
8 weeks ago, jan wrote:

Hi all. I have a FASTA file and would like to add each sequence's length to its header while keeping the sequences themselves unchanged. Thank you!

My file:

>Seq_1
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
KGTVGSILDRKDETKTQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQK

>Seq_2
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTSQSKIAWISETLCIGCGI
CIKKCPFGALSIVNLPSNLEKETTHRYCANAFKLHRLPIPRPGEVLGLVGTNGIGKSTAL
KGTVGSILDRKDETKTQTVVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQKADIFMF
DEPSSYLDVKQRLKAAITIRSLINPDRYIIV

Desired output (the length info can follow after any delimiter):

>Seq_1[174]
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
KGTVGSILDRKDETKTQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQK

>Seq_2[211]
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTSQSKIAWISETLCIGCGI
CIKKCPFGALSIVNLPSNLEKETTHRYCANAFKLHRLPIPRPGEVLGLVGTNGIGKSTAL
KGTVGSILDRKDETKTQTVVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQKADIFMF
DEPSSYLDVKQRLKAAITIRSLINPDRYIIV

Could you say what you want to use this for and what you have tried so far? Are you looking for a Unix, Perl, or other programming approach? Is scaling an issue (millions of sequences, or sequences of extreme length)? Is the FASTA header always in that format, or should other formats be expected as well?

written 8 weeks ago by ALchEmiXt (modified 8 weeks ago)
8 weeks ago, shenwei356 wrote:

seqkit + awk on Linux/OS X

cat seqs.fa | seqkit fx2tab --length | awk -F "\t" '{print $1"["$4"]\t"$2}' | seqkit tab2fx > seqs2.fa

Example

$ echo -e ">seq\nacgtn\nACGTN"
>seq
acgtn
ACGTN

$ echo -e ">seq\nacgtn\nACGTN" | seqkit fx2tab --length | awk -F "\t" '{print $1"["$4"]\t"$2}' | seqkit tab2fx
>seq[10]
acgtnACGTN
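
If seqkit is not available, roughly the same result can be had with awk alone. The following is only a minimal sketch, not part of the answer above: the function name add_lengths is made up for illustration, and it assumes that every record starts with a ">" line and that whitespace inside sequence lines should not be counted. Unlike the seqkit pipeline, it keeps the original line wrapping of each sequence, and it drops blank lines between records.

```shell
# Hypothetical helper (not from the thread): append "[length]" to each FASTA header.
# Reads FASTA from stdin or from file arguments and writes FASTA to stdout.
add_lengths() {
  awk '
    # Print the buffered record: header plus length, then the original sequence lines.
    function flush_record() {
      if (header != "") {
        print header "[" len "]"
        printf "%s", body
      }
    }
    /^>/ {                      # a header line starts a new record
      flush_record()
      sub(/\r$/, "")            # drop a trailing carriage return, if any
      header = $0; body = ""; len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")       # ignore whitespace and carriage returns
      if ($0 == "") next        # skip blank lines between records
      len += length($0)         # count the residues on this line
      body = body $0 "\n"       # keep the original line wrapping
    }
    END { flush_record() }
  ' "$@"
}
```

Usage would be along the lines of: add_lengths seqs.fa > seqs2.fa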

Great, that's exactly what I was looking for. Thank you!

written 8 weeks ago by jan