Add sequence lengths to headers in a fasta file
7.1 years ago
jan • 0

Hi all. I have a FASTA file and would like to append each sequence's length to its header while keeping the sequences themselves unchanged. Thank you!

My file:

>Seq_1
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
KGTVGSILDRKDETKTQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQK

>Seq_2
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTSQSKIAWISETLCIGCGI
CIKKCPFGALSIVNLPSNLEKETTHRYCANAFKLHRLPIPRPGEVLGLVGTNGIGKSTAL
KGTVGSILDRKDETKTQTVVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQKADIFMF
DEPSSYLDVKQRLKAAITIRSLINPDRYIIV

Desired output (the length info can follow after any delimiter):

>Seq_1[174]
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTPQSKIAWISETLCIGCGI
KILAGKQKPNLGKYDDPPDWQEILTYFRGSELQNYFTKILEDDLKAIIKPQYVDQIPKAA
KGTVGSILDRKDETKTQAIVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQK

>Seq_2[211]
MADKLTRIAIVNHDKCKPKKCRQECKKSCPVVRMGKLCIEVTSQSKIAWISETLCIGCGI
CIKKCPFGALSIVNLPSNLEKETTHRYCANAFKLHRLPIPRPGEVLGLVGTNGIGKSTAL
KGTVGSILDRKDETKTQTVVCQQLDLTHLKERNVEDLSGGELQRFACAVVCIQKADIFMF
DEPSSYLDVKQRLKAAITIRSLINPDRYIIV

Perhaps tell us what you want to use it for and what you have tried. Are you looking for a Unix, Perl, or other programming approach? Is scaling an issue (millions of sequences, or extremely long ones)? Are the FASTA headers always in that format, or should other formats be expected as well?

7.1 years ago

seqkit + awk on Linux/OS X

cat seqs.fa | seqkit fx2tab --length | awk -F "\t" '{print $1"["$4"]\t"$2}' | seqkit tab2fx > seqs2.fa

Example

$ echo -e ">seq\nacgtn\nACGTN"
>seq
acgtn
ACGTN

$ echo -e ">seq\nacgtn\nACGTN" | seqkit fx2tab --length | awk -F "\t" '{print $1"["$4"]\t"$2}' | seqkit tab2fx
>seq[10]
acgtnACGTN
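
If installing seqkit is not an option, the same idea can be sketched in plain Python. This is only an illustration and not part of the answer above: the function names `add_lengths` and `_emit` are made up here, and the sketch assumes a well-formed FASTA file where every header line starts with `>` and everything up to the next header is sequence. Unlike the seqkit pipeline, it leaves the original line wrapping untouched, and it appends `[length]` to the end of the whole header line, so a header with a description would get the length after the description.

```python
def _emit(header, seq_lines):
    """Yield one record: header with '[length]' appended, then its sequence lines."""
    length = sum(len(s) for s in seq_lines)
    yield f"{header}[{length}]"
    yield from seq_lines


def add_lengths(lines):
    """Yield FASTA lines with the sequence length appended to each header.

    `lines` is any iterable of strings (e.g. an open file handle).
    Blank lines are dropped; lines before the first header are ignored.
    """
    header = None
    seq_lines = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            if header is not None:
                # a new header closes the previous record
                yield from _emit(header, seq_lines)
            header, seq_lines = line, []
        elif header is not None:
            seq_lines.append(line)
    if header is not None:
        # flush the last record
        yield from _emit(header, seq_lines)


# Same toy input as in the seqkit example above
example = [">seq", "acgtn", "ACGTN"]
print("\n".join(add_lengths(example)))
```

To run it on a real file, pass an open file handle instead of the list, for example `add_lengths(open("seqs.fa"))`, and write the yielded lines to `seqs2.fa`. Each record is held in memory until its next header is seen, so very long single sequences would need a two-pass approach instead.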

Great, that's exactly what I was looking for. Thank you!
