How To Add A Unique FASTA Identifier To Each Sequence In A Long List
Asked 12.5 years ago by Stevebob

I have a long list of sequences, one per line, and I want to convert it to a FASTA file. How do I add a header line (> followed by a unique identifier) before each sequence?

Thanks!

12.5 years ago

Awk is one option. Assuming one sequence per line in a file called sequencefile.txt:

awk '{print ">" NR; print $0}' sequencefile.txt

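NR is the current record (line) number, so each header is unique within sequencefile.txt. The result is printed to standard output; redirect it with > to save it as a file. Note that blank lines also increment NR and would be written out as empty records.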
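If you would rather script this, here is a minimal Python sketch of the same idea. It assumes one sequence per line, and it differs from the awk one-liner in one deliberate way: blank lines are skipped so they do not consume an identifier. The function name to_fasta, the default prefix "seq", and the output file name in the usage comment are hypothetical choices for illustration, not part of the original answer.

```python
import sys
from typing import Iterable, Iterator


def to_fasta(lines: Iterable[str], prefix: str = "seq") -> Iterator[str]:
    """Yield FASTA lines: a '>' header with a running number, then the sequence.

    Blank lines are skipped, so unlike awk's NR (which counts every input
    line) the numbering stays consecutive.
    """
    n = 0
    for line in lines:
        seq = line.strip()
        if not seq:
            continue  # skip empty lines entirely
        n += 1
        yield f">{prefix}{n}"  # e.g. >seq1, >seq2, ...
        yield seq


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python to_fasta.py sequencefile.txt > sequences.fasta
    with open(sys.argv[1]) as handle:
        for out_line in to_fasta(handle):
            print(out_line)
```

Passing prefix="" reproduces awk's bare numeric headers (>1, >2, ...).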

You can accept this as the answer by clicking on the checkmark just under the votes.

"Assuming" is the grandma of Satan :o)

Worked great, thanks!

12.5 years ago

With Biopieces (www.biopieces.org) you do:

read_tab -i in.tab -k SEQ | add_ident -k SEQ_NAME | write_fasta -o out.fasta -x

More info here: add_ident
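The pipeline above has three stages: read a tabular file, attach an identifier to every record, and write FASTA. As a rough illustration of those stages outside Biopieces, here is a minimal Python sketch. It is not a reimplementation of read_tab, add_ident, or write_fasta; the function name tab_to_fasta, the "ID" prefix, the eight-digit zero padding, and the 60-character line width are all hypothetical choices.

```python
import csv
import sys
from typing import Iterable, Iterator


def tab_to_fasta(lines: Iterable[str], seq_col: int = 0, prefix: str = "ID",
                 width: int = 60) -> Iterator[str]:
    """Take the sequence from column seq_col of tab-separated input,
    attach a zero-padded running identifier, and yield wrapped FASTA lines."""
    n = 0
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank rows and rows without a sequence in the chosen column.
        if len(row) <= seq_col or not row[seq_col].strip():
            continue
        n += 1
        seq = row[seq_col].strip()
        yield f">{prefix}{n:08d}"  # e.g. >ID00000001
        # Wrap the sequence at `width` characters per line.
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tab_to_fasta.py in.tab > out.fasta
    with open(sys.argv[1], newline="") as handle:
        for out_line in tab_to_fasta(handle):
            print(out_line)
```

Keeping each stage small mirrors the Biopieces design: if you later need the identifiers or extra columns downstream, you can extend one step without touching the others.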

That is nice, but a little overkill.

If you are using Biopieces for further steps downstream, then what you call slight overkill does make sense. Granted, it doesn't look like Stevebob was already using Biopieces, but maybe this gave him a push towards discovering it.

Biopieces is a very convenient toolbox. It works very well, thanks Maasha!
