How do I add ">" symbol to FASTA headers?
6.8 years ago

Dear community,

How do I add the ">" symbol to FASTA headers? I searched for similar posts, but none of the suggested solutions worked for me. Can I add the symbol with sed or awk? What would the command be? I want to add ">" to all the headers. Thanks in advance!

Input file (example):

Proteus_mirabilis_ARLG2970_2781 
atggagacaggtacagtaaagtggttcaataatgctaagggctttggttttattaccccagcaaacggtg
gcgaagatatttttgcccactattcaacaattagaatggaaggctaccgcacacttaaagcggggcagaa
agttaattatagcacgataaaagggcctaaaggtgaccatactgaccttatcattcctatcattgaatag
Proteus_mirabilis_ARLG2970_0131 
atgtctgacaaaatgaaaggtcaagttaagtggttcaacgagtctaaaggctttggttttattactccag
cagacggaagcaaagacgtattcgttcacttttctgccattcaaggtaacggtttcaaaactctggctga
aggtcagaacgtagaattcacaattgaaaacggtgcaaaaggtccagcagcagctaacgtaacagctctg
taa 
Proteus_penneri_ATCC35198_1543  
ttacagagcagttacgttagcagctgctggaccttttgcaccgttttcaattgtgaattctacgttctga
ccttcagccagagttttgaaaccgttaccttgaatggcagaaaagtgaacgaatacgtctttgcttccgt
ctgctggagtaataaaaccaaagcctttagactcgttgaaccacttaacttgacctttcattttgtcaga
cat 
Proteus_vulgaris_FDAARGOS366_2819   
ttagagagccaccacgttgcctgctgctgggcctttcataccattttccatggtgaatgaaacttgttgc
ccttcagctaatgttttgaagctatcactttggattgcagagaaatgtacgaatacatctttgctgccat
cagctggagtaataaaaccaaaacctttaccttcatcgaaccattttactgtaccagtcattgtattaga
cat 
Proteus_mirabilis_ARLG2970_2695 
ttacagagcgattacgttcgctgctgcagggcctttagcgccattttcaatagaaaatgaaacttcttgg
ccttctttcagtgacttgaagctttcactttggatcgctgaaaagtgtacgaatacgtctttgctaccgt
ctttaggagtgataaaaccgaagcctttatcatcgttaaaccattttactgtaccagtcattgtattaga
cat

Desired output: every header line (Proteus_..., etc.) should start with the ">" symbol.


Can you confirm that each header (the line beginning with Proteus) is on its own line? It did not look like that before a mod possibly edited the post.

If they are on separate lines, then a simple sed 's/^Proteus/>Proteus/' your_file > new_file will work.


Thank you for your concern, kind sir! The headers are indeed each on a new line, as they should be in a FASTA file. It's just that I'm new to Biostars and don't really know how to edit the text I post.

6.8 years ago
$ awk '{ if ($0 ~ /_/) { printf ">"; } print $0; }' in.fa > out.fa
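The one-liner above prefixes ">" to every line that contains an underscore, which fits the example because only the headers contain "_". A slightly more defensive sketch is below. It assumes the same "headers contain an underscore" rule, leaves lines that already start with ">" alone, and trims the trailing spaces seen in the posted input. The demo file names and the shortened records are made up for illustration.

```shell
# Small stand-in for the poster's file (demo file names are hypothetical).
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2781 ' \
  'atggagacaggtac' \
  'taa ' \
  '>Proteus_penneri_ATCC35198_1543' \
  'cat' > demo_in.fa

awk '
  { sub(/[ \t]+$/, "") }          # drop trailing spaces and tabs
  /_/ && !/^>/ { $0 = ">" $0 }    # header without ">" gets one
  { print }
' demo_in.fa > demo_out.fa

cat demo_out.fa
```

Running it twice on the same file should not produce ">>" headers, since already-prefixed lines are skipped.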

Thank you sir! This worked perfectly.

6.8 years ago
Ahill ★ 2.0k
sed 's/^\([^acgt]\)/>\1/' your_input_file > your_output_file
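This sed command adds ">" to any line whose first character is not a lowercase a, c, g or t, so it works for the example, where every header starts with "P". If the sequences might be uppercase or contain N, a broader character class may be safer. The variant below is only a sketch under that assumption: it also skips lines that already begin with ">", and it would miss a header whose name itself starts with A, C, G, T or N. The demo file names and records are hypothetical.

```shell
# Hypothetical test input mixing lowercase, uppercase and an existing ">" header.
printf '%s\n' 'Proteus_x_1' 'atgc' 'NNAT' '>Already_ok' 'ttaa' > demo2_in.fa

# "&" in the replacement stands for the matched first character.
sed 's/^[^acgtnACGTN>]/>&/' demo2_in.fa > demo2_out.fa

cat demo2_out.fa
```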
6.8 years ago
GenoMax 147k
sed 's/Proteus/\
>Proteus/g' your_file > new_file

Yes, the command has to be typed on two lines as shown to get the newline before >.

Edit: See my note above. I will leave this here in case your sequences don't have the header starting on a fresh line.
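For the case where headers do not start on a fresh line, the same idea can be written on one line with awk, where "\n" inside the replacement string is a newline. This is only a sketch: it assumes every header begins with the literal word "Proteus", as in the example, and like the sed command it only breaks the line before each header. The demo file names and data are made up.

```shell
# One-line stand-in where headers and sequence run together (hypothetical data).
printf '%s\n' 'Proteus_a_1 atgc Proteus_b_2 ttga' > demo3_in.fa

# Put a newline and ">" before each "Proteus", then drop the leading
# newline that appears when the line already began with a header.
awk '{ gsub(/Proteus/, "\n>&"); sub(/^\n/, ""); print }' demo3_in.fa > demo3_out.fa

cat demo3_out.fa
```

Because the leading newline is removed, a file whose headers already sit at the start of a line should not end up with extra blank lines.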
