3
0
4.0 years ago

Dear community,

How do I add the ">" symbol to FASTA headers? I searched for other similar posts but none of them worked for me. Can I add the symbol with sed or awk? What would be the command? I want to add ">" to all the headers. Thanks in advance!

Input file (example):

Proteus_mirabilis_ARLG2970_2781
atggagacaggtacagtaaagtggttcaataatgctaagggctttggttttattaccccagcaaacggtg
gcgaagatatttttgcccactattcaacaattagaatggaaggctaccgcacacttaaagcggggcagaa
agttaattatagcacgataaaagggcctaaaggtgaccatactgaccttatcattcctatcattgaatag
Proteus_mirabilis_ARLG2970_0131
atgtctgacaaaatgaaaggtcaagttaagtggttcaacgagtctaaaggctttggttttattactccag
cagacggaagcaaagacgtattcgttcacttttctgccattcaaggtaacggtttcaaaactctggctga
aggtcagaacgtagaattcacaattgaaaacggtgcaaaaggtccagcagcagctaacgtaacagctctg
taa
Proteus_penneri_ATCC35198_1543
ttacagagcagttacgttagcagctgctggaccttttgcaccgttttcaattgtgaattctacgttctga
ccttcagccagagttttgaaaccgttaccttgaatggcagaaaagtgaacgaatacgtctttgcttccgt
ctgctggagtaataaaaccaaagcctttagactcgttgaaccacttaacttgacctttcattttgtcaga
cat
Proteus_vulgaris_FDAARGOS366_2819
ttagagagccaccacgttgcctgctgctgggcctttcataccattttccatggtgaatgaaacttgttgc
ccttcagctaatgttttgaagctatcactttggattgcagagaaatgtacgaatacatctttgctgccat
cagctggagtaataaaaccaaaacctttaccttcatcgaaccattttactgtaccagtcattgtattaga
cat
Proteus_mirabilis_ARLG2970_2695
ttacagagcgattacgttcgctgctgcagggcctttagcgccattttcaatagaaaatgaaacttcttgg
ccttctttcagtgacttgaagctttcactttggatcgctgaaaagtgtacgaatacgtctttgctaccgt
ctttaggagtgataaaaccgaagcctttatcatcgttaaaccattttactgtaccagtcattgtattaga
cat


Desired output: the ">" symbol added at the start of every header line (each line beginning with Proteus_).

FASTA sed awk • 2.5k views
0

Can you confirm that each header (the line starting with the word Proteus) is on a new line? It did not look like that before a mod possibly edited the post.

If they are on separate lines, then a simple sed 's/^Proteus/>Proteus/' your_file > new_file will work. The ^ anchors the match to the start of the line, so only headers are changed.

0

Thank you for your concern, kind sir! The headers are indeed each on a new line, as they should be in a FASTA file. It's just that I'm new to Biostars and don't really know how to edit the text I post.

4
4.0 years ago
awk '{ if ($0 ~ /_/) { printf ">"; } print $0; }' in.fa > out.fa
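
To see what this does on a small sample, here is a minimal sketch. The file names demo.fa and demo_out.fa are placeholders, not from the original post, and the /_/ test assumes that only header lines contain an underscore, which holds for the example input in the question. The awk program is written in pattern-action form, but it should behave the same as the one-liner above.

```shell
# Build a tiny two-record sample (hypothetical file name demo.fa).
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2781' \
  'atggagacaggtacagtaaag' \
  'Proteus_penneri_ATCC35198_1543' \
  'ttacagagcagttacgttagc' > demo.fa

# Lines containing "_" are treated as headers and get ">" prepended;
# every other line is printed unchanged.
awk '/_/ { print ">" $0; next } { print }' demo.fa > demo_out.fa

# Sanity check: count the header lines in the result (one per record).
grep -c '^>' demo_out.fa
```

If your real headers do not all contain an underscore, or a sequence line could contain one, pick a different test for the header lines.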

1

Thank you sir! This worked perfectly.

1
4.0 years ago
Ahill ★ 1.9k
sed 's/^\([^acgt]\)/>\1/' <your_input_file> > <your_output_file>
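
A short breakdown of a command of this shape, as a sketch rather than a general recipe: ^ anchors the match at the start of the line, [^acgt] matches a first character that is not a lowercase nucleotide, the \( \) pair captures that character, and \1 puts it back after the inserted ">". This assumes the sequence lines contain only lowercase a, c, g and t and that no header starts with one of those letters; uppercase sequences or ambiguity codes such as n would be mistaken for headers. The file name demo2.fa is a placeholder.

```shell
# Small sample with one header, one long sequence line and a short tail line.
printf '%s\n' \
  'Proteus_vulgaris_FDAARGOS366_2819' \
  'ttagagagccaccacgttgcc' \
  'cat' > demo2.fa

# Prepend ">" to any line whose first character is not a, c, g or t.
sed 's/^\([^acgt]\)/>\1/' demo2.fa > demo2_out.fa

# Only the header line should now start with ">".
cat demo2_out.fa
```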

1
4.0 years ago
GenoMax 112k
sed 's/Proteus/\
>Proteus/g' your_file > new_file


Yes, the command has to be typed on two lines as shown to get the newline before >.

Edit: See my note above. I will leave this here in case your sequences don't have the header starting on a fresh line.
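
For illustration, a minimal sketch of the two-line form on a made-up sample where one header runs on from the previous sequence. The file names and the shortened headers are hypothetical. One side effect to be aware of: a header that already starts its own line gets an empty line in front of it, so the sketch pipes the result through a second sed that deletes empty lines.

```shell
# Sample: the second header is glued to the end of the first sequence.
printf '%s\n' \
  'Proteus_a_1' \
  'atgtaaProteus_b_2' \
  'ttacat' > demo3.fa

# Insert a newline and ">" before every "Proteus", then drop the empty
# lines created in front of headers that were already on a fresh line.
sed 's/Proteus/\
>Proteus/g' demo3.fa | sed '/^$/d' > demo3_out.fa

cat demo3_out.fa
```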