Question: How do I add the ">" symbol to FASTA headers?
Alec Watanabe wrote, 2.2 years ago:

Dear community,

How do I add the ">" symbol to FASTA headers? I searched for other similar posts but none of them worked for me. Can I add the symbol with sed or awk? What would be the command? I want to add ">" to all the headers. Thanks in advance!

Input file (example):

Proteus_mirabilis_ARLG2970_2781 
atggagacaggtacagtaaagtggttcaataatgctaagggctttggttttattaccccagcaaacggtg
gcgaagatatttttgcccactattcaacaattagaatggaaggctaccgcacacttaaagcggggcagaa
agttaattatagcacgataaaagggcctaaaggtgaccatactgaccttatcattcctatcattgaatag
Proteus_mirabilis_ARLG2970_0131 
atgtctgacaaaatgaaaggtcaagttaagtggttcaacgagtctaaaggctttggttttattactccag
cagacggaagcaaagacgtattcgttcacttttctgccattcaaggtaacggtttcaaaactctggctga
aggtcagaacgtagaattcacaattgaaaacggtgcaaaaggtccagcagcagctaacgtaacagctctg
taa 
Proteus_penneri_ATCC35198_1543  
ttacagagcagttacgttagcagctgctggaccttttgcaccgttttcaattgtgaattctacgttctga
ccttcagccagagttttgaaaccgttaccttgaatggcagaaaagtgaacgaatacgtctttgcttccgt
ctgctggagtaataaaaccaaagcctttagactcgttgaaccacttaacttgacctttcattttgtcaga
cat 
Proteus_vulgaris_FDAARGOS366_2819   
ttagagagccaccacgttgcctgctgctgggcctttcataccattttccatggtgaatgaaacttgttgc
ccttcagctaatgttttgaagctatcactttggattgcagagaaatgtacgaatacatctttgctgccat
cagctggagtaataaaaccaaaacctttaccttcatcgaaccattttactgtaccagtcattgtattaga
cat 
Proteus_mirabilis_ARLG2970_2695 
ttacagagcgattacgttcgctgctgcagggcctttagcgccattttcaatagaaaatgaaacttcttgg
ccttctttcagtgacttgaagctttcactttggatcgctgaaaagtgtacgaatacgtctttgctaccgt
ctttaggagtgataaaaccgaagcctttatcatcgttaaaccattttactgtaccagtcattgtattaga
cat

Desired output: the ">" symbol added before each header line (Proteus_..., etc.).


Can you confirm that each header (the line starting with "Proteus") is on its own line? It did not look that way before a moderator possibly edited the post.

If the headers are on separate lines, then a simple sed 's/^Proteus/>Proteus/' your_file > new_file will work.

Written 2.2 years ago by genomax

Thank you for your concern, kind sir! The headers are indeed each on a new line, as they should be in a FASTA file. It's just that I'm new to Biostars and don't really know how to edit the text I post.

Written 2.2 years ago by Alec Watanabe
Alex Reynolds wrote, 2.2 years ago:
$ awk '{ if ($0 ~ /_/) { printf ">"; } print $0; }' in.fa > out.fa
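
The one-liner above treats any line containing an underscore as a header. That holds for the example input because nucleotide lines never contain "_". A minimal sketch of the same idea that also skips lines already starting with ">", so running it twice does not produce ">>". The function name add_fasta_marker is hypothetical and not from this thread:

```shell
# Hypothetical helper: prefix ">" to every line that contains an underscore
# and does not already start with ">".
# Assumption: sequence lines never contain "_".
add_fasta_marker() {
  awk '/_/ && !/^>/ { print ">" $0; next } { print }' "$1"
}

# Usage: add_fasta_marker in.fa > out.fa
```

If some headers have no underscore, this approach misses them, and a test based on the sequence alphabet would be a better fit.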

Thank you sir! This worked perfectly.

Written 2.2 years ago by Alec Watanabe
Ahill wrote, 2.2 years ago:
sed 's/^\([^acgt]\)/>\1/' your_input_file > your_output_file
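
This command marks every line whose first character is not a lowercase a, c, g, or t. A header that happens to begin with one of those letters would be left unmarked, and an uppercase sequence line would be marked by mistake. A sketch of a variant that looks at the whole line instead, assuming sequence lines hold only A, C, G, T, or N (either case) plus whitespace. The name mark_headers is hypothetical:

```shell
# Hypothetical helper: add ">" to any line that is not already a header
# and contains at least one character outside acgtn/ACGTN and whitespace.
# Assumption: sequence lines consist only of those nucleotide letters.
mark_headers() {
  sed -E '/^>/!{/[^acgtnACGTN[:space:]]/s/^/>/;}' "$1"
}

# Usage: mark_headers your_input_file > your_output_file
```

Protein FASTA files or sequences with other IUPAC ambiguity codes would need a wider character class.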
genomax wrote, 2.2 years ago:
sed 's/Proteus/\
>Proteus/g' your_file > new_file

Yes, the command has to be typed on two lines as shown to get the newline before >.

Edit: See my note above. I will leave this here in case your sequences don't have the header starting on a fresh line.
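
Whichever command is used, the result can be sanity-checked by counting the header lines and looking for leftover lines that are neither headers nor plain sequence. A rough sketch, assuming sequence lines contain only a, c, g, t, or n (any case) with optional trailing whitespace. The name check_fasta_headers is hypothetical:

```shell
# Hypothetical helper: report how many header lines a file has and how many
# lines are neither headers nor pure nucleotide sequence.
# Assumption: sequence lines hold only acgtn (any case) and whitespace;
# blank lines are counted as sequence, not as stray lines.
check_fasta_headers() {
  headers=$(grep -c '^>' "$1" || true)
  stray=$(grep -v '^>' "$1" | grep -vciE '^[acgtn]*[[:space:]]*$' || true)
  echo "headers=$headers stray=$stray"
}

# Usage: check_fasta_headers new_file
```

For the five-record example in the question, a correctly converted file would be expected to report five headers and no stray lines.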
