Question: How do I add the ">" symbol to FASTA headers?
Asked by Alec Watanabe, 13 months ago:

Dear community,

How do I add the ">" symbol to FASTA headers? I searched for similar posts, but none of the solutions worked for me. Can I add the symbol with sed or awk, and what would the command be? I want to add ">" to every header. Thanks in advance!

Input file (example):

Proteus_mirabilis_ARLG2970_2781 
atggagacaggtacagtaaagtggttcaataatgctaagggctttggttttattaccccagcaaacggtg
gcgaagatatttttgcccactattcaacaattagaatggaaggctaccgcacacttaaagcggggcagaa
agttaattatagcacgataaaagggcctaaaggtgaccatactgaccttatcattcctatcattgaatag
Proteus_mirabilis_ARLG2970_0131 
atgtctgacaaaatgaaaggtcaagttaagtggttcaacgagtctaaaggctttggttttattactccag
cagacggaagcaaagacgtattcgttcacttttctgccattcaaggtaacggtttcaaaactctggctga
aggtcagaacgtagaattcacaattgaaaacggtgcaaaaggtccagcagcagctaacgtaacagctctg
taa 
Proteus_penneri_ATCC35198_1543  
ttacagagcagttacgttagcagctgctggaccttttgcaccgttttcaattgtgaattctacgttctga
ccttcagccagagttttgaaaccgttaccttgaatggcagaaaagtgaacgaatacgtctttgcttccgt
ctgctggagtaataaaaccaaagcctttagactcgttgaaccacttaacttgacctttcattttgtcaga
cat 
Proteus_vulgaris_FDAARGOS366_2819   
ttagagagccaccacgttgcctgctgctgggcctttcataccattttccatggtgaatgaaacttgttgc
ccttcagctaatgttttgaagctatcactttggattgcagagaaatgtacgaatacatctttgctgccat
cagctggagtaataaaaccaaaacctttaccttcatcgaaccattttactgtaccagtcattgtattaga
cat 
Proteus_mirabilis_ARLG2970_2695 
ttacagagcgattacgttcgctgctgcagggcctttagcgccattttcaatagaaaatgaaacttcttgg
ccttctttcagtgacttgaagctttcactttggatcgctgaaaagtgtacgaatacgtctttgctaccgt
ctttaggagtgataaaaccgaagcctttatcatcgttaaaccattttactgtaccagtcattgtattaga
cat

Desired output: the same file with a ">" symbol added at the start of each header line (each line beginning with Proteus_).

Tags: awk, sed, fasta

Can you confirm whether each header (the line starting with "Proteus") begins on a new line? It did not look that way before the post was edited, possibly by a moderator.

If they are on separate lines, then a simple sed 's/^Proteus/>Proteus/' your_file > new_file will work.

(Comment by genomax, 13 months ago.)
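A minimal sketch of the anchored form of the sed command suggested above, run on a small made-up excerpt of the input, with a quick count to check the result. The file names your_file and new_file are placeholders, and the check assumes every header starts with "Proteus" as in the example; the two counts printed at the end should match.

```shell
#!/bin/sh
# Made-up excerpt of the input; your_file and new_file are placeholder names.
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2781' \
  'atggagacaggtacagtaaagtggttcaat' \
  'Proteus_mirabilis_ARLG2970_0131' \
  'atgtctgacaaaatgaaaggtcaagttaag' \
  'taa' > your_file

# "^" anchors the match to the start of a line, so only header lines change.
sed 's/^Proteus/>Proteus/' your_file > new_file

# Sanity check: both counts should be equal (2 for this excerpt).
grep -c '^Proteus' your_file
grep -c '^>' new_file
```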

Thank you for looking into this, kind sir! The headers are indeed on new lines, as they should be in a FASTA file. I'm just new to Biostars and don't really know how to edit my posts.

(Reply by Alec Watanabe, 13 months ago.)
Answer by Alex Reynolds (Seattle, WA USA), 13 months ago:
$ awk '{ if ($0 ~ /_/) { printf ">"; } print $0; }' in.fa > out.fa

Thank you sir! This worked perfectly.

(Reply by Alec Watanabe, 13 months ago.)
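The awk answer above prefixes ">" to any line containing an underscore, which works here because the sequence lines never contain one. If you would rather decide by content, here is a sketch that assumes sequence lines consist only of the letters A, C, G, T and N (in either case), possibly followed by trailing whitespace. It also skips empty lines and lines that already start with ">". The input below is made up for illustration. One caveat of this design: a header made up entirely of those letters would be mistaken for sequence.

```shell
#!/bin/sh
# Made-up input: a sequence line with a trailing space, an empty line and a
# header that already carries ">".
printf '%s\n' \
  'Proteus_penneri_ATCC35198_1543' \
  'ttacagagcagttacgttagcagctgctgg' \
  'cat ' \
  '' \
  '>Proteus_vulgaris_FDAARGOS366_2819' \
  'ttagagagccaccacgttgcctgctgctgg' > in.fa

# Assumption: sequence lines contain only A, C, G, T or N (either case),
# optionally followed by spaces, tabs or a carriage return.
# NF skips empty lines, !/^>/ leaves existing headers alone, and every other
# line that is not pure sequence gets ">" prepended; the trailing 1 prints
# every line, changed or not.
awk 'NF && !/^>/ && !/^[ACGTNacgtn]+[ \t\r]*$/ { $0 = ">" $0 } 1' in.fa > out.fa
```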
Answer by Ahill (United States), 13 months ago:
sed 's/^\([^acgt]\)/>\1/' your_file > new_file
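In the sed answer above, [^acgt] matches a first character that is not a lowercase a, c, g or t; \( \) captures it and \1 writes it back after the ">". As a sketch on made-up input, the same substitution can be written with & (the entire matched text) instead of a capture group, and both forms give identical output. Both rely on the sequences being lowercase: an uppercase sequence line, or one starting with n, would also get a ">".

```shell
#!/bin/sh
# Made-up input; your_file and the output names are placeholders.
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2695' \
  'ttacagagcgattacgttcgctgctgcagg' \
  'cat' > your_file

# Capture-group form: keep the first character (\1) and put ">" in front of it.
sed 's/^\([^acgt]\)/>\1/' your_file > out_capture.fa

# Equivalent form: "&" stands for the whole match, so no group is needed.
sed 's/^[^acgt]/>&/' your_file > out_amp.fa

# The two outputs should be byte-for-byte identical.
cmp out_capture.fa out_amp.fa && echo identical
```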
Answer by genomax (United States), 13 months ago:
sed 's/Proteus/\
>Proteus/g' your_file > new_file

The command must be typed on two lines, exactly as shown, so that a newline is inserted before the >.

Edit: See my note above. I will leave this here in case your headers do not start on a new line.
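The two-line sed above works by putting a literal newline in the replacement. Here is a sketch of the same idea in awk, where "\n" inside a string is a newline in any POSIX awk, run on a hypothetical damaged input in which a header is glued to the end of a sequence line. Unlike the sed version, it also removes the empty line that would otherwise appear before headers that already started a line. It assumes, as the sed command does, that every header contains "Proteus".

```shell
#!/bin/sh
# Hypothetical damaged input: the second header is glued to a sequence line.
printf '%s\n' \
  'Proteus_mirabilis_ARLG2970_2781' \
  'atggagacaggtacagProteus_mirabilis_ARLG2970_0131' \
  'atgtctgacaaaatga' > your_file

# Put a newline and ">" in front of every "Proteus". A header that already
# began its line would then start with an empty line, so a leading newline
# is stripped again before printing.
awk '{ gsub(/Proteus/, "\n>Proteus"); sub(/^\n/, ""); print }' your_file > new_file
```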
