Help Needed For Formatting Fasta Headers
11.0 years ago
 >HWI-ST667:190:C0TPFACXX:1:1101:2885:1985 1:N:0:GATCAG
 TCGGATAGAGCTCCAAATCTATCT
 >HWI-ST667:190:C0TPFACXX:1:1101:3058:1999 1:N:0:GATCAG
 CAATATCAACTGCTGCAACTCTCT
 >HWI-ST667:190:C0TPFACXX:1:1101:3372:1992 1:N:0:GATCAG
 TCAAAGGTTGAAGAGAATGAAATTTCT
 ......

How can I use a Perl script to change the headers (and only the headers) of the above FASTA file into the following format? Many thanks! I'm a biologist with little programming background.

 >seq_1
 TCGGATAGAGCTCCAAATCTATCT
 >seq_2
 CAATATCAACTGCTGCAACTCTCT
 >seq_3
 TCAAAGGTTGAAGAGAATGAAATTTCT
 ......
perl fasta • 2.5k views
11.0 years ago
Irsan ★ 7.8k

On a Linux command line, run:

awk 'BEGIN{OFS="_";seq=1}{if($0 ~ /^>/){print ">seq",seq;seq++}else{print $0}}' yourFile.fasta

Replace yourFile.fasta with the name of your own file.

This gives you:

>seq_1
TCGGATAGAGCTCCAAATCTATCT
>seq_2
CAATATCAACTGCTGCAACTCTCT
>seq_3
TCAAAGGTTGAAGAGAATGAAATTTCT
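
For readers new to awk, here is the same idea spelled out over several lines with comments. This is only a sketch, not a replacement for the one-liner above: the file names sample.fasta and renamed.fasta are made up for illustration, and the sample input is just the first two records from the question.

```shell
# Hypothetical sample input; use your own FASTA file instead.
cat > sample.fasta <<'EOF'
>HWI-ST667:190:C0TPFACXX:1:1101:2885:1985 1:N:0:GATCAG
TCGGATAGAGCTCCAAATCTATCT
>HWI-ST667:190:C0TPFACXX:1:1101:3058:1999 1:N:0:GATCAG
CAATATCAACTGCTGCAACTCTCT
EOF

# Lines starting with ">" are headers: replace each with ">seq_N",
# where N counts the headers seen so far.
# All other lines (the sequences) are printed unchanged.
awk '
    /^>/ { n++; print ">seq_" n; next }   # header: bump counter, print new name
         { print }                         # sequence line: pass through as-is
' sample.fasta > renamed.fasta

# Show the result; the original file is left untouched.
cat renamed.fasta
```

Writing to a new file rather than overwriting the input keeps the original headers around in case you need to trace a sequence back to its read ID later.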

Thank you so much! It also worked on iMac : P

11.0 years ago
Kenosis ★ 1.3k

Here are two more options:

use strict;
use warnings;

my $i = 0;
while (<>) {
    s/^>\K.+/'seq_' . ++$i/e;
    print;
}

Usage: perl script.pl inFile [>outFile]

The optional >outFile at the end redirects the output to a file instead of printing it to the terminal.

As a one-liner:

perl -ne 's/^>\K.+/"seq_" . ++$i/e; print' inFile [>outFile]

Output on your dataset from both:

>seq_1
TCGGATAGAGCTCCAAATCTATCT
>seq_2
CAATATCAACTGCTGCAACTCTCT
>seq_3
TCAAAGGTTGAAGAGAATGAAATTTCT

Hope this helps!
