Renaming all FASTA headers in a file
7.0 years ago
branokdrung ▴ 10

Hi everyone,

I'm running into a problem with FASTA headers that are too long. A program I'm using (TargetP) truncates them at the 20th position.

Example:

>ConsensusfromContig10000-snap_masked-ConsensusfromContig10000-abinit-gene-0.1-mRNA-1:cds:3144/1451-1467:0:+
MKKSGDIDEIWKSMQEDARPKPRLPPLPAAAPPAPAPPAPAPKAAAAQPAAASSSNAMVAVNGGASRAFDYSNANALQRDINSLGDEALGTRKRAAERLEAVIVGAEGEAAEATVRALTGDLFKPLLKRFADPGEK

What remains are thousands of entries named "ConsensusfromContig1".

Is there any software or script I can use to rename the headers so that they are at most 20 characters long and can still be identified? So far I have only found scripts that truncate long headers. The desired name for the example would be something like 10000|3144/1451-1467:0 .

I would be grateful for any help provided.

fasta header renaming • 4.3k views
7.0 years ago
Anima Mundi ★ 2.9k

In Python:

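# Python 3; keeps only the last 18 characters of each header line.
with open('input.fa') as fasta:
    for line in fasta:
        if line.startswith('>'):
            # Drop the leading '>' and the newline, then keep the tail,
            # which carries the coordinates (e.g. 3144/1451-1467:0:+).
            print('>' + line[1:].rstrip('\n')[-18:])
        else:
            # Sequence lines pass through unchanged.
            print(line, end='')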
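
Keeping only the tail of each header helps only if the shortened names stay unique. A minimal sketch of such a check, assuming the same "last 18 characters" rule as the snippet above; the function name `duplicated_short_headers` is made up for illustration:

```python
from collections import Counter


def duplicated_short_headers(lines, keep=18):
    """Return shortened headers that occur more than once.

    `lines` is any iterable of FASTA lines; `keep` is the number of
    trailing header characters retained (an assumption taken from the
    snippet above, not a TargetP requirement).
    """
    counts = Counter(
        line[1:].rstrip('\n')[-keep:]
        for line in lines
        if line.startswith('>')
    )
    return sorted(name for name, n in counts.items() if n > 1)


# Usage with in-memory lines; a file object opened on the FASTA file
# can be passed in the same way.
example = ['>a:cds:1/2-3:0:+\n', 'MKK\n', '>b:cds:9/2-3:0:+\n']
print(duplicated_short_headers(example))          # both headers still differ
print(duplicated_short_headers(example, keep=5))  # the tails collide
```

An empty list means every shortened header is still unique in that file.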

 

 

7.0 years ago
iraun ★ 4.3k

If your header lines always have the same format, that is, they always contain the words "Contig" and "cds", you can use this awk command:

awk '{if($1 ~ /^>/){split($1,a,"-"); split(a[1],b,"Contig");split($1,c,"cds:"); print ">"b[2]"|"c[2]}else{print}}' file
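
For anyone who prefers Python, the same renaming can be sketched roughly as follows. It mirrors the awk splits above and assumes every header contains both "Contig" and "cds:"; the names `short_header` and `rename_fasta` are made up for illustration. For the example header in the question, the result is 10000|3144/1451-1467:0:+, which is 24 characters after the '>', so it may be worth checking the new names against the 20-character limit mentioned in the question.

```python
def short_header(line):
    """Build '>contig_number|coordinates' from one FASTA header line.

    Raises IndexError if the header lacks 'Contig' or 'cds:' (this
    sketch assumes both are always present).
    """
    field = line.split()[0]                          # like awk's $1
    contig = field.split('-')[0].split('Contig')[1]  # e.g. '10000'
    coords = field.split('cds:')[1]                  # e.g. '3144/1451-1467:0:+'
    return '>' + contig + '|' + coords


def rename_fasta(lines):
    """Yield FASTA lines with shortened headers; sequences are unchanged."""
    for line in lines:
        if line.startswith('>'):
            yield short_header(line) + '\n'
        else:
            yield line


# Usage with in-memory lines; a file object works the same way.
records = ['>xContig7-a:cds:1/2-3:0:-\n', 'MKK\n']
for out in rename_fasta(records):
    print(out, end='')
```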

Thanks a lot! I never imagined it could be done so easily. I used your command in the following way:

awk '{if($1 ~ /^>/){split($1,a,"-"); split(a[1],b,"Contig");split($1,c,"cds:"); print ">"b[2]"|"c[2]}else{print}}' Cyanophora_paradoxa_MAKER_gene_predictions-022111-aa.fasta >> Cyanophora_paradoxa_MAKER_gene_predictions-022111-aa-newHeaders.fasta


It worked like a charm. Big thanks again!

