Hi everyone,
I'm encountering a problem with overly long FASTA headers. A program I'm using (TargetP) truncates them at the 20th position.
Example:
>ConsensusfromContig10000-snap_masked-ConsensusfromContig10000-abinit-gene-0.1-mRNA-1:cds:3144/1451-1467:0:+
MKKSGDIDEIWKSMQEDARPKPRLPPLPAAAPPAPAPPAPAPKAAAAQPAAASSSNAMVAVNGGASRAFDYSNANALQRDINSLGDEALGTRKRAAERLEAVIVGAEGEAAEATVRALTGDLFKPLLKRFADPGEK
What remains are thousands of entries named "ConsensusfromContig1".
Is there any software or script I can use to rename the headers so that they fit within 20 characters and can still be identified? So far I have only found scripts that truncate overly long headers. The desired name for the example would be something like 10000|3144/1451-1467:0.
I would be grateful for any help provided.
Thanks a lot! I never imagined it could be done so easily. I used your command in the following way:
awk '{if($1 ~ /^>/){split($1,a,"-"); split(a[1],b,"Contig");split($1,c,"cds:"); print ">"b[2]"|"c[2]}else{print}}' Cyanophora_paradoxa_MAKER_gene_predictions-022111-aa.fasta >> Cyanophora_paradoxa_MAKER_gene_predictions-022111-aa-newHeaders.fasta
It worked like a charm. Big thanks again!
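One thing that may be worth checking afterwards: for the example header, the command above prints `>10000|3144/1451-1467:0:+`, which is 24 characters after the `>`, so a tool that cuts names at 20 characters would still shorten it. That is harmless as long as the shortened names stay unique. The following is only a minimal sketch of such a check; `check_headers` is a hypothetical helper name, and the default limit of 20 is simply taken from the question, not from any TargetP option.

```shell
# check_headers FILE [MAXLEN]
# Hypothetical helper: counts headers longer than MAXLEN (default 20)
# and headers that collide once cut to MAXLEN characters.
check_headers() {
    awk -v max="${2:-20}" '
        /^>/ {
            n++
            id = substr($1, 2)           # name without the leading ">"
            short = substr(id, 1, max)   # what is left after truncation
            if (length(id) > max) toolong++
            if (seen[short]++) dup++     # this truncated name was seen before
        }
        END {
            printf "headers=%d too_long=%d duplicates_after_truncation=%d\n", n, toolong, dup
        }
    ' "$1"
}
```

It could be run as `check_headers Cyanophora_paradoxa_MAKER_gene_predictions-022111-aa-newHeaders.fasta`. If `duplicates_after_truncation` is 0, every entry can still be told apart after truncation, even if some names are longer than 20 characters.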