Hi all!
I have a fasta file with the following sequence names:
>ITSPF14_7020;size=17;
TTTCCGTAGGTGAACCTGCGGAAGGATCATTACCACACCTTCGACGGCTGCTGCTGCGTGGCGGGCCCTATCACTGGCGAGCGT
TTGGGTCCCTCTCGGGGGAACTGAGCTAGTAGCCTCTCTTTTAAACCCATTCTGT..............
>ITSPF14_733;size=110;
TCGGAGTAAAATCTCGACGGCTGCTGCTGCGTGGCGGGCCCTATCACTGGCGAGCGT
TTGGGTCCCTCTCGGGGGAACTGAGCTAGTAGCCTCTCTTTTAAACCCATTCTGT...............
I would like to append a label after the existing sequence name, without removing it:
>ITSPF14_7020;size=17; DQ071354_1_Phytophthora_cactorum_strain_Shakuyaku1_1_beta_tubulin_gene_partial_cds
I have an Excel file that maps each sequence ID (ITSPF14_7020) to its corresponding label (DQ071354_1_Phytophthora_cactorum_strain_Shakuyaku1_1_beta_tubulin_gene_partial_cds).
Does anyone know how I can do this directly with a script? It would save me a lot of time compared with doing it manually!
Thanks a lot for your help!
Alexandra
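A minimal single-pass sketch. It rests on a few assumptions: the Excel sheet has been saved as tab-delimited text with the sequence ID in the first column and the label in the second, and the ID in each header is everything between `>` and the first `;`. The function name `add_labels` is hypothetical. awk reads the mapping file first (`NR == FNR`), then rewrites only the header lines whose ID it has seen. Sequence lines and unmatched headers pass through unchanged.

```shell
# Hypothetical helper: append a label to matching FASTA headers.
# Assumes $1 is a tab-delimited mapping file (ID<TAB>label), e.g. the Excel
# sheet saved as tab-delimited text, and $2 is the FASTA file.
add_labels() {
    awk '
        BEGIN { FS = "\t" }
        # First file (mapping): drop a Windows "\r" line ending, store ID -> label.
        NR == FNR { sub(/\r$/, ""); label[$1] = $2; next }
        # Second file (FASTA): on header lines, the ID is the text between ">" and the first ";".
        /^>/ {
            id = substr($0, 2)
            sub(/;.*/, "", id)
            if (id in label) { print $0 " " label[id]; next }
        }
        # Sequence lines and headers without a label are printed unchanged.
        { print }
    ' "$1" "$2"
}
```

For example, `add_labels labels.tsv seqs.fasta > seqs.labelled.fasta` (the file names are placeholders). Because this approach never reorders records or rewraps sequence lines, the output keeps the original layout of the FASTA file.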
Renaming fasta headers according to a matching name list
* linearize the fasta,
* use awk to create a new column containing the accession number:
awk -F '[>;]' '{printf("%s\t%s\n",$2,$0);}'
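The steps above can be strung together with `sort` and `join` roughly as follows. This is a sketch, not the answerer's exact pipeline: the function name `relabel_via_join` is hypothetical, the mapping is assumed to be a tab-delimited ID/label export of the Excel sheet, and using `sort` plus `join` to match the keyed lines to their labels is an assumption about the remaining steps. `join` needs both inputs sorted on the key with the same collation, hence `LC_ALL=C` on every `sort` and `join`.

```shell
# Hypothetical helper following the linearize / key-column steps, then sort + join.
# Assumes $1 is a tab-delimited mapping file (ID<TAB>label) and $2 is the FASTA file.
relabel_via_join() {
    tab=$(printf '\t')
    tmp=$(mktemp -d) || return 1

    # Mapping: strip Windows line endings and sort on the ID column.
    tr -d '\r' < "$1" | LC_ALL=C sort -t "$tab" -k1,1 > "$tmp/map.tsv"

    # Step 1: linearize the FASTA into one "header<TAB>sequence" line per record.
    awk '/^>/ { if (hdr != "") printf("%s\t%s\n", hdr, seq); hdr = $0; seq = ""; next }
         { seq = seq $0 }
         END { if (hdr != "") printf("%s\t%s\n", hdr, seq) }' "$2" |
    # Step 2: prepend the accession as a key column (the awk command from the answer).
    awk -F '[>;]' '{printf("%s\t%s\n",$2,$0);}' |
    LC_ALL=C sort -t "$tab" -k1,1 > "$tmp/fasta.tsv"

    # Step 3: join on the ID; -a 1 also keeps records that have no label.
    # A joined line reads: ID, header, sequence, label.
    LC_ALL=C join -t "$tab" -a 1 "$tmp/fasta.tsv" "$tmp/map.tsv" |
    # Step 4: rebuild the FASTA, appending the label (if any) after the original header.
    awk 'BEGIN { FS = "\t" } { if (NF >= 4) print $2 " " $4; else print $2; print $3 }'

    rm -rf "$tmp"
}
```

Going through a linearized table has side effects: records come out sorted by ID, and each sequence is written on a single line. Records without a label are kept because of `-a 1`; drop that flag to output only the labelled ones.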
A quick search on biostars.org will turn up numerous examples for each step.
What genomax2 said: Renaming fasta headers according to a matching name list