I have two files (see below for the actual format): a fasta file with more than 7000 sequences and a .txt file consisting of two columns. The first column in the .txt file corresponds to the sequence name in the fasta file (minus the trailing ';size=') and the second column gives the total number of sequences belonging to that name. I would like to append this size to the end of the matching header in the fasta file. In other words: the number '6047', which corresponds to Zotu1, should end up in the fasta file as '>Zotu1;size=6047'. The Zotus in the text file are not sorted.
I have no clue how to go about this, so any pointers in the right direction would be greatly appreciated!
1) The fasta file looks like this:
>Zotu1;size=
AGCTCCAAAAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAACTTCTGTTCAGGTTCATTTCGACTCGTC
GAGTGAAACTGGACATACGTTTGCAAACTAAAATCGGCCTTCACTGGTTCGTCTTAGGGAGTAAACATTTTACTGTGAAA
AAATTAGAGTGTTCCAGGCAGGTTTTAGCCCGAATACATTAGCATGGAATAATGGAATAGGACTAAGTCCATTTTATTGG
TTCTTGGATTTGGTAATGATTAATAGGGGCAGTTGGGGGCATTAGTATTTAATAGTCAGAGGTGAAATTCTTGGATTTAT
TAAGGACTAACTAATGCGAAAGCATTTGCCAAAGATGTTTTCA
>Zotu2;size=
AGCTCCAATAGCGTATATTTAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGGATCTTGGGTCGGGGGCAGCGGTCCGCCCC
TTGTGGGTGTGCACTGGTCCACCCGGCCTTACTGCCGGGGACGCGCTCCTGGCCTTCGCTGGTCGGGACGCGGAGTTGGC
GATGTTACTTTGAAAAAATTAGAGTGCTCAAAGCAAGCCTATGCTCTGAATACATTAGCATGGAATAACGTGATAGGACT
...
2) The .txt file looks like this:
Zotu1 604
Zotu566 1023
Zotu6785 31
Zotu6 111453
Zotu69 10380
Zotu223 3706
Zotu215 2559
Zotu2697 109
Zotu3 211288
Zotu742 697
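
To make the desired transformation concrete, here is a minimal Python sketch of one way it might be done. It is an illustration under stated assumptions, not a tested solution for the real data: it assumes the .txt columns are separated by whitespace, that every header to be filled ends in ';size=', and that each name appears only once in the .txt file. The function names and the file paths in `annotate` are hypothetical. Because the sizes are loaded into a dictionary, the order of the rows in the .txt file does not matter.

```python
def load_sizes(lines):
    """Build a {name: size} dict from whitespace-separated 'name count' lines."""
    sizes = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:          # skip blank or malformed lines
            sizes[parts[0]] = parts[1]
    return sizes


def add_sizes(fasta_lines, sizes):
    """Yield fasta lines, appending the size to headers that end in ';size='."""
    suffix = ";size="
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">") and line.endswith(suffix):
            name = line[1:-len(suffix)]
            # Headers with no entry in the size table are left unchanged.
            line += sizes.get(name, "")
        yield line


def annotate(fasta_path, sizes_path, out_path):
    """Read both files and write a new fasta file (paths are placeholders)."""
    with open(sizes_path) as handle:
        sizes = load_sizes(handle)
    with open(fasta_path) as fasta, open(out_path, "w") as out:
        for line in add_sizes(fasta, sizes):
            out.write(line + "\n")
```

A call such as `annotate("zotus.fa", "zotu_sizes.txt", "zotus_with_sizes.fa")` (file names made up for this example) would write the result to a new file and leave the original fasta file untouched. Sequence lines are passed through as they are, so wrapped sequences keep their line breaks.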