I have a large FASTA file of 16S sequences, and I want to retrieve sequences using a list of organism names. Does anyone know of a script that can do this?
The headers look like this:
>S000000859 Bacillus sp. USC14; AF346495
>S000001027 Paenibacillus borealis; KN25; AJ011325
I also have a list of names like the following:
Paenibacillus sp. 1-18
Paenibacillus sp. 1-49
Paenibacillus sp. A9
Paenibacillus sp. Aloe-11
I want to retrieve the sequences whose organism names match a name in the list.
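A minimal Python sketch follows. It assumes that the organism name is the text after the sequence ID and before the first ';' in the header. For example, ">S000001027 Paenibacillus borealis; KN25; AJ011325" gives "Paenibacillus borealis". It also assumes the list file holds one name per line. Matching is exact, so "Paenibacillus sp. A9" will not match "Paenibacillus sp. A9b". The script name and file names in the usage line are hypothetical.

```python
import sys


def organism_name(header):
    """Return the organism name from a FASTA header line.

    Assumed layout: '>ID name; extra; fields', i.e. the name is
    everything after the ID up to the first ';'.
    """
    # Drop the leading '>' and split off the ID (first whitespace-separated token)
    parts = header.lstrip(">").split(None, 1)
    if len(parts) < 2:
        return ""
    # Keep the text before the first ';' and trim surrounding whitespace
    return parts[1].split(";", 1)[0].strip()


def filter_fasta(fasta_lines, wanted):
    """Yield header and sequence lines for records whose organism name is in `wanted`."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # A new record starts: decide whether to keep it
            keep = organism_name(line) in wanted
        if keep:
            yield line


def main(names_path, fasta_path, out_path):
    # Load the wanted names, one per line, ignoring blank lines
    with open(names_path) as fh:
        wanted = {name.strip() for name in fh if name.strip()}
    # Stream the FASTA file line by line so a large file is never held in memory
    with open(fasta_path) as fin, open(out_path, "w") as fout:
        fout.writelines(filter_fasta(fin, wanted))


if __name__ == "__main__":
    main(*sys.argv[1:4])
```

Usage (hypothetical file names): python filter_by_name.py names.txt 16S.fasta selected.fasta

Names are stored in a set, so each header lookup takes constant time even when the list is long. Sequences that span several lines are kept whole, because every line after a matching header is copied until the next header appears.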