Hi,
I have a collection of 200 bacterial genome sequences, with each file corresponding to a separate bacterial species. The sequence files are named after the corresponding species (e.g. Nostoc punctiforme.fasta, and 199 more). The sequences were downloaded from multiple databases, so they have different header formats.
I am working on some Perl scripts that add my identifier (e.g. NOPU) to each FASTA entry of the corresponding genome, so that each entry of Nostoc punctiforme.fasta has NOPU inserted right after the >. These scripts ask for the file name and the identifier, and with them I am already able to add and modify long identifiers. To automate this process, I have created a text file with two columns, the first for the file name and the second for the identifier, with one entry per line, e.g.
**Filename** **My identifier**
Nostoc punctiforme NOPU
Prochlorococcus marinus PRMA
and 198 more entries
The idea behind this is to write a script that asks for a file name, looks up that file name and its identifier in the text file, adds the corresponding identifier to each entry of the FASTA file, and writes a new output file named after the identifier. Alternatively, the script could ask for a directory of FASTA files, pick up the files one by one, look up each identifier in the text file, and process each file in the same way.
However, I have not been able to make any headway in this direction and would appreciate any help.
Thanks
Hi! I tried this using a tab-separated file called samples.txt and I'm getting the following error: NameError: name 'file' is not defined
I'm not sure what's wrong.
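For reference, here is a minimal Python 3 sketch of the directory-based approach described in the question. It is only a sketch under a few assumptions: the mapping file is tab-separated (species name, then identifier), the FASTA files end in .fasta and are named exactly like the first column, and the identifier is joined to the original header with an underscore (that separator is my choice, not something stated in the question). The function names load_identifiers, tag_fasta and tag_directory are made up for illustration. On the NameError: one common cause is a script written for Python 2 that calls the file() builtin, which no longer exists in Python 3; without seeing the script that is only a guess, but the sketch below uses open() throughout and so avoids it.

```python
import csv
from pathlib import Path


def load_identifiers(mapping_path):
    """Read a tab-separated file of 'species name<TAB>identifier' lines."""
    mapping = {}
    with open(mapping_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip blank or incomplete lines.
            if len(row) < 2 or not row[0].strip():
                continue
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def tag_fasta(in_path, out_path, identifier):
    """Copy a FASTA file, inserting the identifier right after '>' in each header.

    Returns the number of headers that were modified.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Assumption: identifier and old header are joined with "_".
                line = f">{identifier}_{line[1:]}"
                count += 1
            dst.write(line)
    return count


def tag_directory(fasta_dir, mapping_path, out_dir):
    """Tag every *.fasta file in fasta_dir whose name appears in the mapping."""
    mapping = load_identifiers(mapping_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        # fasta.stem is the file name without ".fasta", e.g. "Nostoc punctiforme".
        identifier = mapping.get(fasta.stem)
        if identifier is None:
            print(f"no identifier for {fasta.name}, skipped")
            continue
        # The output file is named after the identifier, e.g. NOPU.fasta.
        out_path = out_dir / f"{identifier}.fasta"
        results[fasta.name] = tag_fasta(fasta, out_path, identifier)
    return results
```

It could be called as tag_directory("genomes", "samples.txt", "tagged"), where the directory names are placeholders. A header row in the mapping file does no harm as long as no FASTA file happens to be named after the header text. Files without a matching entry are reported and skipped rather than guessed at.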