5.5 years ago
Gritz122
Hi,
I am new to regular expressions and to programming in general, and I need help reformatting FASTA files from:
>AB000667.1.2469.3417 root;Eukaryota;Chordata;Actinopteri;Pleuronectiformes;Paralichthyidae;Paralichthys;Paralichthys olivaceus
CAAAGGCTTGGTCCTGACTTTACTGTCGACTCTAACTAGACTTACACATGCAAGTATCCG
CCCCCCTGTGAGAATGCCCATAACGCCCTGCTCGGGAACAAGGAGCTGGCATCAGGCACA
...
Some of the headers instead have an accession field like this:
>AB002132.1._1.403
and I want to change it to:
>AccessionNumber.Version SpeciesName
nucleotide sequence
nucleotide sequence, continued
nucleotide sequence, continued
etc...
The requirements are:
- The sequence name is just the Accession number followed by the version number
- This is followed by a space and then the species name. There should be a space between the genus and species name
Thanks!
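One possible approach, sketched in Python since the question does not name a language. It assumes every header looks like the examples above: the accession and version come first, extra dot-separated fields may follow, and the species is the last semicolon-separated field of the taxonomy string. The names `HEADER_RE`, `reformat_header`, and `reformat_fasta` are made up for this illustration.

```python
import re

# Assumed header layout (taken from the examples in the question):
#   >ACCESSION.VERSION<extra fields> taxon;taxon;...;Genus species
# Group 1 captures "ACCESSION.VERSION"; group 2 captures the taxonomy string.
HEADER_RE = re.compile(r"^>([^.\s]+\.\d+)\S*\s+(.*)$")


def reformat_header(line):
    """Return '>Accession.Version Genus species' for a matching header line."""
    line = line.rstrip("\n")
    match = HEADER_RE.match(line)
    if match is None:
        # Header does not fit the assumed layout: leave it unchanged.
        return line
    accession_version, taxonomy = match.groups()
    # The species name is assumed to be the last semicolon-separated field.
    species = taxonomy.split(";")[-1].strip()
    return f">{accession_version} {species}"


def reformat_fasta(lines):
    """Yield lines with headers rewritten and sequence lines untouched."""
    for line in lines:
        line = line.rstrip("\n")
        yield reformat_header(line) if line.startswith(">") else line
```

To use it on a file, pass an open file handle to `reformat_fasta` and write each yielded line, followed by a newline, to the output file. Headers that do not fit the assumed layout are passed through unchanged, so it is worth checking the output for any that were missed.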
What would I have to change the regular expression to if I just need the
Thanks so much, huge help!