Hi, I have a FASTA file with 300 protein sequences that I want to use to construct a phylogenetic tree. I want to keep only the accession number and the organism name in each FASTA header and remove the rest of the information. Can anybody suggest how to do this? I have a Linux-based system with Perl and Python installed.
For example, I want to convert a header like this:
>gi|685204428|gb|AIN98665.1| fumarate hydratase, putative [Leishmania panamensis]
into a header like this:
>Leishmania panamensis| AIN98665.1
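A minimal Python sketch of one way to do this, assuming every header follows the `>gi|number|db|accession| description [Organism]` layout shown above. It takes the accession from the fourth `|` field and the organism from the last `[...]` block. Headers that don't match are left unchanged, and sequence lines are copied as-is. The script name `rename_headers.py` is just a placeholder.

```python
import re
import sys

# Matches NCBI-style headers such as:
#   >gi|685204428|gb|AIN98665.1| fumarate hydratase, putative [Leishmania panamensis]
# Group 1 = accession (4th '|' field), group 2 = organism (last [...] block,
# because the greedy '.*' backtracks to the final '[').
HEADER_RE = re.compile(r"^>gi\|\d+\|[^|]*\|([^|]+)\|.*\[([^\]]+)\]\s*$")


def convert_header(line: str) -> str:
    """Return '>Organism| Accession', or the line unchanged if it doesn't match."""
    line = line.rstrip("\n")
    m = HEADER_RE.match(line)
    if not m:
        return line
    accession, organism = m.group(1), m.group(2)
    return f">{organism}| {accession}"


def convert_fasta(lines):
    """Yield FASTA lines with headers rewritten and sequence lines untouched."""
    for line in lines:
        if line.startswith(">"):
            yield convert_header(line)
        else:
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rename_headers.py input.fasta > output.fasta
    with open(sys.argv[1]) as handle:
        for out_line in convert_fasta(handle):
            print(out_line)
```

Run it as `python rename_headers.py input.fasta > output.fasta`. Some tree-building programs handle spaces in sequence names poorly, so you may prefer to replace the space in the organism name with an underscore before building the tree.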
Some sequences have multiple headers. Would that be a problem?
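Suppose "multiple headers" means several deflines merged into one header line, as in sequences downloaded from NCBI's nr database. In that case, one option is to keep only the first defline. This sketch assumes the deflines are separated by the Ctrl-A character (`\x01`), which nr uses, or by ` >`, which can appear after some conversions. The function name `simplify_merged_header` is hypothetical.

```python
import re

# Assumption: merged deflines are separated by Ctrl-A (\x01), as in NCBI nr,
# or by " >" in some converted files; both separators are handled here.
SEPARATOR_RE = re.compile(r"\x01| >")
# Same layout as before, but applied to a defline without the leading '>'.
HEADER_RE = re.compile(r"^gi\|\d+\|[^|]*\|([^|]+)\|.*\[([^\]]+)\]\s*$")


def simplify_merged_header(line: str) -> str:
    """Keep only the first defline of a merged header, then reformat it."""
    body = line.rstrip("\n").lstrip(">")
    first = SEPARATOR_RE.split(body)[0]
    m = HEADER_RE.match(first)
    if not m:
        # Unrecognised layout: return the first defline unchanged.
        return ">" + first
    return f">{m.group(2)}| {m.group(1)}"
```

Keeping the first defline is an arbitrary choice. If the merged entries come from different organisms, you may want to check which one belongs in your tree.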