4.8 years ago
shibl_a
I have a FASTA file with more than 1000 sequences, produced by an hmmsearch against the Pfam database. I would like to extract the lineage information (from phylum down to genus or species) for each of those sequences and place it in the header, possibly replacing the current header.
Currently, my file looks like this:
>C7MPV8.1/10-262 [subseq from] C7MPV8_SACVD Transcriptional regulator, IclR family {ECO:0000313|EMBL:ACU96353.1}
HHVQSLERGLAVIKAFGAGAPQLTLSEVAKETGLTRAAARRFLLTLADLGYVRSDGRYFS
LTAKVLELGYAYLSSLSLPEVVQPHLERLSAEVHESCSVSVLDGTDIVYVARVAVSRIMT
VSINVGTRFPAHATSMGRVLLAGLDEDALADYLREVTFDRLTDHTITSAEKLRAELDAVR
EQGWALVDQELEEGLRSVAAPIRDRSGKVVAAVNISTHASRTTPESVRTGLVPPLLATAA
RIESDLAVAPAAQ
>A0A0A6QCE9.1/8-271 [subseq from] A0A0A6QCE9_9BURK IclR family transcriptional regulator {ECO:0000313|EMBL:KHD20547.1}
SHAPSDEQPESHEKPGDSYVQSFARGLSVIRAFNAERPEQTLTDVAAATGLTRAGARRIL
LTLQTLGYVEAEGRLFRLTPKILDLGFAYLTSMPFWNLAEPVMEQLSAEVHESCSAAVLD
RTEIVYVLRVPTHKIMTINLSIGSRLPAYCTSMGRVLLSALDEPTLDATLGAMPLYAHTP
RTVTDKDELKKIIAQVRQQGWSIIDQELEGGLISIAAPIRNRQGRVIAAMNISGNAQRTS
AKQMVKAFLEPLLQAAQRVSEMVA
I would like it to look like this:
>Phylum, Class, Order, Family, Genus
HHVQSLERGLAVIKAFGAGAPQLTLSEVAKETGLTRAAARRFLLTLADLGYVRSDGRYFS
LTAKVLELGYAYLSSLSLPEVVQPHLERLSAEVHESCSVSVLDGTDIVYVARVAVSRIMT
VSINVGTRFPAHATSMGRVLLAGLDEDALADYLREVTFDRLTDHTITSAEKLRAELDAVR
EQGWALVDQELEEGLRSVAAPIRDRSGKVVAAVNISTHASRTTPESVRTGLVPPLLATAA
RIESDLAVAPAAQ
>Phylum, Class, Order, Family, Genus
SHAPSDEQPESHEKPGDSYVQSFARGLSVIRAFNAERPEQTLTDVAAATGLTRAGARRIL
LTLQTLGYVEAEGRLFRLTPKILDLGFAYLTSMPFWNLAEPVMEQLSAEVHESCSAAVLD
RTEIVYVLRVPTHKIMTINLSIGSRLPAYCTSMGRVLLSALDEPTLDATLGAMPLYAHTP
RTVTDKDELKKIIAQVRQQGWSIIDQELEGGLISIAAPIRNRQGRVIAAMNISGNAQRTS
AKQMVKAFLEPLLQAAQRVSEMVA
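To make the request concrete, here is a rough sketch of the header rewrite only. It assumes the lineage for each accession has already been looked up somewhere else (for example from UniProt) and is available as a dictionary; the names `accession_from_header`, `relabel_fasta` and `lineage_by_accession` are made up for illustration, and the lineage values are placeholders, not real taxonomy.

```python
import re

# Accession at the start of a header such as ">C7MPV8.1/10-262 [subseq from] ...".
# The optional ".1" version and "/10-262" coordinates are matched but not captured.
ACCESSION_RE = re.compile(r"^>([A-Za-z0-9]+)(?:\.\d+)?(?:/\d+-\d+)?")


def accession_from_header(header):
    """Return the bare accession from a header line, or None if it does not match."""
    match = ACCESSION_RE.match(header)
    return match.group(1) if match else None


def relabel_fasta(lines, lineage_by_accession):
    """Replace each header with its comma-separated lineage.

    lineage_by_accession is a hypothetical mapping, e.g.
    {"C7MPV8": ["Phylum", "Class", "Order", "Family", "Genus"]},
    filled in by a separate lookup step that is not shown here.
    Headers without a known lineage are kept unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            lineage = lineage_by_accession.get(accession_from_header(line))
            out.append(">" + ", ".join(lineage) if lineage else line)
        else:
            out.append(line)  # sequence lines pass through untouched
    return out


if __name__ == "__main__":
    example = [
        ">C7MPV8.1/10-262 [subseq from] C7MPV8_SACVD Transcriptional regulator, IclR family",
        "HHVQSLERGLAVIKAFGAGAPQLTLSEVAKETGLTRAAARRFLLTLADLGYVRSDGRYFS",
    ]
    # Placeholder lineage values, not a real taxonomy lookup.
    placeholder = {"C7MPV8": ["Phylum", "Class", "Order", "Family", "Genus"]}
    print("\n".join(relabel_fasta(example, placeholder)))
```

The part I am missing is how to fill that dictionary with the real lineage for each accession.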
Thanks in advance!