Taxonomy from Pfam database
4.4 years ago
shibl_a ▴ 20

I have a FASTA file with more than 1,000 sequences, obtained from an hmmsearch run against the Pfam database. I would like to extract the lineage (from phylum down to genus or species) for each of those sequences and put it in the header, possibly replacing the current header.

Currently, my file looks like this:

    >C7MPV8.1/10-262 [subseq from] C7MPV8_SACVD Transcriptional regulator, IclR family {ECO:0000313|EMBL:ACU96353.1}
    HHVQSLERGLAVIKAFGAGAPQLTLSEVAKETGLTRAAARRFLLTLADLGYVRSDGRYFS
    LTAKVLELGYAYLSSLSLPEVVQPHLERLSAEVHESCSVSVLDGTDIVYVARVAVSRIMT
    VSINVGTRFPAHATSMGRVLLAGLDEDALADYLREVTFDRLTDHTITSAEKLRAELDAVR
    EQGWALVDQELEEGLRSVAAPIRDRSGKVVAAVNISTHASRTTPESVRTGLVPPLLATAA
    RIESDLAVAPAAQ
    >A0A0A6QCE9.1/8-271 [subseq from] A0A0A6QCE9_9BURK IclR family transcriptional regulator {ECO:0000313|EMBL:KHD20547.1}
    SHAPSDEQPESHEKPGDSYVQSFARGLSVIRAFNAERPEQTLTDVAAATGLTRAGARRIL
    LTLQTLGYVEAEGRLFRLTPKILDLGFAYLTSMPFWNLAEPVMEQLSAEVHESCSAAVLD
    RTEIVYVLRVPTHKIMTINLSIGSRLPAYCTSMGRVLLSALDEPTLDATLGAMPLYAHTP
    RTVTDKDELKKIIAQVRQQGWSIIDQELEGGLISIAAPIRNRQGRVIAAMNISGNAQRTS
    AKQMVKAFLEPLLQAAQRVSEMVA

I would like it to look like this:

    >Phylum, Class, Order, Family, Genus
    HHVQSLERGLAVIKAFGAGAPQLTLSEVAKETGLTRAAARRFLLTLADLGYVRSDGRYFS
    LTAKVLELGYAYLSSLSLPEVVQPHLERLSAEVHESCSVSVLDGTDIVYVARVAVSRIMT
    VSINVGTRFPAHATSMGRVLLAGLDEDALADYLREVTFDRLTDHTITSAEKLRAELDAVR
    EQGWALVDQELEEGLRSVAAPIRDRSGKVVAAVNISTHASRTTPESVRTGLVPPLLATAA
    RIESDLAVAPAAQ
    >Phylum, Class, Order, Family, Genus
    SHAPSDEQPESHEKPGDSYVQSFARGLSVIRAFNAERPEQTLTDVAAATGLTRAGARRIL
    LTLQTLGYVEAEGRLFRLTPKILDLGFAYLTSMPFWNLAEPVMEQLSAEVHESCSAAVLD
    RTEIVYVLRVPTHKIMTINLSIGSRLPAYCTSMGRVLLSALDEPTLDATLGAMPLYAHTP
    RTVTDKDELKKIIAQVRQQGWSIIDQELEGGLISIAAPIRNRQGRVIAAMNISGNAQRTS
    AKQMVKAFLEPLLQAAQRVSEMVA
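
To make the request concrete, here is a minimal sketch of the header rewrite itself, not a complete solution. It assumes the accession is the part of the header before the version suffix and the `/start-end` range (as in the two records above), and that a lineage table keyed by accession has already been obtained from some other source. The names `accession_from_header`, `relabel_fasta` and `lineage_by_accession` are hypothetical, and the lineage values in the usage note are placeholders, not real taxonomy.

```python
import re

# Header layout assumed from the records above:
#   >ACCESSION.VERSION/START-END [subseq from] ENTRY_NAME description
HEADER_RE = re.compile(r"^>(?P<acc>[A-Za-z0-9]+)(?:\.\d+)?/\d+-\d+")


def accession_from_header(header):
    """Return the accession in a header line, or None if the line does not match."""
    match = HEADER_RE.match(header)
    return match.group("acc") if match else None


def relabel_fasta(lines, lineage_by_accession):
    """Yield FASTA lines, replacing each header with its lineage when one is known.

    lineage_by_accession is a hypothetical dict mapping an accession to a list
    of rank names (phylum ... genus), built beforehand from another source.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith(">"):
            # Sequence line: pass through unchanged
            yield line
            continue
        lineage = lineage_by_accession.get(accession_from_header(line))
        if lineage:
            yield ">" + ", ".join(lineage)
        else:
            # No lineage available: keep the original header
            yield line
```

For example, `list(relabel_fasta(open("hits.fasta"), {"C7MPV8": ["Phylum", "Class", "Order", "Family", "Genus"]}))` would rewrite only the first header and leave the second one as it is (`hits.fasta` is a made-up file name). What I am missing is the step that fills the lineage table.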

Thanks in advance!

taxonomy pfam • 699 views