                    8 months ago
        reza
        
    
I have a list of gene names and a file in gpff (GenPept flat file) format containing protein sequences. I want to extract the protein sequence for each gene on my list from the gpff file. How can I do this?
Part of the gpff file:
LOCUS       XP_031247110             372 aa            linear   PLN 22-OCT-2019
DEFINITION  GDSL esterase/lipase At4g16230-like [Pistacia vera].
ACCESSION   XP_031247110
VERSION     XP_031247110.1
DBLINK      BioProject: PRJNA578116
DBSOURCE    REFSEQ: accession XM_031391250.1
KEYWORDS    RefSeq; includes ab initio.
SOURCE      Pistacia vera
  ORGANISM  Pistacia vera
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
            Pentapetalae; rosids; malvids; Sapindales; Anacardiaceae; Pistacia.
COMMENT     MODEL REFSEQ:  This record is predicted by automated computational
            analysis. This record is derived from a genomic sequence
            (NW_022196320.1) annotated using gene prediction method: Gnomon.
            Also see:
                Documentation of NCBI's Annotation Process
            ##Genome-Annotation-Data-START##
            Annotation Provider         :: NCBI
            Annotation Status           :: Full annotation
            Annotation Name             :: Pistacia vera Annotation Release 100
            Annotation Version          :: 100
            Annotation Pipeline         :: NCBI eukaryotic genome annotation
                                           pipeline
            Annotation Software Version :: 8.2
            Annotation Method           :: Best-placed RefSeq; Gnomon
            Features Annotated          :: Gene; mRNA; CDS; ncRNA
            ##Genome-Annotation-Data-END##
            ##RefSeq-Attributes-START##
            ab initio :: 8% of CDS bases
            ##RefSeq-Attributes-END##
            COMPLETENESS: full length.
FEATURES             Location/Qualifiers
     source          1..372
                     /organism="Pistacia vera"
                     /cultivar="Batoury"
                     /db_xref="taxon:55513"
                     /chromosome="Unknown"
                     /tissue_type="leaf"
                     /country="China"
     Protein         1..372
                     /product="GDSL esterase/lipase At4g16230-like"
                     /calculated_mol_wt=40507
     CDS             1..372
                     /gene="LOC116104818"
                     /coded_by="XM_031391250.1:1..1119"
                     /db_xref="GeneID:116104818"
ORIGIN      
        1 mtekiptkfl llcfpllaif fpcnvycwst ygsqikgmfv fgsslvdngn nnflltlaka
       61 nyspygvdfp ggpsgrftng mnvidllgee lqlpslipvf ydpstkggrt ivhgvnyasg
      121 gsgilndtgs iagnvvslne qirnfdevtl pelkthvdcr stdllhnylf vvgsggndys
      181 fnyfltqana nvsveaftdn linslsqqlk klyslggrkf vlmsvnplgc npvarasqpt
      241 gqdgciqvln qaahlfnsrl rltvdfirpq mpgstlvfvn sykiitdiig dpvsngfndt
      301 rkaccqvlsv neggngilck rggrvcaern ihvffdglhp teavniqiak kafgsynrde
      361 vypinvrqla kl
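
One possible approach, sketched in plain Python under a few assumptions: each record starts with a `LOCUS` line, the gene name sits in a `/gene="..."` qualifier (as in the CDS feature above), and the residues follow the `ORIGIN` line until `//` or the next `LOCUS`. The function names (`parse_gpff`, `extract_by_gene`, `to_fasta`) and the `SAMPLE` text are made up for illustration; `SAMPLE` is a trimmed stand-in for the record above, not the full entry. Biopython's `SeqIO.parse(handle, "genbank")` should also read this format and is likely more robust for real data.

```python
import re

GENE_RE = re.compile(r'/gene="([^"]+)"')

def parse_gpff(lines):
    """Yield (accession, gene, sequence) for each record in GenPept text."""
    accession, gene, seq, in_origin = None, None, [], False
    for line in lines:
        if line.startswith("LOCUS"):
            # A new record begins; emit the previous one if there was one.
            if accession is not None:
                yield accession, gene, "".join(seq).upper()
            accession, gene, seq, in_origin = line.split()[1], None, [], False
        elif accession is None:
            continue  # skip anything before the first LOCUS line
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like "  61 nyspygvdfp ..."; keep letters only.
            seq.append(re.sub(r"[^A-Za-z]", "", line))
        elif gene is None:
            match = GENE_RE.search(line)
            if match:
                gene = match.group(1)
    if accession is not None:
        yield accession, gene, "".join(seq).upper()

def extract_by_gene(lines, wanted):
    """Return {gene: [(accession, sequence), ...]} for genes in `wanted`."""
    wanted = set(wanted)
    hits = {}
    for accession, gene, sequence in parse_gpff(lines):
        if gene in wanted:
            hits.setdefault(gene, []).append((accession, sequence))
    return hits

def to_fasta(hits, width=60):
    """Format the hits as FASTA text, wrapping sequences at `width` residues."""
    out = []
    for gene, proteins in hits.items():
        for accession, sequence in proteins:
            out.append(f">{accession} {gene}")
            out.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(out)

# Trimmed, illustrative stand-in for the record shown above (last 12 residues only).
SAMPLE = """\
LOCUS       XP_031247110              12 aa            linear   PLN 22-OCT-2019
     CDS             1..12
                     /gene="LOC116104818"
ORIGIN
        1 vypinvrqla kl
//
"""

print(to_fasta(extract_by_gene(SAMPLE.splitlines(), ["LOC116104818"])))
```

For a real file, the same functions can be fed an open file handle instead of `SAMPLE.splitlines()`, with the gene list read from a text file, one name per line. A gene may map to several proteins (isoforms), which is why each gene keeps a list of hits.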