Dear BioStar Community,
I am analyzing a dataset of bacterial proteins structured as follows:
This dataset has about 15000 entries. What I am trying to do is extract all proteins according to their annotations (ENZ, MEM, unknown, etc.) to perform sequence/structure analysis on them. I am using a list comprehension to do this as follows (ENZ annotation shown here):
def retrieve_enzyme_proteins(filename):
    with open(filename) as file:
        for line in file:
            # search for the proteins annotated as enzymes (ENZ)
            if line.startswith('>') and '_ENZ' in line:
                # I need a line here telling the function to jump to the
                # next line and extract the sequence
                return [line.strip('\n') for line in file if line != '>']
So far I have only been able to extract EVERY protein sequence. I would like a logical structure that tells my program to look for the desired annotation (e.g. ENZ), then jump to the next line, where the sequence begins, and extract that sequence. Apologies if the indentation is messed up; the copy-paste does that. I am using Python 2.7.3. Any help would be much appreciated!
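For reference, here is a minimal sketch of the kind of logic described above. It rests on assumptions, since the data sample is not shown: that the file is FASTA-like, that each header line starts with '>' and carries the annotation as a tag such as '_ENZ', and that the sequence follows on one or more lines until the next header. The function name retrieve_annotated_proteins and the tag parameter are hypothetical names chosen for illustration.

```python
def retrieve_annotated_proteins(filename, tag='_ENZ'):
    """Return the sequences whose header line contains `tag`.

    Assumes a FASTA-like layout (an assumption, not confirmed by the post):
    header lines start with '>', sequence lines follow until the next header.
    """
    sequences = []
    keep = False   # True while reading a record whose header matched the tag
    parts = []     # sequence lines collected for the current record

    with open(filename) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # A new header closes the previous record, if it was wanted.
                if keep and parts:
                    sequences.append(''.join(parts))
                keep = tag in line
                parts = []
            elif keep:
                parts.append(line)

    # Do not lose the last record in the file.
    if keep and parts:
        sequences.append(''.join(parts))
    return sequences
```

Calling retrieve_annotated_proteins('proteins.fasta', tag='_MEM') would collect the membrane proteins in the same way, where 'proteins.fasta' stands in for the actual file name. Using a flag instead of "jumping" to the next line also copes with sequences wrapped over several lines. The sketch avoids anything specific to Python 3, so it should also run on Python 2.7.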