I have a Swiss-Prot database file that contains several Swiss-Prot entries.
They were copied and pasted one below the other,
so the file lists one Swiss-Prot entry after another.
I want to write the ID into another file as the header. Immediately underneath, I want to write the amino acid sequence.
So far I can only read a single Swiss-Prot entry and get one ID and one amino acid sequence as output. In other words, I have managed to print the ID header first and the amino acid sequence second.
How can I change this code to read multiple Swiss-Prot entries from one single file?
How do I do this sequentially for every ID and amino acid sequence from each Swiss-Prot entry listed in the file?
bright_cyan = "\033[0;96m"
bright_yellow = "\033[0;93m"
bright_green = "\033[0;92m"
reset = "\033[0m"
#--------------------------------------------------------------------
import sys
import re  
#--------------------------------------------------------------------
def read_data(SPROT_FILE):
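    ''' Read a Swiss-Prot file, print the ID header and write the amino acid sequence to first.fsa. '''
    flag = False  # True only while reading the sequence lines between SQ and //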
    try:
        DNAfile = open(SPROT_FILE , 'r')
    except IOError as error:
        print(bright_cyan + "Double-check that you entered the correct filename :> " + str(error) + reset)
        sys.exit(1)
    # Create a FASTA file to write the extracted information to.
    new_outfile = open("first.fsa", 'w')
    amino_acid_sequence  = ''
    for line in DNAfile:
        #print(line, end = '')
        if re.match(r'ID', line):
            ID = line[5:20]     
        # Stateful Parsing of the amino acid sequence.  
        if re.match(r'//', line):
            flag = False
        if flag:
            amino_acid_sequence += line
        if re.match(r'SQ', line):
            flag = True
        # Find the modified amino acid residue. 
        if re.match(r'FT   MOD_RES', line):
            FT = line
            position_switch = ','.join(re.findall(r'\d+',FT))
            header_line = '>'+ID.strip()+" phospho:"+position_switch
            print(header_line)
            #print('>'+ID.strip()+" phospho:"+position_switch, file = new_outfile)
    # Print each amino acid sequence outside of the loop.
    amino_acid_sequence = amino_acid_sequence.replace(' ', '')
    print(amino_acid_sequence)
    # Write the amino acid sequence to the file. 
    print(amino_acid_sequence, file = new_outfile)
    DNAfile.close()
    new_outfile.close() 
# Not sure about this part...
files = input(bright_yellow + 'Type one or more filenames :> ').split()
for filename in files:
    read_data(filename)
I hope the question is clear.
It would be great if you could offer some help.
Thanks in advance
Can you give an example of what you mean by a Swiss-Prot file? I think you're describing a FASTA file with amino acids. In that case, use Biopython to parse the FASTA file.
Yes. Here is an example of the file. It is a few thousand lines long, so I won't post the whole thing.
Hopefully that is clearer now.
Best
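One possible approach, as a minimal sketch rather than a definitive answer: treat the `//` line as the end of an entry, emit the ID and sequence collected so far, and reset the state before the next entry starts. The function names `iter_entries` and `write_fasta` are made up for this illustration. The sketch assumes every entry has an `ID` line, an `SQ` block and a closing `//`, and it collects all `MOD_RES` positions of an entry into a single header, keeping the `phospho:` label from the question.

```python
import re


def iter_entries(lines):
    """Yield (entry_id, mod_res_positions, sequence) for each entry in lines."""
    entry_id, positions, seq_parts, in_sequence = None, [], [], False
    for line in lines:
        if line.startswith("//"):
            # End of the entry: emit it, then reset the state for the next one.
            if entry_id is not None:
                yield entry_id, positions, "".join(seq_parts)
            entry_id, positions, seq_parts, in_sequence = None, [], [], False
        elif in_sequence:
            # Sequence lines contain spaces and a newline; drop all whitespace.
            seq_parts.append("".join(line.split()))
        elif line.startswith("ID "):
            # The entry name is the second whitespace-separated field.
            entry_id = line.split()[1]
        elif line.startswith("FT   MOD_RES"):
            # Same digit extraction as in the question's code.
            positions.extend(re.findall(r"\d+", line))
        elif line.startswith("SQ "):
            in_sequence = True


def write_fasta(sprot_path, fasta_path):
    """Write one header line and one sequence line per entry; return the count."""
    count = 0
    with open(sprot_path) as src, open(fasta_path, "w") as out:
        for entry_id, positions, sequence in iter_entries(src):
            header = ">" + entry_id
            if positions:
                header += " phospho:" + ",".join(positions)
            print(header, file=out)
            print(sequence, file=out)
            count += 1
    return count
```

Calling `write_fasta(filename, "first.fsa")` inside the existing `for filename in files:` loop would then write every entry of that file. Because the state is reset at each `//`, sequences from different entries are never concatenated. If installing Biopython is an option, `Bio.SeqIO.parse(handle, "swiss")` also reads files with multiple Swiss-Prot entries.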