Question: How to read SwissProt file with regexps in Python
3.9 years ago, natasha.sernova wrote:

Dear all,

I've read the file with startswith and Biopython.

I don't need the whole file, only the ID, AC, OC, KW and SQ lines and the sequence itself.

But I was told I have to do it with regexps. I've spent a few days on it, and I see I cannot do it.

Please help me!

Many thanks!


# startswith

import sys

print("This is the name of the script: ", sys.argv[0])
print("Number of arguments: ", len(sys.argv))
print("The arguments are: ", str(sys.argv))

# Print the selected line types from the SwissProt file given as argv[1].
with open(sys.argv[1], 'r') as fin:
    for line in fin:
        if line.startswith("AC"):
            print(line)
        elif line.startswith("DE"):
            print(line)
        elif line.startswith("OC"):
            print(line)
        elif line.startswith("KW"):
            print(line)
        elif line.startswith("SQ"):
            # SQ   SEQUENCE   <length> AA; ...  -> fields 2 and 3 hold the length
            AA = line.split()
            print("Seq_Length = " + AA[2] + AA[3])
        elif line.startswith("//"):
            pass  # "//" marks the end of an entry
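
For comparison, here is a minimal sketch of the same line filtering done with a regexp instead of startswith. It assumes each line begins with a two-letter line code followed by spaces. The names `WANTED` and `wanted_lines` and the toy entry are invented for illustration, and the entry is not a real SwissProt record.

```python
import re

# Two-letter line code, the spaces that follow it, then the rest of the line.
WANTED = re.compile(r"^(ID|AC|OC|KW|SQ)\s+(.*)$")


def wanted_lines(lines):
    """Return (code, content) pairs for the line types of interest."""
    found = []
    for line in lines:
        match = WANTED.match(line.rstrip("\n"))
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Invented toy entry, not a real SwissProt record.
sample = [
    "ID   TOY_HUMAN               Reviewed;          12 AA.\n",
    "AC   X00000;\n",
    "DE   RecName: Full=Toy protein;\n",
    "OC   Eukaryota; Metazoa.\n",
    "KW   Example.\n",
    "SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;\n",
    "     MAQQAADKYL YV\n",
    "//\n",
]

for code, content in wanted_lines(sample):
    print(code, content)
```

Because `wanted_lines` only iterates over its argument, an open file object can probably be passed in place of the list. Changing the alternation `ID|AC|OC|KW|SQ` changes which line types are kept.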

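import re
import urllib.request

from Bio import ExPASy
from Bio import SeqIO

# Fetch one entry by accession and parse it with Biopython.
handle = ExPASy.get_sprot_raw("P35579")
seq_record = SeqIO.read(handle, "swiss")
print("Length %i" % len(seq_record))

fhand = urllib.request.urlopen('')  # the URL is missing from the post
for line in fhand:
    # urlopen yields bytes; decode, then strip the trailing newline.
    print(re.sub(r'\n$', '', line.decode()))
#    print(line)
#    print(re.sub(r'[\.]', '!', line))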
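
The sequence itself could be pulled out with a multi-line regexp; this is only a rough sketch. It assumes the whole entry is held in one string, that the SQ header has the form `SQ   SEQUENCE   <n> AA; ...`, and that the sequence lines run until the `//` terminator. The names `SQ_BLOCK` and `read_sequence` and the toy entry are made up for illustration.

```python
import re

# SQ header line, then everything up to the "//" terminator line.
SQ_BLOCK = re.compile(
    r"^SQ\s+SEQUENCE\s+(\d+)\s+AA;[^\n]*\n(.*?)^//",
    re.MULTILINE | re.DOTALL,
)


def read_sequence(entry_text):
    """Return (declared_length, sequence) for one entry, or None if no SQ block."""
    match = SQ_BLOCK.search(entry_text)
    if match is None:
        return None
    declared_length = int(match.group(1))
    # Sequence lines are broken up by spaces and newlines; remove them all.
    sequence = re.sub(r"\s+", "", match.group(2))
    return declared_length, sequence


# Invented toy entry, not a real SwissProt record.
entry = (
    "ID   TOY_HUMAN               Reviewed;          12 AA.\n"
    "SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;\n"
    "     MAQQAADKYL YV\n"
    "//\n"
)

print(read_sequence(entry))
```

Comparing the declared length with `len(sequence)` gives a cheap sanity check that the whole sequence block was captured.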

Tags: regexp, python

I've found this reference, but these are exactly the solutions I tried for a long time without success.
