Question

How to read SwissProt file with regexps in Python

7.4 years ago

natasha.sernova ★ 4.0k

Dear all,

I've read the file with startswith and Biopython.

I don't need the whole file, only the ID, AC, OC, KW and SQ lines and the sequence itself.

But I was told I have to do it with regexps. I've spent a few days on it, and I cannot get it to work.

Please help me!

Many thanks!

Natasha

# startswith

import sys

print("This is the name of the script: " + sys.argv[0])
print("Number of arguments: %d" % len(sys.argv))
print("The arguments are: " + str(sys.argv))

fin = open(sys.argv[1], 'r')
for line in fin:
    # Echo the annotation lines of interest, minus the trailing newline
    if line.startswith(("AC", "DE", "OC", "KW")):
        print(line.rstrip("\n"))
    elif line.startswith("SQ"):
        # SQ line layout: "SQ   SEQUENCE   <length> AA; ..."
        AA = line.split()
        print("Seq_Length = " + AA[2] + " " + AA[3])
    elif line.startswith("//"):
        # "//" terminates an entry, so stop after the first one
        break
fin.close()

import urllib
import re

#Biopython

from Bio import ExPASy
from Bio import SeqIO
handle = ExPASy.get_sprot_raw("P35579")
seq_record = SeqIO.read(handle, "swiss")
handle.close()
print(seq_record.id)
print(seq_record.name)
print(seq_record.description)
print(repr(seq_record.seq))
print("Length %i" % len(seq_record))
print(seq_record.annotations["keywords"])

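# Python 3: urlopen lives in urllib.request and yields bytes
from urllib.request import urlopen

fhand = urlopen('http://www.uniprot.org/uniprot/P35579.fasta')
for raw in fhand:
    line = raw.decode('utf-8')
    print(re.sub(r'\n$', '', line))  # drop the trailing newline
fhand.close()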

python regexp • 2.2k views

updated 7.4 years ago by WouterDeCoster 47k • written 7.4 years ago by natasha.sernova ★ 4.0k

I've found this reference, but these are exactly the solutions I tried for a long time without success.

http://stackoverflow.com/questions/6186938/python-how-to-use-regexp-on-file-line-by-line-in-python

7.4 years ago by natasha.sernova ★ 4.0k
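For illustration, here is a minimal sketch of how the line-by-line approach could be done with the `re` module instead of `startswith`. It assumes the usual SwissProt flat-file layout: a two-letter line code followed by three spaces, sequence lines indented by five spaces after the SQ line, and `//` ending the entry. The function name `parse_record`, the two pattern names and the `SAMPLE` entry are made up for this example and are not real UniProt data.

```python
import re

# Two-letter line code, exactly three spaces, then the rest of the line.
LINE_RE = re.compile(r'^(ID|AC|OC|KW|SQ) {3}(.*)$')
# Sequence lines carry no line code: five spaces, then residues in blocks.
SEQ_RE = re.compile(r'^ {5}([A-Z ]+)$')
# Sequence length as stated on the SQ line.
LEN_RE = re.compile(r'SEQUENCE\s+(\d+)\s+AA;')


def parse_record(lines):
    """Collect ID, AC, OC, KW, SQ lines and the sequence of the first entry."""
    record = {"ID": [], "AC": [], "OC": [], "KW": [], "SQ": []}
    seq_parts = []
    in_sequence = False
    for line in lines:
        line = line.rstrip("\n")
        if re.match(r'^//', line):
            break  # end of the entry
        m = LINE_RE.match(line)
        if m:
            record[m.group(1)].append(m.group(2))
            if m.group(1) == "SQ":
                in_sequence = True  # sequence lines follow the SQ line
            continue
        m = SEQ_RE.match(line)
        if in_sequence and m:
            # Remove the spaces that separate the blocks of residues.
            seq_parts.append(re.sub(r'\s+', '', m.group(1)))
    record["sequence"] = "".join(seq_parts)
    return record


# Hypothetical entry, invented for this sketch.
SAMPLE = """\
ID   TEST_HUMAN              Reviewed;          12 AA.
AC   P00000;
DE   RecName: Full=Hypothetical test protein;
OC   Eukaryota; Metazoa; Chordata.
KW   Cytoplasm; Reference proteome.
SQ   SEQUENCE   12 AA;  1234 MW;  0000000000000000 CRC64;
     MAQQAADKYL YV
//
"""

rec = parse_record(SAMPLE.splitlines())
seq_length = int(LEN_RE.search(rec["SQ"][0]).group(1))
print(rec["AC"], rec["KW"], seq_length, rec["sequence"])
```

To run it on a real file, an open file handle can be passed in place of the sample lines, for example `parse_record(open(sys.argv[1]))`. Like the `startswith` loop, this sketch stops at the first `//`, so it reads only the first entry of a multi-entry file.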