Parsing SwissProt With Biopython
11.4 years ago
Stephen ▴ 50

How can I parse the gene name from the SwissProt file "uniprot_sprot.dat" ?

I use the following code to parse the accessions:

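from Bio import SwissProt

# Print every accession number of every record in the file.
with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        for accession in record.accessions:
            print(accession)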

That prints all the accession numbers. I also want the gene name.

How do I access the other attributes of record, such as the gene name? I tried gene_name, but that doesn't seem to work. Unlike the BLAST parser, the SwissProt parser doesn't seem to have a UML diagram in the Biopython tutorial.

biopython • 9.8k views

(I tagged this with 'Biopython' as well)

11.4 years ago

The gene_name attribute of the SwissProt Record object in Biopython works fine, as follows:

from Bio import SwissProt

# Print the gene name field of every record in the file.
with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        print(record.gene_name)

Here is a list of other attribute names that hold SwissProt data. I copied it from the docstring of the SwissProt module.

Holds information from a SwissProt record.

Members:
entry_name        Name of this entry, e.g. RL1_ECOLI.
data_class        Either 'STANDARD' or 'PRELIMINARY'.
molecule_type     Type of molecule, 'PRT'.
sequence_length   Number of residues.

accessions        List of the accession numbers, e.g. ['P00321']
created           A tuple of (date, release).
sequence_update   A tuple of (date, release).
annotation_update A tuple of (date, release).

description       Free-format description.
gene_name         Gene name.  See userman.txt for description.
organism          The source of the sequence.
organelle         The origin of the sequence.
organism_classification  The taxonomy classification.  List of strings.
                         (http://www.ncbi.nlm.nih.gov/Taxonomy/)
taxonomy_id       A list of NCBI taxonomy id's.
host_organism     A list of names of the hosts of a virus, if any.
host_taxonomy_id  A list of NCBI taxonomy id's of the hosts, if any.
references        List of Reference objects.
comments          List of strings.
cross_references  List of tuples (db, id1[, id2][, id3]).  See the docs.
keywords          List of the keywords.
features          List of tuples (key name, from, to, description).
                  from and to can be either integers for the residue
                  numbers, '<', '>', or '?'

seqinfo           tuple of (length, molecular weight, CRC32 value)
sequence          The sequence.
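
If only the primary gene name is wanted rather than the whole gene_name field, a small helper along these lines may be useful. This is a rough sketch, not part of Biopython: primary_gene_names is a hypothetical name, and it assumes gene_name is either the raw "Name=...; Synonyms=...;" string described in the SwissProt user manual or, as newer Biopython releases appear to return, a list of dictionaries with a "Name" key. Check what your installed version actually gives you before relying on it.

```python
import re


def primary_gene_names(gene_name):
    """Return the primary gene names ("Name=" entries) from one record's gene_name.

    Hypothetical helper; the two accepted input shapes are assumptions.
    """
    if isinstance(gene_name, str):
        # Raw string form, e.g. "Name=HLA-A; Synonyms=HLAA;".
        # Stop at ';' (end of the entry) or '{' (start of an evidence tag).
        return [m.strip() for m in re.findall(r"\bName=([^;{]+)", gene_name)]
    # Assumed alternative form: a list of dicts, each possibly holding "Name".
    return [str(g["Name"]).strip() for g in gene_name if g.get("Name")]


# Illustrative input, not taken from a real file.
print(primary_gene_names("Name=HLA-A; Synonyms=HLAA;"))
```

Inside the parsing loop this would be called as primary_gene_names(record.gene_name). Records without a "Name=" entry yield an empty list.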

For any Python object, try dir(pythonobject) at the Python prompt; it lists all of the object's attributes. Also try help(pythonobject), which shows the API docstrings. In this case they are also available here: http://biopython.org/DIST/docs/api/Bio.SwissProt.Record-class.html
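
As a quick illustration of the dir() tip, the sketch below filters out the underscore-prefixed names so only the data attributes remain. DemoRecord is a made-up stand-in with placeholder values so the snippet runs without the data file; with Biopython you would pass a real record from SwissProt.parse instead. public_attributes is likewise just an illustrative helper name.

```python
def public_attributes(obj):
    """List the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class DemoRecord:
    """Hypothetical stand-in for a parsed record (placeholder values)."""

    def __init__(self):
        self.accessions = ["P00321"]
        self.gene_name = "Name=demo;"


print(public_attributes(DemoRecord()))  # ['accessions', 'gene_name']
```

Calling help() on the same object prints its docstring, which is where the member list in the answer above comes from.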


Thank you for this helpful hint!

11.4 years ago
Stephen ▴ 50

Thank you for your helpful answer!
