Question: Parsing SwissProt With Biopython
Stephen wrote (6.7 years ago):

How can I parse the gene name from the SwissProt file "uniprot_sprot.dat"?

I use the following code to parse the accessions:

from Bio import SwissProt

with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        # Each record can carry several accession numbers.
        for accession in record.accessions:
            print(accession)

That prints all the accession numbers. I also want the gene name.

How do I access the other attributes of record, such as the gene name? I tried gene_name, but that doesn't seem to work. Unlike the BLAST parser, the SwissProt parser doesn't seem to have a UML diagram in the Biopython tutorial.


(I tagged this with 'Biopython' as well)

(comment written 6.7 years ago by Peter)
a.zielezinski wrote (6.7 years ago):

The gene_name attribute of the SwissProt Record object in Biopython works fine, as follows:

from Bio import SwissProt

with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        # Gene name data taken from the record's GN lines.
        print(record.gene_name)
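
To get the accession and the gene name side by side, which is what the question asks for, something like the following might work. It is only a sketch: the helper names accession_gene_pairs and read_pairs are made up for illustration, the stand-in record uses hypothetical values, and read_pairs assumes Biopython is installed and that the path points to a SwissProt flat file.

```python
from types import SimpleNamespace


def accession_gene_pairs(records):
    """Yield (primary accession, gene_name) for each record.

    `records` is any iterable of objects with `accessions` and `gene_name`
    attributes, such as the records yielded by SwissProt.parse.
    """
    for record in records:
        # The first accession is the primary one; skip records with none.
        if record.accessions:
            yield record.accessions[0], record.gene_name


def read_pairs(path):
    """Parse a SwissProt flat file and return the pairs as a list."""
    # Imported here so the helper above can be tried without Biopython.
    from Bio import SwissProt

    with open(path) as handle:
        # Consume the parser inside the "with" block, while the file is open.
        return list(accession_gene_pairs(SwissProt.parse(handle)))


# Stand-in record with hypothetical values, so no data file is needed.
demo = SimpleNamespace(accessions=["P00321", "Q00001"], gene_name="Name=rplA;")
print(list(accession_gene_pairs([demo])))
```

With the real file, read_pairs("uniprot_sprot.dat") would return one pair per record. The full file is large, so iterating lazily with accession_gene_pairs may be preferable to building a list.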

Here is a list of other attribute names that hold SwissProt data. I copied it from the docstring of the SwissProt module.

Holds information from a SwissProt record.

Members:
entry_name        Name of this entry, e.g. RL1_ECOLI.
data_class        Either 'STANDARD' or 'PRELIMINARY'.
molecule_type     Type of molecule, 'PRT',
sequence_length   Number of residues.

accessions        List of the accession numbers, e.g. ['P00321']
created           A tuple of (date, release).
sequence_update   A tuple of (date, release).
annotation_update A tuple of (date, release).

description       Free-format description.
gene_name         Gene name.  See userman.txt for description.
organism          The source of the sequence.
organelle         The origin of the sequence.
organism_classification  The taxonomy classification.  List of strings.
                         (http://www.ncbi.nlm.nih.gov/Taxonomy/)
taxonomy_id       A list of NCBI taxonomy id's.
host_organism     A list of names of the hosts of a virus, if any.
host_taxonomy_id  A list of NCBI taxonomy id's of the hosts, if any.
references        List of Reference objects.
comments          List of strings.
cross_references  List of tuples (db, id1[, id2][, id3]).  See the docs.
keywords          List of the keywords.
features          List of tuples (key name, from, to, description).
                  from and to can be either integers for the residue
                  numbers, '<', '>', or '?'

seqinfo           tuple of (length, molecular weight, CRC32 value)
sequence          The sequence.
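
In the docstring above, gene_name is described only as "Gene name", and in the Biopython versions I have seen it comes back as the raw text of the GN line, for example "Name=rplA; OrderedLocusNames=b3984, JW3947;". Newer versions may return something more structured, so check with print first. Assuming it is a plain string for a single gene, a rough sketch for splitting it could look like this (parse_gene_name is a hypothetical helper, not part of Biopython, and the example string is illustrative):

```python
import re


def parse_gene_name(gn):
    """Split a GN-style string into a dict mapping each key to a list of values."""
    fields = {}
    for part in gn.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty pieces left over from the trailing ";"
        key, _, value = part.partition("=")
        # Drop evidence tags such as {ECO:...} before splitting on commas.
        value = re.sub(r"\s*\{[^}]*\}", "", value)
        fields.setdefault(key.strip(), []).extend(
            v.strip() for v in value.split(",") if v.strip()
        )
    return fields


example = "Name=rplA; OrderedLocusNames=b3984, JW3947;"
print(parse_gene_name(example))
```

This does not handle entries that list several genes separated by "and"; those would need to be split into one string per gene first.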

With any Python object at the Python prompt, try dir(pythonobject), which gives you a list of all its attributes. Also try help(pythonobject), which shows the API docstrings; in this case they are also available here: http://biopython.org/DIST/docs/api/Bio.SwissProt.Record-class.html

(comment written 6.7 years ago by Peter)
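
As a small illustration of the dir() hint, the sketch below filters out the underscore-prefixed names that dir() also returns. DemoRecord is a hypothetical stand-in with made-up values, not the Biopython Record class; with a real parsed record you would pass that object instead.

```python
def public_attributes(obj):
    """Return the names of attributes that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class DemoRecord:
    """Hypothetical stand-in for a parsed record; not the Biopython class."""

    def __init__(self):
        self.entry_name = "RL1_ECOLI"
        self.accessions = ["P00321"]
        self.gene_name = "Name=rplA;"


print(public_attributes(DemoRecord()))
```

dir() returns names in alphabetical order, so the attributes appear sorted rather than in the order they were assigned.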

Thank you for this helpful hint!

(comment written 6.7 years ago by Stephen)
Stephen wrote (6.7 years ago):

Thank you for your helpful answer!
