Update Biopython's SeqRecords?
11.2 years ago
biocyberman ▴ 860

I looked at the documentation here: http://biopython.org/wiki/SeqRecord , but I did not find any information on, or methods for, updating the properties of SeqRecords. I want to do the following things:

  1. Load EMBL records from a file.
  2. Update source information for the records like organism, project, mol_type
  3. Update accession numbers
  4. Update IDs
  5. Update descriptions
  6. Add references
  7. Remove db_xref qualifiers in source and other features.

I could decompose each record and reconstruct it with the updated information using the __init__ method, as described here: http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html However, I believe there must be a better way to do this. Would it be necessary to extend the SeqRecord class for my purpose?

biopython python • 3.8k views
11.2 years ago
Peter 6.0k

Seven questions in one - please ask more detailed specific questions if you need more advice, and/or sign up to the Biopython mailing list.

(1) Load EMBL records from a file.

Use Bio.SeqIO with format name "embl" to load EMBL sequence files, which will give you one SeqRecord object per record. As Istvan said, just edit those objects in memory by updating their attributes/properties, and then save them to disk using the Bio.SeqIO.write function.
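A minimal sketch of that load, edit, save cycle, assuming a recent Biopython (1.78 or later, where writing EMBL needs a molecule_type annotation). The record contents and the in-memory StringIO handles are placeholders so the snippet runs without a real file; with real data you would pass file names to Bio.SeqIO.parse and Bio.SeqIO.write instead.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build a tiny in-memory EMBL "file" (placeholder data, not a real entry).
seed = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="X00001.1",
    name="X00001",
    description="placeholder record",
    annotations={"molecule_type": "DNA"},
)
embl_text = seed.format("embl")

# Load: SeqIO.parse yields one SeqRecord per EMBL entry.
records = list(SeqIO.parse(StringIO(embl_text), "embl"))

# Edit in memory by assigning to the record's attributes.
for record in records:
    record.description = "updated description"

# Save: SeqIO.write accepts a file name or a handle and returns the record count.
out = StringIO()
count = SeqIO.write(records, out, "embl")
```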

(2) Update source information for the records like organism, project, mol_type

Update the qualifiers dictionary of the source feature, which is a SeqFeature object (typically the first entry in the SeqRecord's features list).
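As a rough sketch of what that looks like, assuming the record has a feature of type "source" whose qualifier values are lists of strings (which is how the parsers usually store them). The record, organism name, and qualifier values below are made up for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Placeholder record with a source feature spanning the whole sequence.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="X00001.1",
    annotations={"molecule_type": "DNA"},
)
record.features.append(
    SeqFeature(
        FeatureLocation(0, len(record.seq)),
        type="source",
        qualifiers={"organism": ["Unknown"], "mol_type": ["unassigned DNA"]},
    )
)

# Look the source feature up by type rather than relying on it being first.
source = next((f for f in record.features if f.type == "source"), None)
if source is not None:
    # Qualifier values are lists, so assign lists back.
    source.qualifiers["organism"] = ["Escherichia coli"]
    source.qualifiers["mol_type"] = ["genomic DNA"]
```

If the record-level annotations also carry an organism entry, you would probably want to keep record.annotations["organism"] in sync by hand.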

(3) Update accession numbers

Probably just update the annotations dictionary and/or the id of the record, depending on which values exactly you are interested in.

(4) Update IDs

Probably just update the id attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by ID.

(5) Update descriptions

Probably just update the description attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by descriptions.
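Points (3) to (5) all come down to plain attribute assignment. A small sketch, assuming the accession list lives under the "accessions" key of the annotations dictionary (which is where the parsers normally put it); all values here are invented placeholders.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Placeholder record standing in for one loaded with Bio.SeqIO.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="X00001.1",
    name="X00001",
    description="old description",
    annotations={"molecule_type": "DNA", "accessions": ["X00001"]},
)

# (3) Accession numbers: a list in the annotations dictionary.
record.annotations["accessions"] = ["AB123456"]

# (4) IDs: the id (often accession.version) and the shorter name.
record.id = "AB123456.1"
record.name = "AB123456"

# (5) Description: a plain string.
record.description = "Hypothetical example sequence, updated"
```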

(6) Add references

Update the list of reference objects in the annotations dictionary of the SeqRecord, i.e. record.annotations["references"]
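For illustration, a new entry can be built with the Reference class from Bio.SeqFeature and appended to that list. This is only a sketch: the authors, title, and journal are placeholders, and setdefault is used in case the record has no "references" key yet.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, Reference
from Bio.SeqRecord import SeqRecord

# Placeholder record standing in for one loaded with Bio.SeqIO.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="X00001.1",
    annotations={"molecule_type": "DNA"},
)

# Reference() takes no arguments; fill in its attributes afterwards.
ref = Reference()
ref.authors = "Doe J., Roe R."
ref.title = "A placeholder title"
ref.journal = "Unpublished."
# The location is a list of the sequence regions the reference covers.
ref.location = [FeatureLocation(0, len(record.seq))]

# Create the list if it is missing, then append the new reference.
record.annotations.setdefault("references", []).append(ref)
```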

(7) Remove db_xref qualifiers in source and other features.

Each feature's db_xref entry in its qualifiers dictionary, i.e. feature.qualifiers["db_xref"], is a list which you can edit to remove an entry, or simply replace with an empty list.
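A short sketch of stripping them from every feature, assuming the db_xref values are stored under the "db_xref" key of each feature's qualifiers dictionary. The features and cross-reference values below are invented for the example.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Placeholder record with two features that carry db_xref qualifiers.
record = SeqRecord(
    Seq("ATGGCCATTGTAATGGGCCGC"),
    id="X00001.1",
    annotations={"molecule_type": "DNA"},
)
record.features = [
    SeqFeature(
        FeatureLocation(0, 21),
        type="source",
        qualifiers={"organism": ["Unknown"], "db_xref": ["taxon:0000"]},
    ),
    SeqFeature(
        FeatureLocation(0, 21),
        type="CDS",
        qualifiers={"gene": ["xyz"], "db_xref": ["EXAMPLE:1", "EXAMPLE:2"]},
    ),
]

for feature in record.features:
    # pop with a default does nothing if the feature has no db_xref.
    feature.qualifiers.pop("db_xref", None)
```

To drop only some cross-references, you could instead rebuild the list, for example keeping only the entries that do not start with a given prefix.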


Thanks for the advice, Peter. Actually you can count eight questions there :-P. Before posting these questions, I somehow couldn't even assign a new value to record.id. Now I have tried again and it works. I am working on it and will join the Biopython mailing list.

11.2 years ago

I did a GenBank file transformation a little while ago, and that involved mutating SeqRecords.

There is no update method, but you can replace the attributes with objects of the correct classes.

There are some subtle dependencies, however. For example, sub_features take precedence over the content of the parent SeqFeature: if you update the feature but not its sub_features, the latter will overwrite the former when the record is serialized back to a file.


The nasty sub_feature stuff only applied to features using join locations, and will be going away in Biopython 1.62 (next release).


Good to know. It was quite a head-scratcher until I figured it out.


This doesn't sound straightforward. But anyway, thanks for your answers.


Some of these only require updating a dictionary, but you are right that the implementation does not lend itself to changing the attributes.

Another possibility would be to create a GenBank text or XML file with the desired data and parse that with Biopython. The file formats are more rigorously defined than the internal workings of the classes.
