Question: Update Biopython's SeqRecords?
biocyberman (Denmark) wrote, 7.0 years ago:

I looked at the documentation here: http://biopython.org/wiki/SeqRecord , but I did not find any information on, or methods for, updating the properties of SeqRecords. I want to do the following things:

  1. Load EMBL records from a file.
  2. Update source information for the records like organism, project, mol_type
  3. Update accession numbers
  4. Update IDs
  5. Update descriptions
  6. Add references
  7. Remove db_xref qualifiers in source and other features.

I can decompose each record and reconstruct it with the updated information by using the __init__ method, as described here: http://biopython.org/DIST/docs/api/Bio.SeqRecord.SeqRecord-class.html However, I believe there must be a better way to do this. Would it be necessary to extend the SeqRecord class for my purpose?

Tags: python, biopython
Peter (Scotland, UK) wrote, 7.0 years ago:

Seven questions in one - please ask more detailed specific questions if you need more advice, and/or sign up to the Biopython mailing list.

(1) Load EMBL records from a file.

Use Bio.SeqIO with format name "embl" to load EMBL sequence files, which will give you one SeqRecord object per record. As Istvan said, just edit those objects in memory by updating their attributes/properties, and then save them to disk using the Bio.SeqIO.write function.
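A minimal sketch of that load, edit, save cycle follows. It assumes a recent Biopython release, where the EMBL writer expects a "molecule_type" entry in the annotations. The function name edit_embl, the demo record, and the "Edited: " prefix are made up for illustration, and an in-memory handle stands in for a real file.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def edit_embl(in_handle, out_handle):
    """Read EMBL records, edit them in memory, and write them back out."""
    records = []
    for record in SeqIO.parse(in_handle, "embl"):
        # Any attribute can be changed here; the description is just an example.
        record.description = "Edited: " + record.description
        records.append(record)
    # SeqIO.write returns the number of records written.
    return SeqIO.write(records, out_handle, "embl")


# Build a tiny EMBL "file" in memory so the sketch needs no input file.
demo = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    name="AB000001",
    description="Example record",
)
demo.annotations["molecule_type"] = "DNA"  # assumed to be required by the writer
demo.annotations["topology"] = "linear"
source = StringIO()
SeqIO.write(demo, source, "embl")
source.seek(0)

target = StringIO()
count = edit_embl(source, target)
```

With real data, the two StringIO handles would be replaced by file names or open file handles.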

(2) Update source information for the records like organism, project, mol_type

Update the qualifiers dictionary of the source feature, a SeqFeature object which is typically the first entry in the features list of the SeqRecord.
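As a rough illustration, the source feature can be looked up by its type and its qualifiers edited like any dictionary. The helper find_source and all qualifier values below are hypothetical; the sketch also assumes that qualifier values are lists of strings, which is how the EMBL and GenBank parsers store them.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Stand-in for a record loaded with Bio.SeqIO.
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
record.features.append(
    SeqFeature(
        FeatureLocation(0, 12),
        type="source",
        qualifiers={"organism": ["Old name"], "mol_type": ["genomic DNA"]},
    )
)


def find_source(rec):
    """Return the first feature of type 'source', or None if there is none."""
    for feature in rec.features:
        if feature.type == "source":
            return feature
    return None


source = find_source(record)
# Replace an existing qualifier and add a new one (illustrative values).
source.qualifiers["organism"] = ["Escherichia coli"]
source.qualifiers["strain"] = ["K-12"]
```

Searching by type rather than taking record.features[0] guards against records where the source feature is not first.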

(3) Update accession numbers

Probably just update the annotations dictionary and/or the id of the record, depending on exactly which values you are interested in.

(4) Update IDs

Probably just update the id attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by ID.

(5) Update descriptions

Probably just update the description attribute of the SeqRecord, i.e. set it to a new value - depending what you meant by descriptions.
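Steps (3) to (5) amount to plain attribute and dictionary assignments. The sketch below uses invented identifiers and assumes the "accessions" annotation key, which is where the EMBL and GenBank parsers put the accession list. Which of these values a given writer emits may depend on the Biopython version, so it is worth checking the output file.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Stand-in for a record loaded with Bio.SeqIO.
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    name="AB000001",
    description="Old description",
)
record.annotations["accessions"] = ["AB000001"]

# (3) accession numbers: a list in the annotations dictionary
record.annotations["accessions"] = ["XY999999"]

# (4) identifiers: simple string attributes
record.id = "XY999999.1"
record.name = "XY999999"

# (5) description: also a simple string attribute
record.description = "New description"
```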

(6) Add references

Update the list of reference objects in the annotations dictionary of the SeqRecord, i.e. record.annotations["references"]
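One possible way to do this is to build a Bio.SeqFeature.Reference object and append it to that list. The author, title, and journal values below are placeholders, and setdefault is used because a record built by hand may not have a "references" entry yet.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import Reference
from Bio.SeqRecord import SeqRecord

# Stand-in for a record loaded with Bio.SeqIO.
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")

# Fill in a new reference; all values here are placeholders.
ref = Reference()
ref.authors = "Doe J., Roe R."
ref.title = "A hypothetical paper"
ref.journal = "Unpublished"

# Create the list if it is missing, then append the new reference.
record.annotations.setdefault("references", []).append(ref)
```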

(7) Remove db_xref qualifiers in source and other features.

Each feature's db_xref qualifier (stored in the feature's qualifiers dictionary) is a list which you can edit to remove an entry, or simply replace with an empty list.
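A small sketch of the removal, assuming the db_xref values live under the "db_xref" key of each feature's qualifiers dictionary, as they do for records parsed from EMBL or GenBank files. The helper drop_db_xrefs and the cross-reference values are made up for illustration.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Stand-in for a record loaded with Bio.SeqIO.
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
record.features.extend(
    [
        SeqFeature(
            FeatureLocation(0, 12),
            type="source",
            qualifiers={"organism": ["Example organism"], "db_xref": ["taxon:0000"]},
        ),
        SeqFeature(
            FeatureLocation(0, 9),
            type="CDS",
            qualifiers={"gene": ["exA"], "db_xref": ["GOA:X00000", "UniProtKB:X00000"]},
        ),
    ]
)


def drop_db_xrefs(rec):
    """Delete the db_xref qualifier from every feature; return how many values were removed."""
    removed = 0
    for feature in rec.features:
        # pop() with a default does nothing for features without db_xref.
        removed += len(feature.qualifiers.pop("db_xref", []))
    return removed


removed = drop_db_xrefs(record)
```

To drop only selected entries instead, the list can be filtered in place of the pop, for example by keeping only values that do not start with a given prefix.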


Thanks for the advice, Peter. Actually you can count eight questions there :-P. Before posting these questions, I somehow couldn't even assign a new value to record.id. Now I have tried again and it works. I am working on it and will join the Biopython mailing list.

Reply written 7.0 years ago by biocyberman
Istvan Albert (University Park, USA) wrote, 7.0 years ago:

I did a GenBank file transformation a little while ago, and that involved mutating SeqRecords.

There is no update method, but you can replace the attributes with objects of the correct classes.

There are some subtle dependencies, however. For example, sub_features take precedence over the content of the SeqFeature: if you update the feature but not its sub_features, the latter will overwrite the former when the record is serialized back.


The nasty sub_feature stuff only applied to features using join locations, and will be going away in Biopython 1.62 (next release).

Reply written 7.0 years ago by Peter

good to know - it was quite a head scratcher until I figured it out

Reply written 7.0 years ago by Istvan Albert

This doesn't sound straightforward. But anyway, thanks for your answers.

Reply written 7.0 years ago by biocyberman

Some of these only require updating a dictionary, but you are right that the implementation does not lend itself to changing the attributes.

Another possibility would be to create a GenBank text or XML file with the desired data and parse that with Biopython. The file formats are more rigorously defined than the internal workings of the classes.

Reply written 7.0 years ago by Istvan Albert