Seven questions in one - please ask more detailed specific questions if you need more advice, and/or sign up to the Biopython mailing list.
(1) Load EMBL records from a file.
Use Bio.SeqIO with format name "embl" to load EMBL sequence files, which will give you one SeqRecord object per record. As Istvan said, just edit those objects in memory by updating their attributes/properties, and then save them to disk using the Bio.SeqIO.write function.
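A minimal sketch of that load, edit, save cycle follows. To stay runnable without an input file it first writes a small made-up record to an in-memory handle; the accession AB000001.1 and the descriptions are placeholders, and the "molecule_type" annotation is set on the assumption that a recent Biopython release is in use (it needs this to write EMBL output).

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Build a tiny stand-in record so the sketch runs without an input file.
# With real data you would start from SeqIO.parse("input.embl", "embl").
original = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",  # placeholder accession.version
    name="AB000001",
    description="Example sequence",
    annotations={"molecule_type": "DNA"},  # assumed requirement of recent Biopython
)
embl_in = StringIO()
SeqIO.write([original], embl_in, "embl")
embl_in.seek(0)

# Load: one SeqRecord object per EMBL entry.
records = list(SeqIO.parse(embl_in, "embl"))

# Edit the objects in memory.
for record in records:
    record.description = "Edited description"

# Save: SeqIO.write returns the number of records written.
embl_out = StringIO()
count = SeqIO.write(records, embl_out, "embl")
```

With files on disk, the two StringIO handles would simply be replaced by input and output filenames.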
(2) Update source information for the records like organism, project, mol_type
Update the qualifiers dictionary of the source feature, a SeqFeature object which is typically the first entry in the features list of the SeqRecord.
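As a rough illustration, the source feature can be looked up by its type rather than by position, and its qualifiers edited like any dictionary. The record and the organism names below are made up for the example; qualifier values are stored as lists, matching what the parser produces.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Hypothetical record with a single source feature (stand-in for parsed data).
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    annotations={"molecule_type": "DNA"},
)
record.features.append(
    SeqFeature(
        FeatureLocation(0, 12),
        type="source",
        qualifiers={"organism": ["Old organism"], "mol_type": ["genomic DNA"]},
    )
)

# Look the source feature up by type instead of assuming it is features[0].
source = next(f for f in record.features if f.type == "source")

# Qualifier values are lists of strings.
source.qualifiers["organism"] = ["Escherichia coli"]
source.qualifiers["mol_type"] = ["genomic DNA"]
```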
(3) Update accession numbers
Probably just update the annotations dictionary (the list in record.annotations["accessions"]) and/or the id of the record, depending on exactly which values you are interested in.
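A small sketch, assuming you want the new accession to show up consistently: it sets the accessions list, the id and the name together, then writes the record to check the ID line. XY999999 is a made-up accession, and which of these fields the EMBL writer actually consults may differ between Biopython versions, which is why all three are updated here.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical record standing in for one loaded from an EMBL file.
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    name="AB000001",
    description="Example sequence",
    annotations={"molecule_type": "DNA", "accessions": ["AB000001"]},
)

new_accession = "XY999999"  # placeholder value
record.annotations["accessions"] = [new_accession]
record.id = new_accession + ".1"  # accession.version form
record.name = new_accession

# Write to memory and look at the first (ID) line of the output.
handle = StringIO()
SeqIO.write(record, handle, "embl")
id_line = handle.getvalue().splitlines()[0]
```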
(4) Update IDs
Probably just update the id attribute of the SeqRecord, i.e. set it to a new value, depending on what you meant by ID.
(5) Update descriptions
Probably just update the description attribute of the SeqRecord, i.e. set it to a new value, depending on what you meant by descriptions.
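Points (4) and (5) both come down to plain attribute assignment. A minimal sketch, where relabel is a hypothetical helper name and all values are placeholders:

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def relabel(record, new_id, new_description):
    """Hypothetical helper: set a record's id, name and description in place."""
    record.id = new_id
    # Keep name in step with the id, minus any ".version" suffix.
    record.name = new_id.split(".")[0]
    record.description = new_description
    return record


# Made-up record standing in for one loaded from an EMBL file.
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    name="AB000001",
    description="Old description",
)
relabel(record, "XY999999.2", "New description")
```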
(6) Add references
Update the list of reference objects in the annotations dictionary of the SeqRecord, i.e. record.annotations["references"]
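For example, a new entry can be built with the Reference class from Bio.SeqFeature and appended to that list. The authors, title and journal below are invented placeholders, and setdefault is used in case the record has no "references" entry yet.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, Reference
from Bio.SeqRecord import SeqRecord

# Made-up record standing in for one loaded from an EMBL file.
record = SeqRecord(
    Seq("ACGTACGTACGT"),
    id="AB000001.1",
    annotations={"molecule_type": "DNA"},
)

# All bibliographic values here are placeholders.
ref = Reference()
ref.authors = "Doe J., Roe R."
ref.title = "A hypothetical paper"
ref.journal = "Unpublished"
# The part of the sequence the reference applies to (here, all of it).
ref.location = [FeatureLocation(0, len(record.seq))]

# Create the list if it is missing, then append the new reference.
record.annotations.setdefault("references", []).append(ref)
```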
(7) Remove db_xref qualifiers in source and other features.
Each feature's qualifiers dictionary may contain a "db_xref" entry, which is a list you can edit to remove an individual cross-reference, or simply replace with an empty list (or delete the key altogether).
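A short sketch of removing the qualifier from every feature, source included. strip_db_xrefs is a hypothetical helper name, and the features and cross-reference values are made up for the example.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord


def strip_db_xrefs(record):
    """Hypothetical helper: drop the db_xref qualifier from every feature."""
    for feature in record.features:
        # pop with a default does nothing if the qualifier is absent.
        feature.qualifiers.pop("db_xref", None)
    return record


# Made-up record with two features carrying db_xref qualifiers.
record = SeqRecord(Seq("ACGTACGTACGT"), id="AB000001.1")
record.features.append(
    SeqFeature(
        FeatureLocation(0, 12),
        type="source",
        qualifiers={"organism": ["Escherichia coli"], "db_xref": ["taxon:562"]},
    )
)
record.features.append(
    SeqFeature(
        FeatureLocation(0, 9),
        type="CDS",
        qualifiers={"gene": ["exampleA"], "db_xref": ["GOA:P00000"]},
    )
)

strip_db_xrefs(record)
```

To remove only some entries, filter the list instead, e.g. keep the values that do not start with a given database prefix.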