Processing DNA sequences in the format used in US patents
7.3 years ago
rotem ▴ 10

I'm wondering whether Biopython or a similar package can read DNA sequences in the unusual format that appears in US patents (and which the USPTO requires!). It is not hard to write a script that reads this format, but since Python's Bio.SeqIO already handles so many different formats, it would be great if it could also deal with this one.

Does anyone know the format I'm referring to, and whether any package already deals with it? The format resembles DDBJ/EMBL, but is not identical. It looks like this:

<211> some number

<212> some entity

<213> organism

<400> serial number

10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block   60

10bp_block 10bp_block 10bp_block 10bp_block 10bp_block 10bp_block   120

etc.
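
For reference, here is a minimal sketch of the kind of script mentioned above. It assumes only what the layout shows: fields are introduced by a three-digit tag in angle brackets, the sequence starts after the `<400>` line, and each sequence line ends with a running residue count. The function name `parse_patent_listing` and the returned dictionary layout are made up for illustration and are not part of Biopython. Protein listings and any tags not shown above are not handled.

```python
import re

# A field line looks like "<213> organism": three digits in angle brackets.
TAG_RE = re.compile(r"^<(\d{3})>\s*(.*)$")


def _finish(fields, seq_parts):
    """Bundle the collected fields and sequence lines into one record."""
    return {"fields": dict(fields), "sequence": "".join(seq_parts)}


def parse_patent_listing(text):
    """Parse DNA records laid out as in the example above.

    Returns a list of dicts, one per <400> block (hypothetical layout:
    {"fields": {tag: value}, "sequence": str}).
    """
    records = []
    fields = {}
    seq_parts = None  # stays None until a <400> line has been seen

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG_RE.match(line)
        if match:
            tag, value = match.groups()
            if seq_parts is not None:
                # A tag after sequence lines means the next record has begun.
                records.append(_finish(fields, seq_parts))
                fields, seq_parts = {}, None
            if tag == "400":
                seq_parts = []
            fields[tag] = value
        elif seq_parts is not None:
            # Keep letters only: drops the spaces between 10 bp blocks
            # and the running count at the end of the line.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))

    if seq_parts is not None:
        records.append(_finish(fields, seq_parts))
    return records


if __name__ == "__main__":
    example = """<211> 20
<212> DNA
<213> Homo sapiens
<400> 1
acgtacgtac ggttaaccgg   20
"""
    for record in parse_patent_listing(example):
        print(record["fields"], record["sequence"])
```

Each resulting sequence string could then be wrapped in a `Bio.SeqRecord.SeqRecord` and written out with `Bio.SeqIO` in any format it supports.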

Thanks! Rotem

USPTO dna sequence format biopython • 1.2k views

The USPTO makes PatentIn available for preparing sequence submissions. It seems to be Windows-only.
