FASTA and PIR formats in Python
9.2 years ago
Moses ▴ 150

Hi,

I'm having trouble with two sequence formats, FASTA and PIR. I use MODELLER and its functionality from within my Biopython scripts. Biopython handles FASTA, whereas MODELLER needs a PIR file to build a comparative model, because that file carries the structural information. I'm having a hard time moving between these two formats. What I tried was to obtain the two sequences in FASTA format first and then run:

aln.append(file='file.fasta', align_codes='all', alignment_format='FASTA')

After that I wrote the alignment out in both formats:

aln.write(file='5fd1_1fdx_output.fasta', alignment_format='FASTA')
aln.write(file='5fd1_1fdx_ouput.pir', alignment_format='PIR')

I then used the latter (5fd1_1fdx_ouput.pir) to build the model, but it does not work, because information is lost whenever I convert from FASTA to PIR.

The input FASTA file (5fd1_1fdx_sequence.fasta) is:

>5fd1
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER

>1fdx
AYVINDSCIACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------

And the output file (5fd1_1fdx_ouput.pir) is:

>P1;5fd1
sequence::     : :     : :::-1.00:-1.00
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence::     : :     : :::-1.00:-1.00
AYVINDSCIACG--ACKPECPVN-IIQG-SIYAIDADSCIDCGSCASVCPVGA----------------------
-------------PNPED-------------*

I need a way in Python or Biopython to convert between these two file formats without losing information. It is important that the output PIR file has this form:

>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence:1fdx:1    : :54   : :ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------*

As you can see, the information on the second line of each record is lost. Does anyone know how to convert between these formats without losing information? Thank you.

biopython python modeller • 5.1k views
9.2 years ago
Ram 43k

To get FASTA from PIR, replace the first newline after a > with ;PIR=( and the second newline with )\n.

To get PIR from FASTA, do the substitution the other way around. If you're trying to create PIR from an exported FASTA file, I'm sorry, that's not possible.

Also, Biopython handles PIR as well. Check out http://biopython.org/DIST/docs/api/Bio.SeqIO.PirIO-module.html
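
The newline substitution described in this answer could be sketched roughly as follows. This is only an illustration in plain Python: the function names pir_to_fasta and fasta_to_pir are hypothetical, and the sketch assumes that every record starts with a > line followed immediately by exactly one description line. The trailing * of each PIR sequence is left untouched.

```python
import re


def pir_to_fasta(pir_text):
    """Fold each PIR description line into the '>' header line before it.

    '>P1;5fd1' + newline + 'structureX:...' becomes
    '>P1;5fd1;PIR=(structureX:...)', so the header survives inside FASTA.
    """
    return re.sub(
        r"^(>[^\n]*)\n([^\n]*)\n",   # header line, then description line
        r"\1;PIR=(\2)\n",
        pir_text,
        flags=re.MULTILINE,
    )


def fasta_to_pir(fasta_text):
    """Reverse pir_to_fasta: split the ';PIR=(...)' tag back onto its own line.

    Headers without a ';PIR=(...)' tag are left unchanged, which is why a
    FASTA file exported without the tag cannot be turned back into full PIR.
    """
    return re.sub(
        r"^(>[^\n]*?);PIR=\(([^\n]*)\)$",
        r"\1\n\2",
        fasta_text,
        flags=re.MULTILINE,
    )


# Small round-trip example using the first record from the question.
pir_example = (
    ">P1;5fd1\n"
    "structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19\n"
    "AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA\n"
    "EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*\n"
)
fasta_example = pir_to_fasta(pir_example)
restored_example = fasta_to_pir(fasta_example)
```

Because the whole description line is kept verbatim inside the FASTA header, restored_example is identical to pir_example. Other tools may truncate or rewrite long FASTA headers, so the tag is only safe as long as the file is not re-exported by something else in between.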


Biopython doesn't actually support the MODELLER PIR format (at least not for writing). The link points to the EBI format, which is substantially different from the MODELLER format even though they share the name.
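
Since the writing side has to be done by hand, here is a minimal sketch of one way to do it in plain Python, without Biopython or MODELLER. Everything named here (parse_fasta, to_modeller_pir, FIELDS, and the metadata dictionary) is hypothetical and made up for illustration. It assumes the usual MODELLER layout: a >P1;code line, a description line with ten colon-separated fields, and a sequence ending in *. The header values cannot come from the FASTA file, so they have to be supplied separately.

```python
# Assumed order of the ten colon-separated fields on the description line.
FIELDS = (
    "type", "pdb_code", "first_res", "first_chain", "last_res",
    "last_chain", "protein", "source", "resolution", "r_factor",
)


def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the alignment code.
            name = (line[1:].split() or [""])[0]
            records.append([name, ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def to_modeller_pir(records, metadata, width=75):
    """Format records as MODELLER-style PIR text.

    metadata maps an identifier to a dict keyed by FIELDS; any field that is
    missing is written as an empty string, and the type defaults to 'sequence'.
    """
    blocks = []
    for name, seq in records:
        meta = metadata.get(name, {})
        fields = [str(meta.get(key, "")) for key in FIELDS]
        if not fields[0]:
            fields[0] = "sequence"
        body = seq + "*"  # the sequence is terminated by '*'
        wrapped = [body[i:i + width] for i in range(0, len(body), width)]
        blocks.append("\n".join([f">P1;{name}", ":".join(fields)] + wrapped))
    return "\n\n".join(blocks) + "\n"
```

For the template in the question, the metadata entry would look something like {"5fd1": {"type": "structureX", "pdb_code": "5fd1", "first_res": 1, "first_chain": "A", "last_res": 106, "last_chain": "A"}}. The sketch writes the fields without the space padding seen in MODELLER's own output; if your MODELLER version turns out to be picky about that, pad the values yourself.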


How I hate it when this happens! I remember this being the case with BED formats as well: one is tab-separated plain text, the other a binary file. Don't use duplicate names, people! </rant>


Why is it not possible to create PIR format from an exported FASTA?


It's been more than 4 years, so I might be losing context here, but it looks like the exported FASTA has less information content than the PIR, which is probably why I said it was not possible. Technically, it might be possible, but the PIR may end up with a lot of blank fields.
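
To see which fields end up blank, a small sketch like the one below can split a description line into its ten fields. The names FIELD_NAMES, parse_description and blank_fields are hypothetical, and the field labels are my own shorthand for the ten colon-separated positions, not official identifiers.

```python
# Shorthand labels (an assumption) for the ten colon-separated positions.
FIELD_NAMES = (
    "type", "pdb_code", "first_res", "first_chain", "last_res",
    "last_chain", "protein", "source", "resolution", "r_factor",
)


def parse_description(line):
    """Split a PIR description line into a dict of stripped field values.

    Note: a protein name or source that itself contains ':' would break
    this simple split.
    """
    parts = [part.strip() for part in line.split(":")]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(parts)}"
        )
    return dict(zip(FIELD_NAMES, parts))


def blank_fields(line):
    """Return the labels of the fields that are empty on this line."""
    return [name for name, value in parse_description(line).items() if not value]


# The description line MODELLER wrote for the FASTA-derived record.
exported = "sequence::     : :     : :::-1.00:-1.00"
missing = blank_fields(exported)
```

For the exported line from the question, missing lists seven of the ten fields: everything except the type, the resolution and the R-factor. That is the information a FASTA file simply does not carry.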
