Question: FASTA and PIR formats in Python
5.8 years ago by
Moses120
united states/ Bloomingtion/ Indiana University Bloomington
Moses120 wrote:

Hi,

I'm struggling with two different sequence formats (FASTA and PIR). I'm using MODELLER and its functionality in my Biopython scripts. Biopython works with the FASTA format, whereas MODELLER uses a PIR file to build a comparative model so that it can make use of structural information. I'm having a hard time dealing with these two formats. What I tried was to first obtain two sequences in FASTA format and then do:

aln.append(file = 'file.fasta', align_codes='all', alignment_format='FASTA')

After that I did:

aln.write(file='5fd1_1fdx_output.fasta', alignment_format='FASTA')
aln.write(file='5fd1_1fdx_ouput.pir', alignment_format = 'PIR')

and used the latter (5fd1_1fdx_ouput.pir) to build the model. But it's not working, because I lose information whenever I convert from FASTA to PIR.

So the input FASTA file (5fd1_1fdx_sequence.fasta) is:

>5fd1
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER

>1fdx
AYVINDSCIACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------

and the output file (5fd1_1fdx_ouput.pir):

>P1;5fd1
sequence::     : :     : :::-1.00:-1.00
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence::     : :     : :::-1.00:-1.00
AYVINDSCIACG--ACKPECPVN-IIQG-SIYAIDADSCIDCGSCASVCPVGA----------------------
-------------PNPED-------------*

I need a way in Python or Biopython to convert between these two file formats without losing information. It is important that the output PIR file has this form:

>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence:1fdx:1    : :54   : :ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------*

As you can see, the information in the second line of each sequence is lost. Does anyone know how to convert between these formats without losing information? Thank you.

modeller biopython python • 3.7k views
modified 5.8 years ago by _r_am31k • written 5.8 years ago by Moses120
5.8 years ago by
_r_am31k
Baylor College of Medicine, Houston, TX
_r_am31k wrote:

To get FASTA, substitute the first newline encountered after a > with ;PIR=( and the second newline with )\n. This folds the PIR description line into the FASTA header, so it is not lost.

Substitute the other way around to get PIR from FASTA. If you're trying to create PIR from an exported FASTA, I'm sorry, that's not possible.
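
A minimal Python sketch of that substitution, in case it helps. The function names `pir_to_fasta` and `fasta_to_pir` are made up for illustration, and dropping and restoring the `*` terminator is an added assumption, since plain FASTA has no terminator. Whether MODELLER or Biopython keeps the long header intact when rewriting the FASTA file is not something this sketch checks.

```python
import re

# Matches a FASTA header that carries a folded PIR description line.
TAGGED_HEADER = re.compile(r"^(>.*?);PIR=\((.*)\)$")


def pir_to_fasta(pir_text):
    """Fold each PIR description line into the preceding '>' header line."""
    lines = pir_text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        if lines[i].startswith(">"):
            description = lines[i + 1] if i + 1 < len(lines) else ""
            out.append(f"{lines[i]};PIR=({description})")
            i += 2  # the header and the description line are both consumed
        else:
            out.append(lines[i].rstrip("*"))  # assumption: drop the PIR terminator
            i += 1
    return "\n".join(out) + "\n"


def fasta_to_pir(fasta_text):
    """Unfold ';PIR=(...)' headers back into two-line PIR headers."""
    out = []
    last_seq = None  # index in `out` of the current record's last sequence line
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = TAGGED_HEADER.match(line)
            if match is None:
                raise ValueError(f"header carries no PIR=(...) tag: {line!r}")
            if last_seq is not None:
                out[last_seq] += "*"  # terminate the previous record
                last_seq = None
            out.extend([match.group(1), match.group(2)])
        else:
            out.append(line)
            if line.strip():
                last_seq = len(out) - 1
    if last_seq is not None:
        out[last_seq] += "*"  # terminate the final record
    return "\n".join(out) + "\n"
```

The round trip only works for FASTA files whose headers still carry the `;PIR=(...)` tag; a FASTA file that never had it raises `ValueError` here, which is the "not possible" case.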

Also, Biopython handles PIR as well. Check out http://biopython.org/DIST/docs/api/Bio.SeqIO.PirIO-module.html

 

modified 5.8 years ago • written 5.8 years ago by _r_am31k

Biopython doesn't actually support the MODELLER PIR format (at least not for writing). The link points to the EBI format, which is substantially different from the MODELLER format even though they share the name.
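
To make the MODELLER side concrete: the second line of a MODELLER PIR record consists of ten colon-separated fields, as in the desired output shown in the question. Below is a rough Python sketch that splits such a line. `ModellerHeader` and `parse_modeller_header` are hypothetical names, and the field labels are illustrative ones that follow the field order given in the MODELLER manual; they are not identifiers from MODELLER or Biopython.

```python
from typing import NamedTuple


class ModellerHeader(NamedTuple):
    """The ten fields of a MODELLER PIR description line (labels are illustrative)."""
    kind: str            # e.g. "sequence" or "structureX"
    pdb_code: str
    first_residue: str
    first_chain: str
    last_residue: str
    last_chain: str
    protein_name: str
    source: str
    resolution: str
    r_factor: str


def parse_modeller_header(line):
    """Split a description line on ':' and strip the padding around each field."""
    fields = [field.strip() for field in line.split(":")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    return ModellerHeader(*fields)
```

Fields are kept as strings on purpose, since several of them may be blank, as in the `sequence::     : :     : :::-1.00:-1.00` line from the question.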

written 4.7 years ago by Lluís R.970

How I hate it when this happens! I remember this being the case with BED formats as well: one a tab-separated plain text file, the other a binary file. Don't use duplicate names, people! </rant>

written 4.7 years ago by _r_am31k

Why is it not possible to create PIR format from an exported FASTA?

written 15 months ago by islemhabibi30

It's been more than 4 years, so I might be losing context here, but it looks like exported FASTA has less information content than the PIR, which is probably why I said it was not possible. Technically, it might be possible but the PIR may end up with a lot of blank fields.
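
A sketch of what "possible, but with blank fields" could look like in plain Python. `read_fasta`, `fasta_to_modeller_pir` and the `headers` argument are hypothetical names. The structural fields have to be supplied by hand because the FASTA file does not contain them, and any record without a supplied header gets an otherwise empty `sequence:` line. This has not been checked against MODELLER itself.

```python
def read_fasta(text):
    """Yield (code, sequence) pairs; the code is the first word after '>'.

    Assumes every '>' line has at least one word after it.
    """
    code, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if code is not None:
                yield code, "".join(chunks)
            code, chunks = line[1:].split()[0], []
        elif line and code is not None:
            chunks.append(line)
    if code is not None:
        yield code, "".join(chunks)


def fasta_to_modeller_pir(fasta_text, headers=None, width=75):
    """Write MODELLER-style PIR text, taking description lines from `headers`.

    `headers` maps a sequence code to its ten-field description line.
    Codes missing from it fall back to a line with blank fields.
    """
    headers = headers or {}
    blocks = []
    for code, seq in read_fasta(fasta_text):
        # Fallback: type and code filled in, the remaining eight fields empty.
        header = headers.get(code, f"sequence:{code}" + ":" * 8)
        seq += "*"  # PIR sequences end with an asterisk
        body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        blocks.append(f">P1;{code}\n{header}\n{body}\n")
    return "\n".join(blocks)
```

For the files in the question, `headers` would hold the two description lines from the desired output, keyed by `5fd1` and `1fdx`; gap characters in the aligned sequences are passed through unchanged.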

written 15 months ago by _r_am31k