Question: FASTA and PIR formats in Python
Asked 4.3 years ago by Moses (Indiana University Bloomington, Bloomington, United States):

Hi,

I'm struggling with two sequence formats, FASTA and PIR. I use MODELLER and its functionality from within my Biopython scripts. Biopython works with FASTA, whereas MODELLER needs a PIR file to build a comparative model, because the PIR file carries the structural information. I'm having a hard time moving between these two formats. First I obtain two sequences in FASTA format and read them with:

aln.append(file = 'file.fasta', align_codes='all', alignment_format='FASTA')

Then I write the alignment out in both formats:

aln.write(file='5fd1_1fdx_output.fasta', alignment_format='FASTA')
aln.write(file='5fd1_1fdx_ouput.pir', alignment_format='PIR')

I used the latter (5fd1_1fdx_ouput.pir) to build the model, but it does not work, because information is lost whenever I convert from FASTA to PIR.

The input FASTA file (5fd1_1fdx_sequence.fasta) is:

>5fd1
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER

>1fdx
AYVINDSCIACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------

And the output file (5fd1_1fdx_ouput.pir) is:

>P1;5fd1
sequence::     : :     : :::-1.00:-1.00
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence::     : :     : :::-1.00:-1.00
AYVINDSCIACG--ACKPECPVN-IIQG-SIYAIDADSCIDCGSCASVCPVGA----------------------
-------------PNPED-------------*

I need a way in Python or Biopython to convert between these two file formats without losing information. It is important that the output PIR file has this form:

>P1;5fd1
structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19
AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA
EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*

>P1;1fdx
sequence:1fdx:1    : :54   : :ferredoxin:Peptococcus aerogenes: 2.00:-1.00
AYVINDSC--IACGACKPECPVNIIQGS--IYAIDADSCIDCGSCASVCPVGAPNPED-----------------
-------------------------------*

As you can see, the information in the second line of each record (the colon-separated description line) is lost. Does anyone know how to convert between these formats without losing information? Thank you.

modeller biopython python • 2.8k views
Answer written 4.3 years ago by RamRS (Houston, TX):

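To get FASTA from PIR, replace the first newline after each > with ;PIR=( and the second newline with )\n, so that the PIR description line is folded into the FASTA header.

Do the substitution the other way around to get PIR back from FASTA. If you're trying to create PIR from a FASTA file that was exported without that description line, I'm sorry, that's not possible: the information is no longer in the file.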
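The substitution described above could be sketched in plain Python roughly as follows. This is only an illustration: the function names `pir_to_tagged_fasta` and `tagged_fasta_to_pir` are made up here, the sample record is the 5fd1 entry from the question, and the sketch assumes every record has exactly one description line directly after its > line. It leaves the `P1;` prefix and the trailing `*` untouched, so the intermediate file is FASTA-like in layout only.

```python
import re

# A ">" header line followed by the PIR description line.
_PIR_RECORD = re.compile(r"^(>[^\n]*)\n([^\n]*)$", re.MULTILINE)
# A header line that carries a folded ";PIR=(...)" tag.
_TAGGED = re.compile(r"^(>[^\n]*?);PIR=\(([^\n]*)\)$", re.MULTILINE)

SAMPLE_PIR = (
    ">P1;5fd1\n"
    "structureX:5fd1:1    :A:106  :A:ferredoxin:Azotobacter vinelandii: 1.90: 0.19\n"
    "AFVVTDNCIKCKYTDCVEVCPVDCFYEGPNFLVIHPDECIDCALCEPECPAQAIFSEDEVPEDMQEFIQLNAELA\n"
    "EVWPNITEKKDPLPDAEDWDGVKGKLQHLER*\n"
)


def pir_to_tagged_fasta(pir_text):
    """Fold each description line into its header as ';PIR=(...)'."""
    return _PIR_RECORD.sub(r"\1;PIR=(\2)", pir_text)


def tagged_fasta_to_pir(fasta_text):
    """Undo pir_to_tagged_fasta: move the ';PIR=(...)' tag back to its own line."""
    return _TAGGED.sub(r"\1\n\2", fasta_text)


if __name__ == "__main__":
    tagged = pir_to_tagged_fasta(SAMPLE_PIR)
    print(tagged)
    print(tagged_fasta_to_pir(tagged) == SAMPLE_PIR)
```

The round trip only works if whatever tool handles the file in between keeps the header line intact, which is an assumption worth checking for each tool.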

Also, Biopython handles PIR as well. See http://biopython.org/DIST/docs/api/Bio.SeqIO.PirIO-module.html

 


Biopython doesn't actually support the MODELLER PIR format (at least not for writing). The link points to the EBI format, which is substantially different from MODELLER's format even though they share the name.
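Since the MODELLER flavour is plain text, one workaround is to write it by hand. Below is a minimal sketch, assuming the ten colon-separated fields appear in the same order as in the example from the question and that the caller supplies them (a FASTA file does not contain them). The function name `format_modeller_pir` and its parameters are hypothetical, not part of Biopython or MODELLER.

```python
def format_modeller_pir(code, fields, sequence, width=75):
    """Return one MODELLER-style PIR record as a string.

    code     -- alignment code written after '>P1;'
    fields   -- the ten description fields, already formatted as strings
    sequence -- aligned sequence, gaps as '-', without the trailing '*'
    width    -- number of characters per sequence line
    """
    if len(fields) != 10:
        raise ValueError("expected 10 description fields, got %d" % len(fields))
    body = sequence + "*"  # '*' terminates the sequence in PIR
    wrapped = [body[i:i + width] for i in range(0, len(body), width)]
    return "\n".join([">P1;" + code, ":".join(fields)] + wrapped) + "\n"


if __name__ == "__main__":
    # Field values copied from the 1fdx record in the question.
    fields = ["sequence", "1fdx", "1    ", " ", "54   ", " ",
              "ferredoxin", "Peptococcus aerogenes", " 2.00", "-1.00"]
    print(format_modeller_pir("1fdx", fields, "AYVINDSC--IACGACKPECPVNIIQGS"))
```

The aligned sequences themselves can still come from Biopython or from the FASTA file; only the description line has to be provided separately.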

Reply written 3.2 years ago by Lluís R.

How I hate it when this happens! I remember this being the case with the BED formats as well: one is tab-separated plain text, the other a binary file. Don't use duplicate names, people! </rant>

Reply written 3.2 years ago by RamRS