How can I use Python for multiple sequence alignments?
5.5 years ago
sallyzaki70 ▴ 10

Please, I am a Python beginner. I want to use it to align multiple sets of sequences, and I want to run this code:

from glob import glob

for filename in glob('*.fas')
    with open (filename) as f:
        output=str(filename)
        output+='-alignment.fas'
        in_file = str(filename)
        from Bio.Align.Application.import mafftcommandline
        mafft_cline = mafftcommandline(input=in_file)
        print(mafft_cline)
        stdout,stderr=mafft_cline()
        with open (output,'w') as handle:
        handle.write(stdout)
sequencing alignment • 7.9k views

Please could you format your code using the 101010 icon. Especially for python code


You just want to invoke MAFFT on a bunch of sequences via python?

Why do you want to do this via python specifically?


I want to use Python to align a lot of unaligned sequences, like these sets:

1-

>X05053.1-3_118
CCTGGCGGCCATGGCGAACCGGAACCACCCGATCCCATCTCGAACTCGGAAGTGAAACGG
TTCAGCGCCGATGATAGTGTGGGGCCTCCCCATGTGAAAGTAGGTCACTGCCAGGC
>M36159.1-2_116
CAGGTGGTGATGGCGGAAAGGTCACACCCGAACACATCCCGAACTCGGAAGTTAAGCTTT
CCAGCGCCGATGGTAGTTGGGGGTTTCCCCCTGCGAGAGTAGGACGTTGCCGGGC
>Z50737.1-3_119
AACGGCGGTCATAGCGGTGGGGAAACGCCCGGTCCCATCCCGAACCCGGAAGCTAAGCCC
ACCAGCGCCGATGGTACTGCACTCGTGAGGGTGTGGGAGAGTAGGACGCCGCCGGAC
>X02631.1-2_115
GCTGGCGACCATAGCAAGAGTGAACCACCTGATCCCTTCCCGAACTCAGAAGTGAAACCT
CTTCGCGCTGATGGTAGTGNGGGTTACCCATGTGAGAGTAAGTCATCGCCAGCT
>Z50057.1-2_118
GTCGGTGGTCATTGCGGAGGGGGAACGCCCGGTCCCATCCCGAACCCGGAAGCTAAGCCC
TCCAGCGCCGATGGTACTGCACTCGCCAGGGTGTGGGAGAGTAGGTCGCCGCCGACA

2-

>M76579.1-4_118
CCTGGTGGCCATTGCGAGGGCCCTACACCCGATCCCTTCCCGAACTCGGCCGTGAAATCC
CTCAGCGCCTATGATACTGCACCTCAAGGTGCGGAAAAGTCGGTCGCCGCCAGGT
>AF114035.1-6057_6173
TTCGGTGGTCATAGCGTGAGGGAAACGCCCGGTTACATTCCGAACCCGGAAGCTAAGCCT
CAGAGCGCCGATGGTACTGCAGGGGGGACCCTGTGGGAGAGTAGGACGCCGCCGAAC
>X02250.1-3_118
TCTGGTGATAATAGCATTGTGGAACCACCTGATCCCATCCCGAACTCAGAAGTGAAACGC
AATTGCGCCGATGGTAGTGTGGGGTCTCCCCATGTGAGAGTAGGTCATTGCCAGGC
>X02237.1-3_118
CCTGGCGACCATAGTGTTTTGGACCCACCTGATTCCATTCCGAACTCAGAAGTGAAACGA
AACAGCGTCGATGGTAGTGTGGGGTTTCCCCATGTGAGAGTAGAACATCGCCAGGC
>M76388.1-6142_6258
TTCGGTGGTCATTGCGTTAGGGAAACGCCCGGTTACATTCCGAACCCGGAAGCTAAGCCT
TTCAGCGCCGATGGTACTTCAGGGGGGACCCTGTGGGAGAGTAGGACGCCGCCGAAC

3-

>M35566.1-3_117
CCTGACGACCACAGCGACTGTGAACCACCCGACCCCATCTCGAACTCGGTAGTGAAACCA
GTCAGCGCCGATGATAGTGTGGCATATGCCATGTGAAAGTAGGTCATCGTCAGGC
>X03902.1-1_114
CTGGTGGCCTGAGCGGTGTGCCCAGAACCCGATCCCATCTCGAACTCGGCCGTTAAACAC
ACCAGCGCCCATGGTACTGTGTCTCAAGACACGGGAGAGTCGGTGCCGCCAGGC
>U18089.1-3485_3600
TCTGGCGGCCATAGCGCAGTGGAACCACCCCTTCCCATCTCGAACAGGACCGTGAAACGC
TGCAGCGCCTATGATAGTTGAGGGTCTCCCTCCGCGAAAGTCGGTCACCGCCAGAC
>AF116561.1-5902_6018
TTCGGTGGTTATAGCGGTGGGGAAACACCCGGTCCCATTCCGAACCCGGTAGTTAAGCCC
GCCAGCGCCGATGGTACTGCACTGGTGACGGTGTGGGAGAGTAGGTCGCCGCCGGAC
>M16171.1-3_119
TCCGGTGGTGATAGCGAGAGGGAAACGCCCGGTGAGATTCCGAACCCGGAAGCTAAGCCT
CTCAGCGCCGATGGTACTGCAAGGGGGACCTTGTGGGAGAGTAGGACGCCGCCGGAC

What is your error?

Please do your import at the beginning of your script, and note that the module and class names need to be spelled as: from Bio.Align.Applications import MafftCommandline

5.5 years ago
Joe 21k

You can do this with Python, though I don't see a particularly compelling reason to do so over just running MAFFT natively at the command line.

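import sys
import tempfile

from Bio.Align.Applications import MafftCommandline

# Input FASTA files are given on the command line
files = sys.argv[1:]
lines = []

# Read the full contents of every input file into memory
for file in files:
    with open(file, 'r') as ifh:
        lines.append(ifh.read())

# Concatenate all sequences into one temporary file for MAFFT.
# Text mode ('w') is needed so that a str can be written under Python 3.
with tempfile.NamedTemporaryFile(mode='w') as temp:
    temp.write('\n'.join(lines))
    temp.flush()  # make sure the data is on disk before MAFFT reads it
    mafft_cline = MafftCommandline(input=temp.name)
    stdout, stderr = mafft_cline()

print(stdout)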

Run this as:

python mafft_alignments.py *.fas

on the commandline.

This uses a temporary file for MAFFT so that you can concatenate all the sequences from your input files, without having to worry about intermediate filehandles etc. This might be an issue if your files are very big though (depends how much memory you have).

python mafft_alignments.py A.fasta B.fasta

gives me:

>RandomSequence_bhKRyVJoNyY4GralWOtVXRs9NWgLuDzS
gctacggta-gttagtgacccaggg------ccgagggcttccccgaactaaacacaatt
atcataatttggtccactcccgtgttc
>RandomSequence_dVyOdIlHB1I29BLYaVvjVIInwXbxldXU
---agggcatcttagtgtaccgcgacactacctaaagggtcgcttattttttgcccggtt
gtgaacagtaggcgcattgttgg----

For the input data:

$ cat A.fasta

>RandomSequence_bhKRyVJoNyY4GralWOtVXRs9NWgLuDzS
GCTACGGTAGTTAGTGACCCAGGGCCGAGGGCTTCCCCGAACTAAACACAATTATCATAATTTGGTCCACTCCCGTGTTC

$ cat B.fasta

>RandomSequence_dVyOdIlHB1I29BLYaVvjVIInwXbxldXU
AGGGCATCTTAGTGTACCGCGACACTACCTAAAGGGTCGCTTATTTTTTGCCCGGTTGTGAACAGTAGGCGCATTGTTGG
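
On the memory concern mentioned above: rather than reading every input into a list first, the files can be streamed into the combined file in chunks. This is only a rough sketch; the function name `concatenate_fasta` and the script name in the usage comment are made up for illustration and are not part of the script above.

```python
import shutil
import sys


def concatenate_fasta(paths, dest):
    """Stream each FASTA file in `paths` into the open text handle `dest`."""
    for path in paths:
        with open(path, "r") as src:
            # copyfileobj copies in chunks, so no file is held fully in memory
            shutil.copyfileobj(src, dest)
        # Guard against an input that lacks a trailing newline, which would
        # otherwise glue the next header onto the previous sequence line
        dest.write("\n")
    # Flush so another program (e.g. MAFFT) sees the complete file
    dest.flush()


# Hypothetical usage: python concat_fasta.py combined.fasta A.fasta B.fasta
if __name__ == "__main__" and len(sys.argv) > 2:
    with open(sys.argv[1], "w") as out:
        concatenate_fasta(sys.argv[2:], out)
```

The extra newline after each file can leave blank lines in the combined file; that mirrors what '\n'.join(lines) does in the script above.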

Personally, I'd concatenate the files with cat first, and then run MAFFT directly from the binary at the command line though...
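
If the goal is one alignment per input file (as the loop in the question suggests) rather than one combined alignment, a minimal sketch that calls the mafft binary directly through `subprocess` might look like the following. It assumes `mafft` is on the PATH. The helper names and the `-alignment` output naming are illustrative choices, and `--auto` simply lets MAFFT pick its own strategy. Recent Biopython releases have also deprecated the `Bio.Align.Applications` wrappers in favour of calling tools this way.

```python
import subprocess
import sys
from pathlib import Path


def output_path(fasta):
    """Illustrative naming scheme: set1.fas -> set1-alignment.fas."""
    return fasta.with_name(f"{fasta.stem}-alignment{fasta.suffix}")


def build_command(fasta, mafft="mafft"):
    """Build the MAFFT command line; assumes the binary is on the PATH."""
    return [mafft, "--auto", str(fasta)]


def align_file(fasta, mafft="mafft"):
    """Align one FASTA file and write the result next to the input."""
    # MAFFT prints the alignment to stdout; check=True raises on failure
    result = subprocess.run(
        build_command(fasta, mafft),
        capture_output=True,
        text=True,
        check=True,
    )
    out = output_path(fasta)
    out.write_text(result.stdout)
    return out


if __name__ == "__main__":
    for name in sys.argv[1:]:
        fasta = Path(name)
        # Skip outputs from an earlier run so they are not aligned again
        if fasta.stem.endswith("-alignment"):
            continue
        print(align_file(fasta))
```

Saved under a name of your choosing (say align_each.py, a hypothetical name), it would be run as: python align_each.py *.fas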
