How To Reformat Msa Output To Show Only 50 Nucleotide Sequences In Each Line
2
1
Entering edit mode
12.9 years ago
Thaman ★ 3.3k

I have Multiple sequence alignment (clustal) file and I want to read this file and arrange nucleotide sequences in such a way that it looks more clear and precise in order.

I am doing this from biopython using AlignIO object. My codes goes like this:

alignment = AlignIO.read("opuntia.aln", "clustal")
print "Number of rows: %i" % len(align)
for record in alignment:
print "%s - %s" % record.id, record.seq)


I have been more fascinated by the output of http://www.ebi.ac.uk/Tools/clustalw2/ project because it looks clear.

My Output -- , it looks messy and long scrolling. What i want to do is print only 50 sequences (nucleotide) in each line and continue till the end of alignment file.

I wish to have output like this from http://www.ebi.ac.uk/Tools/clustalw2/.

Any suggestions, algorithm and sample code is appreciated

Here is the link to the input file-- http://pastebin.com/EaeKsyvg

Br,

clustalw multiple • 3.4k views
1
Entering edit mode

By '50 sequences', do you mean 50 nucleotides? @Thaman: I have seen a few of your questions and they are all great, in the sens that they have the potential to be very useful to others. However, it would be much better if you took the time to formulate them more clearly, giving both more details about the context and what you desire, and also making it more English. That way, you will get much better replies and won't have to ask the same question twice because the first version was confusing, as happened recently. Thanks for your contribution! Cheers.

0
Entering edit mode

Sequence ID and 50 sequences? That's not clear.

0
Entering edit mode
gi|6273285|gb|AF191659.1|AF191      TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA 50

0
Entering edit mode

I mean like this...........

ID                             Sequences
gi|6273285-----TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA

0
Entering edit mode
ID--------------------Sequence----------------------------------------
gi|6273285|gb|------TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA

0
Entering edit mode
ID------Sequence----------------------------------------
gi|6273285|gb|----TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAA
gi|6273284|gb|----TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAA
gi|6273287|gb-----TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAA

0
Entering edit mode

@Eric, thanks for your kind suggestion, i have edited my query and make it more precise. I am just trying to make alignment output similar to http://www.ebi.ac.uk/Tools/clustalw2/ project. According to my output its long nucleotide record, but i want to split it into 50 nucleotide each in individual line. Thanks for listening

0
Entering edit mode

Could you please put the input data on the web. That way we can write a code and test it withouth having to generate the input data manually. Just paste it into http://pastebin.com/ and link it.

0
Entering edit mode

Could you please put the input data (alignment) on the web. That way we can write a code and test it without having to generate the input data manually. Just paste it into pastebin.com and link it

0
Entering edit mode

How can i link it to Istan Albert?

0
Entering edit mode

just edit the post and add the link into the main post

0
Entering edit mode

Ok now finally alignment file is here.

0
Entering edit mode

okei i provided the input file link in the main post.

0
Entering edit mode

The input file is the file you want to transform, the one that has long lines. The file that you provide seems like the result of the transformation.

0
Entering edit mode

I tried with readlines() and seems working, seems likes i couldn't make my problem clear to you.

5
Entering edit mode
12.9 years ago

Here is my solution with Python, using the formatter by nuin. It transforms data from the original format shown here to the a visually more pleasing format shown here

import string
from itertools import islice

# remove empty lines if present
lines = map(string.strip, lines)
lines = filter(None, lines)

# break each line into id and sequence pairs
pairs = map(string.split, lines)

# this breaks each sequence into the small pieces
# generates each individual line in the file
def chunk(pair, length=50):
seqid, sequence = pair
temp = []
for j in range(0,len(sequence),length):
temp.append( (seqid + ' ' + sequence[j:j+length]) )
return temp

# here we apply the chunking function
data = map(chunk, pairs)

# this is really an idiomatic expression to perform
# the inverse of what zip usually does
transposed = zip( *data )

# generate the output
for elems in transposed:
print '\n'.join(elems)
print


The only tricky part is the zip(*data), to best understand what it does see:

a = [ (1,2,3), (4,5,6) ]
b = zip(*a)
c = zip(*b)

print a
print b
print c


it will print:

[(1, 2, 3), (4, 5, 6)]
[(1, 4), (2, 5), (3, 6)]
[(1, 2, 3), (4, 5, 6)]

0
Entering edit mode

@ Istvan, how can i get clustal multiple sequence alignment scores in each line like on the right side column of

0
Entering edit mode

it that alignment score or simply the step count, 50, 100, 150 etc. You see this is why it is important to give people access to the actual files for both the input and output, otherwise we are left guessing

0
Entering edit mode

I am trying to make project using Clustalw2(multiple sequence alignment) biopython wrapper. Its somehow similar to http://www.ebi.ac.uk/Tools/ -> sequence analysis-> clustalw. Fasta Input file for example is (http://pastebin.com/JA76QTJG) and output alignment file is (http://pastebin.com/M4KQX46u). Now i want to read the alignment file and show (id, sequence and scores) records. I'm not sure whether its count or alignment score.

2
Entering edit mode
12.9 years ago
Paulo Nuin ★ 3.7k

This is a function that you can use to do that

def format_output(sequence, length):
temp = []
for j in range(0,len(sequence),length):
temp.append(sequence[j:j+length])
return '\n'.join(temp)


just loop your sequences through it.

0
Entering edit mode

Really nice approach! Thx :)

0
Entering edit mode

Not as Pythonic as it could be, but simple to understand.

Traffic: 1377 users visited in the last hour
FAQ
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.