Making All Protein Sequence Lengths Same
13.0 years ago
Ashru2006 ▴ 10

Is there any code in Perl / Python to make all protein sequences the same length? Otherwise my phylogenetic tool, MEGA, does not work on them.

perl bioperl biopython python • 3.9k views

I liked the way Jan described "manual editing" as one of the more sophisticated things you can do;-)


Have you aligned your sequences? You need to in order to do phylogeny and almost any aligner will pad out sequences to make them the same length.


I would guess that they should be globally-aligned and therefore of the same length. Check your alignment?


David W, I've heard that there are programs that construct the alignment and the phylogeny at the same time, but I don't think they are implemented in MEGA.


Several times I have ended up with alignments containing sequences of different lengths. This can happen when you do more sophisticated things, e.g. manual edits or combining several alignments into one in a sequence alignment editor. The fact that most programs and programming libraries complain about unequal sequence lengths, instead of correcting them automatically, is for me one of the biggest annoyances in bioinformatics.

13.0 years ago
Jan Kosinski ★ 1.6k

If you don't mind using interactive programs instead of Python/Perl, you can try Jalview (cross-platform) or BioEdit (Windows):

Jalview: Menu Edit -> Pad Gaps

BioEdit: Menu Alignment -> Flush the alignment

But of course, as others suggested in the comments, first make sure that your sequences are actually aligned and that the different lengths arose for some other reason.
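If you would rather do the padding in Python, as the question asked, here is a minimal sketch using only the standard library. It assumes the input is FASTA text and that the sequences are already aligned, so only trailing gaps are missing; it does not align anything. The function names (`read_fasta`, `pad_to_equal_length`, `format_fasta`) and the `-` gap character are my own choices for illustration, not something taken from MEGA, Jalview or BioEdit.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def pad_to_equal_length(records, gap="-"):
    """Right-pad every sequence with the gap character to the longest length.

    Assumption: the sequences are already aligned and differ only in
    missing trailing gaps. Padding unaligned sequences gives equal
    lengths but not a meaningful alignment.
    """
    if not records:
        return []
    width = max(len(seq) for _, seq in records)
    return [(header, seq.ljust(width, gap)) for header, seq in records]


def format_fasta(records):
    """Turn (header, sequence) pairs back into FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    # Hypothetical toy alignment with ragged right-hand ends.
    example = """>seq1
MKT-AYIAK
>seq2
MKTQAY
>seq3
MKT-AYIA
"""
    padded = pad_to_equal_length(read_fasta(example.splitlines()))
    print(format_fasta(padded), end="")
```

For a real file you would pass an open file handle to `read_fasta` instead of the example string and write the result of `format_fasta` to a new file before loading it into MEGA.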
