Question: Most efficient way to trim overhanging bases after alignment
0
2.6 years ago by
joreamayarom110
USA/Cambridge
joreamayarom110 wrote:

I have used muscle to align a newly assembled genome to a reference genome. The seqs.afa file looks like this:

>reference
--------------------------ACTGAC
ACTGACTGACTGACTGACTGACTGACTGACTG
... Lots of bases here .........
ACTGACTGACTGACTGACTGACTGACTG----
-----------------

>my_assembly
AAAAAAAAAAAAAAAAAAAAAAAAAAACTGAC
ACTGACTGACTGACTGACTGACTGACTGACTG
... Lots of bases here .........
ACTGACTGACTGACTGACTGACTGACTGAAAA
AAAAAAAAAAAAAAAAA

As you can see, my program tends to leave dangling bases upstream and downstream of the reference genome, and I need to get rid of them in post-processing. Is there a program or a simple Python script I can use to trim the bases that overhang the reference? How can I do this efficiently? (I have a considerable number of data sets.)

muscle fasta • 761 views
modified 2.5 years ago by Suzanne • written 2.6 years ago by joreamayarom
1
2.6 years ago by
5heikki8.4k
Finland
5heikki8.4k wrote:

Get inspired by this

written 2.6 years ago by 5heikki

I adapted the script, but it doesn't work reliably. Sometimes, if one of the aligned sequences has a relatively large gap in the middle, everything downstream of that gap is eliminated.

written 2.6 years ago by joreamayarom

Is your input data in the same format as the OP's in that post, i.e. with no line breaks inside the sequences?

written 2.6 years ago by 5heikki
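
The two points raised in this exchange (wrapped sequence lines, and internal gaps wiping out everything downstream) can both be handled by joining each record's lines first and then cutting at the first and last non-gap columns of the reference only. The following is a minimal sketch, not the script linked above: the function names `read_fasta` and `trim_to_reference` are made up for illustration, and it assumes the record is called `reference` and that `-` is the gap character, as in the question's seqs.afa.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA into a list of (name, sequence), joining wrapped lines."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def trim_to_reference(records, ref_name="reference", gap="-"):
    """Cut every aligned sequence to the span covered by the reference.

    Only leading and trailing gaps of the reference define the cut points,
    so gaps in the middle of any sequence are left untouched.
    """
    seqs = dict(records)
    ref = seqs[ref_name]
    if any(len(seq) != len(ref) for seq in seqs.values()):
        raise ValueError("sequences are not aligned to the same length")
    start = len(ref) - len(ref.lstrip(gap))  # first non-gap column
    end = len(ref.rstrip(gap))               # one past the last non-gap column
    if start >= end:
        raise ValueError("reference contains only gap characters")
    return [(name, seq[start:end]) for name, seq in records]


# Toy alignment with wrapped lines and an internal gap in the reference.
example = StringIO(
    ">reference\n---ACTG\nAC-TGAC\nTG--\n"
    ">my_assembly\nAAAACTG\nACTTGAC\nTGAA\n"
)
for name, seq in trim_to_reference(read_fasta(example)):
    print(f">{name}\n{seq}")
```

Because the cut points come from a single pass over the reference, the cost is linear in the alignment length, so looping over many seqs.afa files should be cheap compared with the alignment step itself.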
0
2.6 years ago by
shenwei3564.5k
China
shenwei3564.5k wrote:

trimAl (http://trimal.cgenomics.org/publications) can trim multiple sequence alignment results.

written 2.6 years ago by shenwei356
0
2.5 years ago by
Suzanne50
Dundee, Scotland
Suzanne50 wrote:

Jalview (www.jalview.org) is a good visualisation workbench for multiple sequence alignments. Along with several useful editing features (check out the YouTube video), it has a "pad gaps" feature that can be toggled on and off. When it is selected, the alignment is kept at a minimal width (so there are no empty columns before the first or after the last aligned residue), and all sequences are padded with gap characters before and after their terminating residues. The pad gaps feature is demonstrated at 3:20 in this video. The alignment can then be exported in a variety of file formats.

written 2.5 years ago by Suzanne