Question: Is there a way to do clustalo profile-profile alignment without adding gaps to first profile?
DNAngel wrote (20 months ago):

Solved: I stopped using clustalo and went back to MAFFT (what I originally wanted to use). MAFFT had been unable to read my alignment because of non-standard characters ("!" and "?") inserted into my exon alignments. After converting those into dashes, MAFFT read the alignment properly, and I was able to add my new species while keeping the reading frame of my original MSA. For anyone who needs to do the same, I used (Windows, Cygwin):

$ mafft --addfull outgroup_species.fasta --keeplength prealigned_msa.fasta > combined_msa.fasta
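A minimal Python sketch for confirming afterwards that the combined alignment still contains every original row unchanged and that all rows share one length. The FASTA reader is a simplified assumption (the first word of each header is used as the name), and the comparison ignores letter case because MAFFT may write nucleotides in lower case. The function names are illustrative only.

```python
def read_fasta(text):
    """Return {name: sequence}; the name is the first word after '>'."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines.
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def check_preserved(original, combined):
    """Return a list of problems; an empty list means the MSA is intact."""
    problems = []
    lengths = {len(seq) for seq in combined.values()}
    if len(lengths) > 1:
        problems.append(f"rows have different lengths: {sorted(lengths)}")
    for name, seq in original.items():
        if name not in combined:
            problems.append(f"{name} missing from combined alignment")
        elif combined[name].upper() != seq.upper():
            # Case-insensitive: only residues and gaps are compared.
            problems.append(f"{name} was changed")
    return problems


def check_files(original_path, combined_path):
    """Read both FASTA files and compare them with check_preserved."""
    with open(original_path) as fh:
        original = read_fasta(fh.read())
    with open(combined_path) as fh:
        combined = read_fasta(fh.read())
    return check_preserved(original, combined)
```

With the file names from the command above, `check_files("prealigned_msa.fasta", "combined_msa.fasta")` should return an empty list if the original rows came through untouched.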


I need to align a new sequence to a pre-existing multiple sequence alignment (MSA). I know how to run a clustalo profile-profile alignment in which I treat my one new sequence as a separate alignment. But every time I run it, gap columns get inserted into the pre-existing MSA, and I need to avoid this because it ruins my reading frame.

Is there an option to simply not alter the first profile alignment at all?

Sample of my pre-aligned MSA (if I were looking at the first 4 exons):

>sp1
----ATGCTC---ATAT
>sp2
----ATGGTC---ATAT
>sp3
CCAT---------ATAT   # These gaps are inserted to represent a missing exon
>sp4
CCATATGGTCCCC----   # Gaps needed to maintain the reading frame per exon

The sequence I want to add to the pre-aligned MSA (the bases shown in parentheses are extras that need to be trimmed after alignment; all exons are included because this is a reference sequence):

>outgroup 
CCATAT(T)GGTCCCCATAT(TCA)

Ideal output:

>sp1
----ATGCTC---ATAT
>sp2
----ATGGTC---ATAT
>sp3
CCAT---------ATAT   
>sp4
CCATATGGTCCCC---- 
>outgroup 
CCATATGGTCCCCATAT

I'm not sure how to align the new sequence while keeping the MSA length fixed, because any added columns will break the reading frame. I can't convert the bases to amino acids either, because I need to work with nucleotides for later dN/dS ratio calculations.


Why have you asked the same question twice?

(reply by jrj.healey, 20 months ago)

Desperate times. My original question led me to the clustalo answer, which still causes some problems for me. I deleted it anyway.

(reply by DNAngel, 20 months ago)
h.mon (Brazil) wrote (20 months ago):

Maybe mafft with --add and --keeplength will do what you want.


That was a program I was using, but it keeps giving me an error saying my first profile is unaligned, which is untrue because it is aligned. There are gaps at the start and at the end, but those are required (everything is aligned by exon first, then concatenated). I've emailed their helpdesk but have not heard back yet, so I was looking into clustalo, which seems to accept my profile as aligned. I really want to use MAFFT for the --keeplength option, though... really hoping they get back to me soon!

(reply by DNAngel, 20 months ago)

I'd suggest you show us some of your sample data (alignments).

(reply by jrj.healey, 20 months ago)

Edited my main question to show an example. Even if the program inserts gap columns into the original MSA to accommodate the new sequence (the outgroup species), --keeplength would ideally remove those columns at the end, so that only the new sequence is edited and the MSA is left as it was.
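The column-removal idea described in that comment can be sketched in Python. This illustrates the concept only and is not MAFFT's implementation: after a profile alignment has inserted gap columns into the existing MSA, any column in which every original row is a gap is an insertion relative to that MSA, so deleting those columns restores the original rows and trims the extra bases from the new sequence. MAFFT's documentation describes --keeplength along similar lines (alignment length unchanged, insertions in the new sequences deleted). The function name `drop_insert_columns` is hypothetical, and the 21-column "profile alignment" below is a hand-constructed example, not real clustalo output. The sketch assumes the original MSA has no all-gap columns of its own, since those would be removed too.

```python
def drop_insert_columns(original_rows, new_row, gap="-"):
    """Delete every column in which all original rows are gaps.

    original_rows: rows of the pre-existing MSA after a profile alignment
                   has inserted gap columns into them.
    new_row:       the added sequence, aligned to the same length.
    Returns (trimmed_original_rows, trimmed_new_row).
    """
    length = len(new_row)
    if any(len(row) != length for row in original_rows):
        raise ValueError("all rows must have the same aligned length")
    # Keep a column only if at least one original row has a residue there.
    keep = [i for i in range(length)
            if any(row[i] != gap for row in original_rows)]
    trimmed = ["".join(row[i] for i in keep) for row in original_rows]
    return trimmed, "".join(new_row[i] for i in keep)


# Hypothetical profile alignment of the example data: one gap column was
# inserted after column 6 and three at the end to fit the outgroup's
# extra (T) and (TCA).
aligned = [
    "----AT-GCTC---ATAT---",  # sp1
    "----AT-GGTC---ATAT---",  # sp2
    "CCAT----------ATAT---",  # sp3
    "CCATAT-GGTCCCC-------",  # sp4
]
outgroup = "CCATATTGGTCCCCATATTCA"

rows, trimmed_outgroup = drop_insert_columns(aligned, outgroup)
print(trimmed_outgroup)  # CCATATGGTCCCCATAT
```

On this example the four original rows come back exactly as in the pre-aligned MSA, and the outgroup matches the ideal output.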

MAFFT considers my MSA unaligned, and I think it is because some species simply had no exons returned from the next-gen analysis, so gaps were inserted to indicate a missing exon. I cannot realign the whole thing afterwards because it ends up aligning the wrong exons together. :/

(reply by DNAngel, 20 months ago)

Are you trying to add an outgroup to an existing MSA?

(reply by jrj.healey, 20 months ago)

Yes I need to align it to the MSA. It is just one species sequence.

(reply by DNAngel, 20 months ago)

I might be wrong, but I'm not sure it's scientifically very robust to add the outgroup after the fact. It would be akin to running an experiment's control separately from the rest of the experiment. The parameters of your first and second rounds of alignment might not be the same, especially if you've had to use two different programs to do it.

(reply by jrj.healey, 20 months ago)

The original alignment covers only the study species (I didn't generate it; the dataset was given to me afterwards so I could run some tests), but to run phylogenetic tests I need an outgroup. I have no choice but to add it after the exons for a specific gene have been extracted for all the study species. :S

(reply by DNAngel, 20 months ago)

Ah, I got it to work. The problem was non-standard characters in the alignment, produced by the quality checks, which made MAFFT think the sequences were missing bases and were therefore unaligned.

After converting all "!" symbols to "-" (and "?" to "-") the alignment was read properly and the sequence was added using mafft without changing the original MSA length!
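The character fix described above can be scripted. A minimal sketch, assuming the alignment fits in memory and that "!" and "?" only need to become "-" (as was done here). Each symbol is swapped one-for-one, so row lengths and column positions stay the same and the reading frame is untouched. Header lines are skipped so sequence names are not altered. The function name `sanitize_fasta` and the `REPLACE` table are illustrative, not part of MAFFT or any other tool.

```python
# Symbols to replace in sequence lines; extend if other QC symbols appear.
REPLACE = {"!": "-", "?": "-"}


def sanitize_fasta(text, replace=REPLACE):
    """Replace non-standard alignment symbols in sequence lines only.

    Header lines (starting with '>') are copied unchanged.
    Returns (new_text, counts), where counts maps each symbol to how
    many times it was replaced.
    """
    counts = {ch: 0 for ch in replace}
    table = str.maketrans(replace)  # one-for-one character mapping
    out = []
    for line in text.splitlines():
        if line.startswith(">"):
            out.append(line)
        else:
            for ch in replace:
                counts[ch] += line.count(ch)
            out.append(line.translate(table))
    return "\n".join(out) + "\n", counts
```

For example, read the pre-aligned file, pass its text to `sanitize_fasta`, and write the returned text to `prealigned_msa.fasta` before running the `mafft --addfull ... --keeplength` command; the returned counts show how many symbols were replaced.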

(reply by DNAngel, 20 months ago)