Is there a way to do clustalo profile-profile alignment without adding gaps to first profile?
5.5 years ago
DNAngel ▴ 240

Solved: I stopped using clustalo and went back to mafft (what I originally wanted to use), which at first was unable to read my alignment. The cause was non-standard characters ("!" and "?") that had been inserted into my exon alignments. After converting those into dashes, mafft read the alignment properly and I was able to append my new species while maintaining the reading frame of my original MSA. For those who may need to do the same, I used (Windows, Cygwin):

$ mafft --addfull outgroup_species.fasta --keeplength prealigned_msa.fasta > combined_msa.fasta
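To confirm that the original rows came through untouched, the two files can be compared row by row. The following is only a minimal Python sketch, not part of mafft: `read_fasta` and `changed_rows` are hypothetical helper names, and the comparison ignores letter case on the assumption that the aligner may change case on output.

```python
def read_fasta(text):
    """Parse FASTA text into a dict of header -> sequence (insertion-ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def changed_rows(original, combined):
    """Return names of original rows that are missing or altered in combined.

    Case is ignored (assumption: the aligner may change letter case).
    """
    return [
        name
        for name, seq in original.items()
        if combined.get(name, "").upper() != seq.upper()
    ]


if __name__ == "__main__":
    # File names are the ones used in the command above.
    with open("prealigned_msa.fasta") as fh:
        before = read_fasta(fh.read())
    with open("combined_msa.fasta") as fh:
        after = read_fasta(fh.read())
    print("altered rows:", changed_rows(before, after))
```

An empty list means every pre-existing row has the same gaps and bases as before, so the reading frame of the original MSA is intact.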

I need to align a new sequence to a pre-existing multiple sequence alignment (MSA). I know how to run a clustalo profile-profile alignment in which I treat my one new sequence as a separate profile. But every time I run this, gap columns get inserted into the pre-existing MSA, which I need to avoid because it ruins my reading frame.

Is there an option to simply not alter the first profile alignment at all?

Sample of my pre-aligned MSA (if I were looking at the first 4 exons):

>sp1
----ATGCTC---ATAT
>sp2
----ATGGTC---ATAT
>sp3
CCAT---------ATAT   # These gaps are inserted to represent a missing exon
>sp4
CCATATGGTCCCC----   # Gaps needed to maintain the reading frame per exon


The sequence I want to add to the pre-aligned MSA (it has some extra bases, shown in parentheses, that need to be trimmed after alignment; all exons are included as this is a reference sequence):

>outgroup
CCATAT(T)GGTCCCCATAT(TCA)


Ideal output:

>sp1
----ATGCTC---ATAT
>sp2
----ATGGTC---ATAT
>sp3
CCAT---------ATAT
>sp4
CCATATGGTCCCC----
>outgroup
CCATATGGTCCCCATAT


I am not sure how to align the new sequence while maintaining the length of the MSA, because any added columns will break the reading frame. I cannot translate the bases to amino acids either, because I need to work in nucleotides for future dN/dS ratios.

profile-profile alignment clustalo • 2.3k views

Why have you asked the same question twice?


Desperate times. My original question led me to finding the clustalo answer, which still causes some problems for me. I deleted it anyways.

5.5 years ago
h.mon 34k

Maybe mafft with --add and --keeplength will do what you want.


That is the program I was using, but it keeps giving me an error saying my first profile is unaligned, which is untrue. There are gaps at the start and at the end, but those are required (everything is aligned by exon first, then concatenated). I've emailed their helpdesk but have not heard back yet, so I was looking into clustalo, which seems to accept my profile as aligned. I really want to use MAFFT for the keep-length function, though... really hoping they will get back to me asap!
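When a tool reports an aligned file as unaligned, two things are worth checking by hand: whether every row has the same length, and whether any row contains characters the tool may not treat as bases or gaps. Below is a rough Python sketch of that check. The `ALLOWED` set (IUPAC nucleotide codes plus "-") is an assumption rather than mafft's documented alphabet, and `parse_fasta` and `diagnose` are hypothetical helper names.

```python
# Assumption: IUPAC nucleotide codes plus the gap character count as "expected".
ALLOWED = set("ACGTUNRYSWKMBDHV-")


def parse_fasta(text):
    """Parse FASTA text into a dict of header -> sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def diagnose(records, allowed=ALLOWED):
    """Return (row lengths, unexpected characters per row)."""
    lengths = {name: len(seq) for name, seq in records.items()}
    unexpected = {}
    for name, seq in records.items():
        extra = sorted(set(seq.upper()) - allowed)
        if extra:
            unexpected[name] = extra
    return lengths, unexpected


if __name__ == "__main__":
    # Hypothetical input file name.
    with open("prealigned_msa.fasta") as fh:
        lengths, unexpected = diagnose(parse_fasta(fh.read()))
    print("distinct row lengths:", sorted(set(lengths.values())))
    print("unexpected characters:", unexpected)
```

More than one distinct row length, or any entry under unexpected characters, points to the rows that deserve a closer look.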


I'd suggest you show us some of your sample data (alignments)


Edited my main question to show an example. Even if the program can insert gap columns according to the new sequence (outgroup species) into the original MSA, the --keeplength would then ideally remove those columns at the end so it only edits the new sequence and maintains the MSA.

Mafft finds my MSA to be unaligned and I am thinking it is because some species just did not have exons returned from the next-gen analysis, so gaps were inserted to indicate a missing exon. I cannot realign the whole thing after because it aligns the wrong exons together at the end. :/


Are you trying to add an outgroup to an existing MSA?


Yes I need to align it to the MSA. It is just one species sequence.


I might be wrong, but I'm not sure it's scientifically very robust to add the outgroup after the fact. It'd be akin to running an experimental control separately from the rest of the experiment. The parameters of your first and second rounds of alignment might not be the same, especially if you've had to use two different programs to do it.


The first alignment was for the study species (I did not produce it myself; the dataset was given to me after the fact to run some tests), but in order to run phylogenetic tests I need an outgroup. I have no choice but to add it after extracting the exons for a specific gene for all the study species. :S


Ah, I got it to work. Ambiguous characters in the alignment that are produced after quality checks were making mafft think the sequences were missing bases and were thus unaligned.

After converting all "!" symbols to "-" (and "?" to "-") the alignment was read properly and the sequence was added using mafft without changing the original MSA length!
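For anyone who wants to script that conversion, here is a small Python sketch. `clean_fasta` is a hypothetical helper name and the file names are placeholders. It only touches sequence lines, so any "!" or "?" in a header stays as is, and every row keeps its original length.

```python
def clean_fasta(text, bad="!?", gap="-"):
    """Replace each character in `bad` with `gap` on sequence lines only."""
    table = str.maketrans({ch: gap for ch in bad})
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            cleaned.append(line)  # header lines are left untouched
        else:
            cleaned.append(line.translate(table))
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Placeholder file names; adjust to your own data.
    with open("prealigned_msa_raw.fasta") as fh:
        text = fh.read()
    with open("prealigned_msa.fasta", "w") as fh:
        fh.write(clean_fasta(text))
```

Because the replacement is one character for one character, the columns stay where they were and the cleaned file can be passed to mafft as the existing alignment.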