Question: Adding sequences to preexisting MAF alignment
2.9 years ago
Carla wrote:


I was wondering which approach is best, and whether what I am doing so far is wrong.

First, I was considering following the lastz + multiz pipeline, which seems to be the most widely accepted approach. However, I am having issues after performing the primary pairwise alignment between the reference genome and my sequences of interest. The reference genome that I am using is the same one used for the multi-species alignment to which I want to add my sequences afterwards. I used lastz to do the pairwise alignment.

The question is, how can I add the pairwise alignment to the already constructed alignment with all the species? From what I see of how Roast and Multiz work, they seem to use the pairwise alignment of each species to construct the full alignment following a tree. That doesn't feel like the best approach for what I am trying to accomplish. Is there another way of doing this without having to split the multi-species alignment into pairs?

Second, I was considering using MAFFT to directly align my sequences to the full MAF alignment with all the species. However, the alignment is split by chromosome. Is it better to do it like that, or to concatenate all the chromosomes, convert the result into FASTA and then feed it into MAFFT?

The end goal is to get the conservation scores of my sequences and the other species.

Thanks in advance!

sequence alignment genome
Hi Carla, could you resolve this? I think I am trying to do the same thing as you. I downloaded the 100 vertebrate genome alignment from UCSC and I am trying to obtain that alignment as multi-FASTA files so I can later add another set of CDS from another animal. I was then planning to search for orthologs between animal 101 and the 100-way multiz dataset. Any instructions would be welcome. Thanks.

Hi Carla, I am facing a similar issue. Did you find a good solution?

2.9 years ago
Istvan Albert
University Park, USA
Istvan Albert wrote:

You have too many questions here - in general, I would say that most tools I have used align all sequences at the same time rather than building them up from already existing pairwise alignments. I think this latter approach has fallen into disuse.

As to merging chromosomes - you'd be creating a non-existing chimeric sequence - and, while that might be useful in some cases it would also introduce all kinds of unintended consequences when it comes to interpretation of the said sequences.

I apologize for such a convoluted post.

Yes, I agree that I shouldn't merge the chromosomes because it can introduce noise. Thank you for your advice!

I should have specified that I am trying to align my sequences to whole-genome alignments (similar to the UCSC vertebrate alignments).

These alignments were generated using lastz + multiz. My end goal is to align sequences to this preexisting alignment and calculate the conservation scores using PHAST.

In the end, what I am doing is splitting the whole-genome alignment into pairwise alignments by chromosome. Then I will stitch them back together along with my sequences of interest (previously aligned using lastz with the same reference genome) by using multiz.
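The splitting step described above could look roughly like this in Python. This is only a sketch, assuming standard MAF `s` lines (`s src start size strand srcSize text`) and sources named `assembly.chrom`. The names `read_blocks`, `project_pairwise` and `format_block`, and prefixes such as `hg38.`, are made up for illustration and do not come from any existing tool. Re-stitching with multiz is not shown.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class SeqLine(NamedTuple):
    src: str       # e.g. "hg38.chr1"
    start: int     # zero-based start in the source sequence
    size: int      # number of non-gap bases in this row
    strand: str
    src_size: int
    text: str      # aligned text, gaps as "-"


def read_blocks(lines: Iterable[str]) -> Iterator[List[SeqLine]]:
    """Yield the 's' rows of each alignment block; other line types are ignored."""
    block: List[SeqLine] = []
    for line in lines:
        f = line.split()
        if f and f[0] == "s":
            block.append(SeqLine(f[1], int(f[2]), int(f[3]), f[4], int(f[5]), f[6]))
        elif not f or f[0] == "a":
            # A blank line or a new 'a' line closes the current block.
            if block:
                yield block
                block = []
    if block:
        yield block


def project_pairwise(block: List[SeqLine], ref_prefix: str,
                     other_prefix: str) -> Optional[List[SeqLine]]:
    """Keep only the reference and one other species, dropping all-gap columns."""
    ref = next((s for s in block if s.src.startswith(ref_prefix)), None)
    other = next((s for s in block if s.src.startswith(other_prefix)), None)
    if ref is None or other is None:
        return None  # this species is absent from the block
    keep = [i for i, (a, b) in enumerate(zip(ref.text, other.text))
            if a != "-" or b != "-"]
    return [
        ref._replace(text="".join(ref.text[i] for i in keep)),
        other._replace(text="".join(other.text[i] for i in keep)),
    ]


def format_block(rows: List[SeqLine]) -> str:
    """Write rows back out as a MAF block (no score on the 'a' line)."""
    out = ["a"] + [f"s {r.src} {r.start} {r.size} {r.strand} {r.src_size} {r.text}"
                   for r in rows]
    return "\n".join(out) + "\n"
```

Running `project_pairwise` once per non-reference species over each chromosome's MAF would give one pairwise file per species. Coordinates and sizes are left untouched because removing all-gap columns does not change the number of bases in either row.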

I am considering aligning the full genome of my species of interest and then getting the conservation scores for the sequences I am interested in, because it is essentially the same pipeline and I might get better alignments. Do you agree?

Another possibility I was considering: if my sequences are already aligned to the same reference genome, I also have their locations (it's a MAF file), so maybe I can concatenate my pairwise alignment with the multi-species alignment and sort it accordingly. Do you think this is possible?
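As a rough illustration of the concatenate-and-sort idea, here is a Python sketch. It assumes that the reference is the first `s` row of every block and that blocks are separated by blank lines. The function names are hypothetical. Note that this only interleaves blocks by reference position: the new species stays in its own blocks rather than becoming an extra row in the existing multi-species blocks, so it is not equivalent to what multiz produces.

```python
import re
from typing import List, Tuple


def split_blocks(maf_text: str) -> List[str]:
    """Return the alignment blocks of a MAF file as raw strings."""
    blocks = []
    for chunk in re.split(r"\n\s*\n", maf_text.strip()):
        # Drop header and comment lines such as "##maf version=1".
        lines = [l for l in chunk.splitlines()
                 if l.strip() and not l.startswith("#")]
        if lines and lines[0].startswith("a"):
            blocks.append("\n".join(lines))
    return blocks


def ref_key(block: str) -> Tuple[str, int]:
    """Sort key: (source name, start) of the first 's' row, assumed to be the reference."""
    for line in block.splitlines():
        f = line.split()
        if f[0] == "s":
            return (f[1], int(f[2]))
    raise ValueError("block has no 's' line")


def merge_sorted(*maf_texts: str) -> str:
    """Concatenate blocks from several MAF texts and sort them by reference position."""
    blocks = [b for text in maf_texts for b in split_blocks(text)]
    blocks.sort(key=ref_key)
    return "##maf version=1\n\n" + "\n\n".join(blocks) + "\n"
```

Source names are compared as plain strings here, so `chr10` sorts before `chr2`. With per-chromosome files that does not matter, since every block shares the same reference sequence.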

Thanks again!

I will say it is hard to recommend one approach over the other - there are a few missing ingredients - for example, what exactly you mean by "stitch them back together". If you have the skills to stitch them back then, by all means, go ahead with it :-)

I think you should try what seems easiest first and see what problems pop up along the way - then reassess.

Alright! I am in the process of finding out what works best, Thanks! :)

I thought that maybe there was an easier and faster approach, considering that many people might want to add a genome to the UCSC vertebrate multiz alignment without having to un-stitch/re-stitch the full alignments. I think Galaxy might have a tool for this, but I am unsure what the protocol is.

Hi Carla, I am facing a similar issue with a recently sequenced species and I would like to add it to a preexisting UCSC alignment. Have you tried any of the options? Which one worked best for you? Thanks!

Did you manage to do it? I am in the same situation and can't turn a UCSC 100-genome alignment into multi-FASTA files or something manageable, and these old questions are [almost] the only reference to the format.
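For the MAF-to-multi-FASTA conversion asked about above, a minimal Python sketch follows. It assumes standard MAF `s` lines (`s src start size strand srcSize text`) and writes one multi-FASTA per alignment block, keeping the gaps so the rows stay aligned. The function name `maf_to_fasta` and the `>src:start:size:strand` header layout are arbitrary choices for illustration, not an established convention.

```python
from itertools import chain
from typing import Iterable, Iterator


def maf_to_fasta(maf_lines: Iterable[str]) -> Iterator[str]:
    """Yield one aligned multi-FASTA string per MAF alignment block."""
    records = []
    # The trailing "" guarantees the last block is flushed even without a final blank line.
    for line in chain(maf_lines, [""]):
        fields = line.split()
        if fields and fields[0] == "s":
            src, start, size, strand, text = (
                fields[1], fields[2], fields[3], fields[4], fields[6])
            # Header carries the raw MAF coordinates (zero-based start, ungapped size).
            records.append(f">{src}:{start}:{size}:{strand}\n{text}")
        elif (not fields or fields[0] == "a") and records:
            # A blank line or a new 'a' line closes the current block.
            yield "\n".join(records) + "\n"
            records = []
```

Because it is a generator over lines, an open file handle for one chromosome's MAF can be passed in directly and each block written out as it is produced, without loading the whole alignment into memory. Species missing from a block are simply absent from that block's FASTA.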

