Pairwise alignment of two long chromosomes (~100Mbp) with lastz
3 days ago

I'm trying to align corresponding chromosomes (~100 Mbp long) of two dog breeds, German Shepherd and Labrador.

This is for a nonscientific purpose, so alignment quality is not a priority and fast execution is preferred.

I used the program lastz with the following script:

lastz german_shepherd.fasta labrador.fasta \
    --notransition --step=20 --nogapped \
    --format=maf --ambiguous=iupac


This is analogous to the first example in the tutorial (https://lastz.github.io/lastz/), except that I am using it on two closely related sequences. The script takes a very long time to run, and the output file had grown to ~80 GB by the time I had to terminate the process. The input sequences themselves are only ~100 MB each.

Upon examining the output, I saw that the script had created many overlapping aligned segments. For example, the overlapping segments [0:146] and [0:231] appear in two different alignments.
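To make the overlap problem concrete, here is a minimal Python sketch of what "non-overlapping" would mean for such segments. It assumes each alignment block has already been reduced to a half-open (start, end) interval on the first sequence. The function name `keep_non_overlapping` and the greedy rule (prefer the longer segment when two start at the same position) are illustrative assumptions, not something lastz does itself.

```python
from typing import List, Tuple

# Assumption: a segment is a half-open interval [start, end) on the first sequence.
Interval = Tuple[int, int]


def keep_non_overlapping(intervals: List[Interval]) -> List[Interval]:
    """Greedily keep segments that do not overlap any previously kept segment.

    Segments are visited by increasing start; on ties the longer one comes first.
    This is a simple illustration, not an optimal-coverage selection.
    """
    kept: List[Interval] = []
    ordered = sorted(intervals, key=lambda iv: (iv[0], -(iv[1] - iv[0])))
    for start, end in ordered:
        # Keep the segment only if it begins at or after the end of the last kept one.
        if not kept or start >= kept[-1][1]:
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    # The two overlapping segments from the post, plus one hypothetical third segment.
    segments = [(0, 146), (0, 231), (300, 420)]
    print(keep_non_overlapping(segments))
```

With the two segments from my output, only the longer one, (0, 231), would survive such a filter. Doing this after the fact does not help with the run time or the size of the intermediate output, though.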

Does anyone know how I can force the reported segments not to overlap? The command works fine on the chicken and human chromosomes from the tutorial, but with two closely related sequences I get all these overlaps, which take forever to process and make the output extremely large and ambiguous.
