I'm trying to align corresponding chromosomes (~100 Mbp long) of two dog breeds:
- Labrador retriever (https://www.ncbi.nlm.nih.gov/nuccore/CM025100.1?report=fasta)
- German shepherd (https://www.ncbi.nlm.nih.gov/nuccore/CM021962.1?report=fasta)
This is for a nonscientific purpose, so alignment quality is not a priority and fast execution is preferred.
I used the program lastz with the following script:
lastz german_shepherd.fasta labrador.fasta \
--notransition --step=20 --nogapped \
--format=maf --ambiguous=iupac \
> shepard_labrador.maf
This is analogous to the first example in the tutorial (https://lastz.github.io/lastz/), except that I am aligning two closely related sequences. The script takes a very long time to run, and the output file had grown to ~80 GB by the time I had to terminate the process. The sequences themselves are only ~100 MB in size.
Upon examining the output, I saw that it contains many overlapping aligned segments. For example, the overlapping segments [0:146] and [0:231] appear in two different alignments.
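To make the overlap described above concrete, here is a minimal Python sketch of how such overlaps could be counted in a MAF file. It assumes that each alignment block starts with an `a` line, that the first `s` line of a block describes the target sequence, and that a segment is the half-open range from the `start` field to `start + size`. The function names `target_intervals` and `count_overlapping` are made up for illustration and are not part of lastz.

```python
def target_intervals(maf_lines):
    """Yield (start, end) for the first 's' line of each MAF alignment block.

    Assumption: the first 's' line after an 'a' line is the target sequence,
    with fields: s, name, start, size, strand, srcSize, text.
    """
    expecting_target = False
    for line in maf_lines:
        fields = line.split()
        if not fields:
            continue  # blank line between blocks
        if fields[0] == "a":
            expecting_target = True
        elif fields[0] == "s" and expecting_target:
            start, size = int(fields[2]), int(fields[3])
            yield (start, start + size)
            expecting_target = False


def count_overlapping(intervals):
    """Count intervals that overlap an earlier one after sorting by start."""
    overlaps = 0
    furthest_end = -1
    for start, end in sorted(intervals):
        if start < furthest_end:
            overlaps += 1
        furthest_end = max(furthest_end, end)
    return overlaps
```

Because it reads the file line by line, something like `count_overlapping(target_intervals(open("shepard_labrador.maf")))` should work on a truncated output file, although sorting still holds every interval in memory.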
Does anyone know how I can prevent the identified segments from overlapping? The tutorial example with the chicken and human chromosomes works fine, but with two closely related sequences I get all these overlaps, which take forever to process and make the output ridiculously large and ambiguous.
Thanks for your time.