How can I get a rdotplot file in lastz?
1
0
Entering edit mode
4.8 years ago
cristian ▴ 310

Hi,

I have the sequence for the OP50 strain of E. coli and I would like to compare it to the reference E. coli sequence. Does anybody know how I can do this? I am trying lastz:

lastz chr1.fa ecoli.fa --notransition --step=20 --nogapped --format=maf --rdotplot=rplot.txt > AvsB.maf

However, I do not know how to read the .maf file and the rplot.txt file is empty.

Thanks.

Best, C.

lastz r rdotplot dotplot genomic alignment • 2.7k views
1
Entering edit mode
4.8 years ago
h.mon 34k

First of all, I don't think this dotplot is useful, as the OP50 draft is fragmented and contains 2939 short contigs.

I have no idea of why you aren't getting the rdotplot file. I just replicated your command with the OP50 draft and the K-12 substr. MG1655, and got an alignment and rdotplot file. I had first to replace the first | by _, otherwise the contig names would be mangled by lastz (even then I would get a rdotplot file):

sed -i.bak "s/\|/_/" op50.fasta

lastz k12.fasta op50.fasta --notransition --step=20 --nogapped \
--format=maf --rdotplot=rplot.txt > AvsB_a.maf


head rplot.txt outputs:

U00096.3    gi_269207760
1609417 1
1611892 2476
NA  NA
U00096.3    gi_269207759
344760  1
345348  589
NA  NA
U00096.3    gi_269207758

0
Entering edit mode

Hi h.mon,

To clarify, I am aligning a contig that I think might be OP50 but I am not sure. My fasta file anyway only contains one sequence.

grep '>' output/genome/ecoli/ref/seq/chr1.fa
>Chromosome

grep '>' output/genome/ecoli/op50/assembly/hgap/bristol/ecoli.fa
>000001F|arrow

0
Entering edit mode

I have reprocued what you did:

head output/rplot.txt
U00096.3    gi_269207760
1609417 1
1611892 2476
NA  NA
344760  1
345348  589
NA  NA
270620  1
271639  1020
NA  NA


Do you think it does not work when there is only one contig in each file? or maybe they are not similar enough?

C.

0
Entering edit mode

Do you think it does not work when there is only one contig in each file?

lastz works fine with single or multiple sequences per file, this is not the issue. The dot plot, on the other hand, is intended for alignments between single sequences. Maybe your op50 contig is corrupted or has something that bothers last?

The maf output is a simple text file, you can examine it with less, head, tail to see if it looks correct.

You may also try Mauve to align both genomes, it has a GUI and is quite easy to use.