Hello!
I am doing genome sequence alignment with MUMmer, and in particular I want to make a dotplot with mummerplot. The steps I followed are as follows.
First, I created a .mums file with the following command:
mummer -mum -b -c H_pylori26695_Eslice.fasta H_pyloriJ99_Eslice.fasta > mummer.mums
Then I plotted all the MUMs with mummerplot:
mummerplot -x "[0,275287]" -y "[0,265111]" -postscript -p mummer mummer.mums
The example is taken from here. The plot works, but I would like to understand a few things so that I know what I am actually doing. The .mums file has three columns with many values. What do they represent? I know it is partly explained at the link given above:
This command will find all maximal unique matches (-mum) between the reference and query on both the forward and reverse strands (-b) and report all the match positions relative to the forward strand (-c). Output is to stdout, so we will redirect it into a file named mummer.mums. This file lists all of the MUMs of the default length or greater between the two input sequences.
So the file should report all the maximal unique matches between the two sequences, but I do not understand exactly what these three columns are.
In the example mentioned above (that is, here) you can find that file (called mummer.mums). Since the file begins with the name of the query sequence, maybe the first two columns are the start positions of the match in the two sequences and the third is the length of the match?
But why, at a certain point, is there a header with the sequence name followed by "reverse", if we asked for all match positions to be reported relative to the forward strand (-c)?
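To make my guess concrete, here is a minimal Python sketch of how I would read the file if my interpretation were right. It assumes (this is my assumption, not something I have confirmed) that each line starting with ">" is a header naming the query sequence, possibly followed by a strand label, and that each following line holds three integers: start position in the reference, start position in the query, and match length. The function name `parse_mums` and the sample values are made up for illustration and are not taken from the real mummer.mums file.

```python
from typing import Dict, List, Optional, Tuple

# Assumed meaning of the three columns: (reference start, query start, length).
Match = Tuple[int, int, int]


def parse_mums(text: str) -> Dict[str, List[Match]]:
    """Group three-column match lines under the '>' header that precedes them."""
    matches: Dict[str, List[Match]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep everything after '>' as the block name.
            current = line[1:].strip()
            matches.setdefault(current, [])
            continue
        if current is None:
            raise ValueError("match line found before any '>' header")
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        ref_start, query_start, length = (int(f) for f in fields)
        matches[current].append((ref_start, query_start, length))
    return matches


# Hypothetical sample in the assumed layout (values are invented).
SAMPLE = """\
> query_seq
   100   200   25
   400   520   31
> query_seq Reverse
   900   150   22
"""

parsed = parse_mums(SAMPLE)
for header, rows in parsed.items():
    print(header, len(rows))
```

If this reading is correct, the block under the second header would simply hold the matches found on the other strand of the query, which is the part I would like someone to confirm.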
Thank you in advance.