I am trying to find out whether an inversion has occurred between two bacterial samples. I did a de novo assembly of each sample and then compared the assemblies using MUMmer.
Below is the graph that I got from MUMmer. Note that -r was used when running mummer, so only reverse complement matches are shown. The graph contains quite a few long lines with a slope of -1, which indicates an inverted segment conserved between the two sequences. The exact mummer command used is as follows.
mummer -mum -F -r -c -l 100 ref.fasta query.fasta > output.mums
I manually checked the de novo assemblies for the regions highlighted in the mums output file, and the matched sequences are indeed reverse complements of each other.
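A minimal sketch of how this kind of check can be scripted in Python, rather than done by eye. The function names and the toy sequences are made up for illustration. It also assumes that, with -c, mummer reports the query position of a reverse match on the forward strand of the query, so that the match runs from query_pos - length + 1 to query_pos; please verify that convention against the MUMmer documentation before relying on it.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_complement_match(ref_seq, query_seq, ref_pos, query_pos, length):
    """Check one reverse match reported as (ref_pos, query_pos, length).

    Positions are 1-based. ASSUMPTION: query_pos is the forward-strand
    coordinate where the reverse match starts (mummer -c), so the matched
    query bases are query_pos - length + 1 .. query_pos.
    """
    ref_part = ref_seq[ref_pos - 1 : ref_pos - 1 + length]
    query_part = query_seq[query_pos - length : query_pos]
    return ref_part.upper() == reverse_complement(query_part).upper()


# Toy example (hypothetical sequences, not real data):
# the reference carries ACGGTTCA, the query carries its reverse complement.
ref = "GGACGGTTCATT"
query = "CTGAACCGTAAA"

print(is_reverse_complement_match(ref, query, ref_pos=3, query_pos=9, length=8))

# A plain reversal (without complementing) of the same query segment does
# not reproduce the reference segment.
print(query[1:9][::-1] == ref[2:10])
```

In practice the two sequences would be read from ref.fasta and query.fasta (for example with Biopython) and the three numbers taken from each line of output.mums.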
Some of the reverse matches in the mums output file are very long (some are about 330000 bp), so they cannot just be a coincidence. The pipeline I followed to generate the de novo assemblies is a good one, with only high-quality reads selected for assembly.
Also, just to add: the reverse matches do not span complete contigs (otherwise they could simply be down to the assembler outputting one of the contigs as its reverse complement). For example, below is a snippet of mummer output showing reverse matches for two query contigs. As can be seen, contig 2 in the query sequence is 571230 bp long but the reverse match is only 339098 bp long. Similarly, contig 7 in the query sequence is 327356 bp long but it has only two reverse matches in the reference sequence, of lengths 42974 and 28669.
NODE_5_REF_SEQ_length_339168_cov_61.2129 71 339098 339098
NODE_25_REF_SEQ_length_71644_cov_64.1254 1 327356 42974
NODE_25_REF_SEQ_length_71644_cov_64.1254 42976 284381 28669
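The comparison between match length and contig length can be sketched in Python as follows. This is only an illustration: the helper names are made up, the query contig lengths are the ones quoted above (they are not in the snippet itself), and the query interval again assumes that -c reports the forward-strand coordinate at which the reverse match starts.

```python
def parse_match_line(line):
    """Split one mummer -F match line into (ref_name, ref_pos, query_pos, length)."""
    name, ref_pos, query_pos, length = line.split()
    return name, int(ref_pos), int(query_pos), int(length)


def query_interval(query_pos, length):
    """Forward-strand query interval of a reverse match (1-based, inclusive).

    ASSUMPTION: with -c, query_pos is where the reverse match starts on the
    forward strand, so the match extends towards lower coordinates.
    """
    return query_pos - length + 1, query_pos


lines = [
    "NODE_5_REF_SEQ_length_339168_cov_61.2129 71 339098 339098",
    "NODE_25_REF_SEQ_length_71644_cov_64.1254 1 327356 42974",
    "NODE_25_REF_SEQ_length_71644_cov_64.1254 42976 284381 28669",
]

# Query contig lengths as stated in the post (contig 2, then contig 7 twice).
query_lengths = [571230, 327356, 327356]

for line, query_len in zip(lines, query_lengths):
    name, ref_pos, query_pos, length = parse_match_line(line)
    start, end = query_interval(query_pos, length)
    fraction = length / query_len
    print(f"{name}: query {start}-{end}, {fraction:.1%} of the query contig")
```

A match covering the whole query contig would give a fraction close to 100%, which is the case that could be explained by the assembler simply emitting the contig on the other strand.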
My questions are as follows:
The matches are reverse complement matches (not just reversed). That is, the reverse complement of a subsequence in sample 1 exactly matches the corresponding subsequence in sample 2. My understanding is that an inversion means reverse complement, not plain reverse. Let me know if this is not correct.
In order to conclusively prove that there are indeed inversions between the two samples, is the output from MUMmer sufficient (i.e. a mums file showing, for each reverse match, the position in the reference sequence, the position in the query sequence and the length of the match, plus a graph)? Of course we would sequence again to see if the same behaviour is repeated, but as far as the current sequence data is concerned, is there something else I can do to find out if an inversion is present?
The next step will be to find the genes present in the inverted regions, but please let me know if there is something else that I should consider.
Thanks for reading,