Question

MUMmer alignment result interpretation

4.0 years ago

AP ▴ 80

Hello Biostars,

I am doing whole-genome alignment using NUCmer (a program in the MUMmer package). I am using this alignment to separate core and accessory chromosomes. From the NUCmer alignment I generated delta files, which I filtered using the options -r and -g, and then generated the coordinate file. This coordinate file looks like this:

[S1]     [E1]  |     [S2]     [E2]  |  [LEN 1]  [LEN 2]  |  [% IDY]  |  [LEN R]  [LEN Q]  |  [COV R]  [COV Q]  | [TAGS]
===============================================================================================================================
       3     1062  |     2882     3943  |     1060     1062  |    87.29  |    47164    22944  |     2.25     4.63  | sca_100_unmapped  scaffold_479
       3     1046  |     2196     3240  |     1044     1045  |    88.52  |    47164   201231  |     2.21     0.52  | sca_100_unmapped   scaffold_68
    2091     2303  |    24338    24550  |      213      213  |    88.02  |    47164    27763  |     0.45     0.77  | sca_100_unmapped   scaffold_442
    9756    11454  |   108083   106395  |     1699     1689  |    93.47  |    47164   181231  |     3.60     0.93  | sca_100_unmapped   scaffold_81
   13817    15198  |    54353    55731  |     1382     1379  |    87.49  |    47164   146674  |     2.93     0.94  | sca_100_unmapped   scaffold_110
   46400    46664  |     7992     7731  |      265      262  |    84.27  |    47164    30552  |     0.56     0.86  | sca_100_unmapped   scaffold_418
    2236     3032  |    64822    65618  |      797      797  |    83.71  |    46409    72978  |     1.72     1.09  | sca_101_unmapped   scaffold_232
    2239     3578  |    21278    19939  |     1340     1340  |    79.63  |    46409    28656  |     2.89     4.68  | sca_101_unmapped   scaffold_438
   11309    11945  |    41233    40596  |      637      638  |    85.76  |    46409    48260  |     1.37     1.32  | sca_101_unmapped   scaffold_316
   12138    12918  |    40117    39337  |      781      781  |    86.04  |    46409    48260  |     1.68     1.62  | sca_101_unmapped   scaffold_316
   12840    16991  |   198620   202766  |     4152     4147  |    85.95  |    46409   284610  |     8.95     1.46  | sca_101_unmapped   scaffold_48
   24138    24287  |    48814    48963  |      150      150  |    96.67  |    46409   178768  |     0.32     0.08  | sca_101_unmapped   scaffold_84

As you can see from the table, one scaffold in my reference genome matches many scaffolds in the query genome. Another problem is the large number of scaffolds in both my reference and query genomes. I am having trouble working out how to filter my results further and separate the core and accessory regions in my query genome. I have been stuck on this step for quite some time, and I could not find any resource that explains what to do. I would really appreciate any suggestions.

Thank you, Ambika
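For the filtering step asked about above, here is a minimal Python sketch, not a definitive method. It assumes the coordinate file has the pipe-delimited layout shown in the table (13 fields per row). The function names, the file name `out.coords`, and the thresholds (`min_len`, `min_idy`) are all hypothetical and would need tuning for real data. It drops short or low-identity alignments, then merges overlapping intervals to estimate what fraction of each scaffold is covered by alignments.

```python
from collections import defaultdict


def parse_coords(lines):
    """Parse alignment rows from a coordinate table laid out like the one above."""
    rows = []
    for line in lines:
        # Skip the header, the '====' separator, and blank lines.
        if "|" not in line or line.lstrip().startswith("["):
            continue
        fields = line.replace("|", " ").split()
        if len(fields) != 13:
            continue
        rows.append({
            "s1": int(fields[0]), "e1": int(fields[1]),
            "s2": int(fields[2]), "e2": int(fields[3]),
            "len1": int(fields[4]), "len2": int(fields[5]),
            "idy": float(fields[6]),
            "len_r": int(fields[7]), "len_q": int(fields[8]),
            "ref": fields[11], "qry": fields[12],
        })
    return rows


def filter_alignments(rows, min_len=1000, min_idy=85.0):
    """Keep alignments above assumed (hypothetical) length and identity cutoffs."""
    return [r for r in rows
            if min(r["len1"], r["len2"]) >= min_len and r["idy"] >= min_idy]


def covered_fraction(rows, side="qry"):
    """Fraction of each scaffold covered by alignments, counting overlaps once.

    side="qry" uses [S2]/[E2] and [LEN Q]; side="ref" uses [S1]/[E1] and [LEN R].
    """
    start_key, end_key, len_key = (
        ("s2", "e2", "len_q") if side == "qry" else ("s1", "e1", "len_r")
    )
    intervals = defaultdict(list)
    lengths = {}
    for r in rows:
        # Reverse-strand hits have start > end, so normalise the order.
        lo, hi = sorted((r[start_key], r[end_key]))
        intervals[r[side]].append((lo, hi))
        lengths[r[side]] = r[len_key]

    fractions = {}
    for name, ivs in intervals.items():
        ivs.sort()
        covered = 0
        cur_lo, cur_hi = ivs[0]
        for lo, hi in ivs[1:]:
            if lo <= cur_hi + 1:          # overlapping or adjacent: extend
                cur_hi = max(cur_hi, hi)
            else:                         # gap: close the current block
                covered += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        covered += cur_hi - cur_lo + 1
        fractions[name] = covered / lengths[name]
    return fractions


if __name__ == "__main__":
    # 'out.coords' is a placeholder file name.
    with open("out.coords") as handle:
        kept = filter_alignments(parse_coords(handle))
    for name, frac in sorted(covered_fraction(kept, "qry").items()):
        print(f"{name}\t{frac:.3f}")
```

Query scaffolds with a low covered fraction would be candidates for accessory regions, but this is only a heuristic: repeats, fragmented assemblies, and the chosen cutoffs can all shift the result, and scaffolds with no surviving alignment do not appear in the output at all.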

MUMmer Nucmer genome alignment • 1.1k views

4.0 years ago by AP ▴ 80

This doesn't directly answer your question, but I suggest trying Anvi'o. I don't use the program myself, but I know someone who dealt with the same issue, and Anvi'o was his solution. It has a nice graphical interface, and many tutorials are available.

If this is the complete table, there isn't much overlap between the two genomes. Aside from a single ~4 kb match, every other alignment is below 2 kb. It may be more informative to translate/annotate the genomes and compare them at the protein level.

4.0 years ago by Mensur Dlakic ★ 27k

It is not the complete table; I also have some overlapping regions longer than 20 kb. I do have the gene annotation file. Do you think comparing the protein sequences will help in distinguishing the accessory regions of the genome? And thank you for suggesting that program; I will look into it.

4.0 years ago by AP ▴ 80