Question: How should I interpret these MUMmer output files?
madkitty (Canada) wrote, 4.6 years ago:

I have this output file from MUMmer, which shows multiple reference/query hits at 100% identity. My reference is the gene DNMT1 from species A, and my query is a set of scaffolds from a closely related species B whose genome has not yet been assembled to the chromosome level. For some genes, only one scaffold (query) had 100% identity, and a multiple alignment confirmed that only a few SNPs are present across the whole gene. For certain other genes, however, MUMmer returns many scaffolds with 100% identity that do not align to the reference at all (or align with a huge number of SNPs).

I already considered keeping the scaffolds with the highest COV Q and COV R, but these values vary with sequence length and do not tell me how many SNPs are in the sequence.
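Because show-coords reports both the alignment length and % IDY, a rough count of differing bases per alignment can be derived directly, independent of scaffold size. The sketch below assumes the column layout printed by show-coords -rclT (LEN 1 in column 5, % IDY in column 7, the query tag in column 13) and approximates differences as LEN 1 × (100 − % IDY) / 100. This lumps SNPs and indel bases together, so it is an estimate, not a SNP call. The function name est_diffs is hypothetical.

```bash
#!/usr/bin/env bash
# est_diffs (hypothetical helper): rough count of differing bases per alignment.
# Assumes show-coords -rclT columns: $5 = LEN 1, $7 = % IDY, $13 = query tag.
# The estimate LEN 1 * (100 - %IDY) / 100 mixes SNPs and indels; it is approximate.
est_diffs() {
  awk '
    # keep only data rows: numeric first field and all 13 columns present
    $1 ~ /^[0-9]+$/ && NF >= 13 {
      diffs = $5 * (100 - $7) / 100
      # output: query scaffold, aligned ref length, identity, estimated differences
      printf "%s\t%d\t%.2f\t%d\n", $13, $5, $7, int(diffs + 0.5)
    }' "$@"
}
```

Usage:

$ est_diffs ref_qry.coords

For the 100% rows shown below, every estimate is 0, which is why identity on its own cannot separate a true ortholog from a short spurious hit; the alignment length has to be taken into account as well.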

Is there a more effective way to filter out which scaffolds truly match my reference?
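One possible approach, sketched here under stated assumptions, is to stop ranking individual hits and instead summarize all alignments per scaffold: the number of hits, the total reference bases aligned (sum of LEN 1), and a length-weighted mean identity. A scaffold carrying the orthologous gene would be expected to accumulate far more aligned bases across the 35,452-bp reference than a scaffold with a single 66-91 bp hit at 100%. The helper name summarize_scaffolds is hypothetical, the column positions assume show-coords -rclT output, and overlapping alignments are counted twice, so the aligned total is an upper bound.

```bash
#!/usr/bin/env bash
# summarize_scaffolds (hypothetical helper): one line per query scaffold with
#   scaffold, number of hits, summed LEN 1 (reference bases), weighted mean % IDY,
# sorted by summed aligned length, largest first (ties broken by name).
# Assumes show-coords -rclT columns: $5 = LEN 1, $7 = % IDY, $13 = query tag.
summarize_scaffolds() {
  awk '
    $1 ~ /^[0-9]+$/ && NF >= 13 {
      q = $13
      hits[q]++
      aligned[q] += $5        # overlapping hits are double-counted
      idsum[q] += $5 * $7     # numerator of the length-weighted identity
    }
    END {
      for (q in hits)
        printf "%s\t%d\t%d\t%.2f\n", q, hits[q], aligned[q], idsum[q] / aligned[q]
    }' "$@" | sort -t "$(printf '\t')" -k3,3nr -k1,1
}
```

Usage:

$ summarize_scaffolds ref_qry.coords | head

Scaffolds at the top of this list, with many hits and high weighted identity, are the candidates worth checking with a multiple alignment; a minimum total aligned length would still have to be chosen by hand.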


Here is a sample of my show-coords output, showing only the hits with 100% identity:

[S1] [E1] [S2] [E2] [LEN 1] [LEN 2] [% IDY] [LEN R] [LEN Q] [COV R] [COV Q] [TAGS]  
13898 13988 57004 57094 91 91 100 35452 107079 0.26 0.08 gi|194246388:49748443-49783894 scaffold11691
13898 13964 949 1015 67 67 100 35452 36637 0.19 0.18 gi|194246388:49748443-49783894 scaffold15913
13898 13964 1040 974 67 67 100 35452 2729 0.19 2.46 gi|194246388:49748443-49783894 scaffold47627
13900 13983 78963 79046 84 84 100 35452 161922 0.24 0.05 gi|194246388:49748443-49783894 scaffold12557
13900 13989 11716 11627 90 90 100 35452 30624 0.25 0.29 gi|194246388:49748443-49783894 scaffold25396
13900 13970 1089 1019 71 71 100 35452 1502 0.2 4.73 gi|194246388:49748443-49783894 scaffold27987
19908 19994 10242 10328 87 87 100 35452 17701 0.25 0.49 gi|194246388:49748443-49783894 scaffold52071
19909 19994 278272 278357 86 86 100 35452 353920 0.24 0.02 gi|194246388:49748443-49783894 scaffold1991
19910 19996 19486 19400 87 87 100 35452 81941 0.25 0.11 gi|194246388:49748443-49783894 scaffold14370
19910 19994 1805 1889 85 85 100 35452 2036 0.24 4.17 gi|194246388:49748443-49783894 scaffold46791
19911 19986 510 585 76 76 100 35452 1364 0.21 5.57 gi|194246388:49748443-49783894 scaffold84138
19912 19997 9499 9414 86 86 100 35452 61074 0.24 0.14 gi|194246388:49748443-49783894 scaffold44157
19912 19997 8587 8502 86 86 100 35452 15813 0.24 0.54 gi|194246388:49748443-49783894 scaffold9318
19922 19998 939 863 77 77 100 35452 1465 0.22 5.26 gi|194246388:49748443-49783894 scaffold35518
19928 19999 1018 1089 72 72 100 35452 1502 0.2 4.79 gi|194246388:49748443-49783894 scaffold27987
19929 19995 23559 23493 67 67 100 35452 28327 0.19 0.24 gi|194246388:49748443-49783894 scaffold27519
19932 19997 344 409 66 66 100 35452 5264 0.19 1.25 gi|194246388:49748443-49783894 scaffold13914
19935 20000 974 1039 66 66 100 35452 2729 0.19 2.42 gi|194246388:49748443-49783894 scaffold47627
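In this sample, every hit is 66-91 bp long and covers at most 0.26% of the reference (COV R), and the hits pile up in two narrow reference windows (about 13,898-13,989 and 19,908-20,000), each time on a different scaffold. scaffold27987 and scaffold47627 even match both windows with essentially the same query segment in opposite orientations, which suggests a short sequence repeated in the reference rather than orthologous gene sequence. A quick way to spot such pile-ups is to count distinct scaffolds per reference window. The sketch below bins alignments on S1 into fixed-size windows, which is a simplification (a cluster straddling a bin boundary gets split); the name hits_per_ref_window is hypothetical.

```bash
#!/usr/bin/env bash
# hits_per_ref_window (hypothetical helper): for fixed-size windows on the
# reference (binned by S1, column 1), count how many distinct query scaffolds
# (column 13) have an alignment starting in that window.
# Usage: hits_per_ref_window BIN_SIZE [show-coords file...]
hits_per_ref_window() {
  local bin=$1
  shift
  awk -v bin="$bin" '
    $1 ~ /^[0-9]+$/ && NF >= 13 {
      w = int($1 / bin) * bin            # window start on the reference
      key = w SUBSEP $13
      if (!(key in seen)) {              # count each scaffold once per window
        seen[key] = 1
        n[w]++
      }
    }
    END { for (w in n) printf "%d\t%d\n", w, n[w] }' "$@" | sort -n -k1,1
}
```

Usage:

$ hits_per_ref_window 1000 ref_qry.coords

On the 18 rows shown above, this reports 6 distinct scaffolds in the window starting at 13000 and 12 in the window starting at 19000. Many unrelated scaffolds hitting the same short stretch is a typical signature of a repeat, so hits in such windows are poor evidence for picking the matching scaffold.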

 


Here are the commands I used to generate this output:

(Species A: the reference genome is available. Species B: no reference genome yet; only scaffolds are available.)

$ nucmer --prefix=ref_qry Gene_Specie_A.fasta Specie_B.fa

$ show-coords -rclT ref_qry.delta > ref_qry.coords
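Since the .coords file already carries lengths and identities, a minimum alignment length can be applied before looking at identity, so that short 100% hits like the ones above no longer dominate. The sketch below keeps only data rows whose LEN 1 and % IDY reach user-chosen thresholds; the function name filter_coords and the example thresholds are hypothetical and would need tuning. MUMmer's delta-filter program can apply comparable minimum-length (-l) and minimum-identity (-i) cutoffs to the .delta file itself; delta-filter -h lists the options in the installed version.

```bash
#!/usr/bin/env bash
# filter_coords (hypothetical helper): print show-coords -rclT data rows with
# LEN 1 >= MIN_LEN and % IDY >= MIN_IDY; header lines are dropped.
# Usage: filter_coords MIN_LEN MIN_IDY [show-coords file...]
filter_coords() {
  local min_len=$1 min_idy=$2
  shift 2
  awk -v L="$min_len" -v I="$min_idy" '
    # skip header and blank lines (non-numeric first field or missing columns)
    $1 !~ /^[0-9]+$/ || NF < 13 { next }
    # column 5 = LEN 1, column 7 = % IDY; adding 0 forces a numeric comparison
    $5 + 0 >= L + 0 && $7 + 0 >= I + 0' "$@"
}
```

Usage:

$ filter_coords 200 95 ref_qry.coords > ref_qry.filtered.coords

With a 200-bp minimum, every row in the sample above would be discarded, since the longest of those hits is 91 bp.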


 
