I have two WGS FASTA files generated from PacBio sequencing of a metagenome sample: A.fasta and B.fasta.
When I align them with Mauve, I can see that the assembled sequence in A.fasta is missing an A nucleotide at the end of the contig, while that nucleotide is present in B.fasta. Because the two assemblies differ, the nucleotide in B.fasta lies in the middle of the contig rather than at the end. According to Mauve, this is the only difference between the two sequences.
I want to use nucmer/mummer (version 4.0.0rc1) to compare these two files programmatically, since we have over 50 to compare and don't want to do it by hand with Mauve. To do this I run nucmer on these two fasta files and then show-snps:
nucmer -p AvsB A.fasta B.fasta
show-snps -Clr AvsB.delta
However, this reports no SNPs between A.fasta and B.fasta. I can run show-snps without the -C flag, but then it shows far too many unaligned regions to be useful. I've also tried switching the order of the fasta files, and the output does not change.
I've also run the dnadiff command on these two files to see the report:
dnadiff A.fasta B.fasta
When I look in the report produced I find the following differences:
                   [REF]                  [QRY]
[Bases]
TotalBases         2385521                2385522
AlignedBases       2385521(100.0000%)     2385521(100.0000%)
UnalignedBases     0(0.0000%)             1(0.0000%)
[Alignments]
1-to-1             2                      2
TotalLength        2385521                2385521
AvgLength          1192760.5000           1192760.5000
AvgIdentity        100.0000               100.0000
M-to-M             2                      2
TotalLength        2385521                2385521
AvgLength          1192760.5000           1192760.5000
AvgIdentity        100.0000               100.0000
[Feature Estimates]
Breakpoints        2                      2
Relocations        1                      1
Translocations     0                      0
Inversions         0                      0
Insertions         0                      1
InsertionSum       0                      1
InsertionAvg       0.0000                 1.0000
There is nothing listed for INDELS below this either.
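Since there are over 50 comparisons to screen, the UnalignedBases line of each report could be pulled out with a short script. The following is only a sketch in Python: it assumes the report text has the layout shown above, and the function name `unaligned_counts` is made up for illustration. Reading the report file itself (dnadiff writes `out.report` unless a different prefix is given) is left to the caller.

```python
import re


def unaligned_counts(report_text):
    """Return (ref, qry) unaligned base counts from dnadiff report text.

    Assumes a line of the form
    'UnalignedBases 0(0.0000%) 1(0.0000%)' as in the report above.
    Returns None if no such line is found.
    """
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "UnalignedBases":
            # Each value looks like '1(0.0000%)'; keep only the leading integer.
            ref, qry = (int(re.match(r"\d+", f).group()) for f in fields[1:])
            return ref, qry
    return None


if __name__ == "__main__":
    sample = "TotalBases 2385521 2385522\nUnalignedBases 0(0.0000%) 1(0.0000%)\n"
    print(unaligned_counts(sample))
```

A pair whose counts are both zero can be skipped; any other pair needs a closer look at which positions are unaligned.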
From this I gather that nucmer leaves one base of B.fasta unaligned relative to A.fasta, which is what I expect, but I don't know how to find out which base it is or where it sits. My ultimate goal is to use the corresponding Illumina data to determine the "true" sequence wherever the two PacBio assemblies disagree. I've successfully used nucmer/mummer this way for another MAG from the same metagenome, but there the indel was in the middle of both contigs, so it was reported as expected.
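One possible way to recover the unaligned base is to take the alignment coordinates for a sequence and look for positions that no alignment covers. The sketch below is in Python and works on plain (start, end) pairs; it assumes those pairs have already been extracted for a single sequence, for example from the coordinate columns that show-coords prints for the delta file. The function names `uncovered_intervals` and `unaligned_bases` and the toy sequence are hypothetical.

```python
def uncovered_intervals(length, aligned):
    """Return 1-based inclusive intervals of [1, length] that no alignment covers.

    `aligned` is a list of (start, end) pairs for one sequence. A pair may be
    given with start > end (reverse-strand alignment), so it is normalised.
    """
    gaps = []
    next_pos = 1  # first position not yet known to be covered
    for start, end in sorted((min(s, e), max(s, e)) for s, e in aligned):
        if start > next_pos:
            gaps.append((next_pos, start - 1))
        next_pos = max(next_pos, end + 1)
    if next_pos <= length:
        gaps.append((next_pos, length))
    return gaps


def unaligned_bases(sequence, aligned):
    """Return (start, end, bases) for every uncovered interval of `sequence`."""
    return [
        (start, end, sequence[start - 1:end])
        for start, end in uncovered_intervals(len(sequence), aligned)
    ]


if __name__ == "__main__":
    # Toy example: two alignments that leave position 5 uncovered.
    toy_query = "ACGTAACGT"
    print(unaligned_bases(toy_query, [(1, 4), (6, 9)]))
```

Applied to the query side (B.fasta here), this would report the position and identity of the single base that dnadiff counts as unaligned, which could then be checked against the Illumina reads.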
Any insights on nucmer/mummer or ways to solve this problem with other programs would be greatly appreciated! I'm not sure what else to try with nucmer/mummer.