Finding indel at end of contig with nucmer
2.2 years ago
kmyers2 ▴ 80

I have two WGS fasta files generated from PacBio sequencing of a metagenome sample: A.fasta and B.fasta.

When I align them with Mauve, I can see that the assembled sequence in A.fasta is missing an A nucleotide at the end of the contig, while B.fasta has it. Because of how the two assemblies were laid out, the corresponding position in B.fasta is in the middle of the contig. According to Mauve, this is the only difference between the two sequences.

I want to use nucmer/mummer (version 4.0.0rc1) to compare these two files programmatically, since we have over 50 to compare and don't want to do it by hand with Mauve. To do this I run nucmer on these two fasta files and then show-snps:

nucmer -p AvsB A.fasta B.fasta
show-snps -Clr AvsB.delta

However, this reports no SNPs between A.fasta and B.fasta. I can run show-snps without the -C flag, but then it reports far too many unaligned regions to be useful. I've also tried switching the order of the fasta files, and the output does not change.

I've also run the dnadiff command on these two files to see the report:

dnadiff A.fasta B.fasta

When I look in the report produced I find the following differences:

                           [REF]                  [QRY]
[Bases]
TotalBases                   2385521              2385522
AlignedBases       2385521(100.0000%)    2385521(100.0000%)
UnalignedBases            0(0.0000%)           1(0.0000%)

[Alignments]
1-to-1                             2                    2
TotalLength                  2385521              2385521
AvgLength               1192760.5000         1192760.5000
AvgIdentity                 100.0000             100.0000

M-to-M                             2                    2
TotalLength                  2385521              2385521
AvgLength               1192760.5000         1192760.5000
AvgIdentity                 100.0000             100.0000

[Feature Estimates]
Breakpoints                        2                    2
Relocations                        1                    1
Translocations                     0                    0
Inversions                         0                    0

Insertions                         0                    1
InsertionSum                       0                    1
InsertionAvg                  0.0000               1.0000

There is nothing listed for INDELS below this either.

I've gathered that nucmer is not aligning a base from A.fasta compared to B.fasta, which I expect, but I don't know how to get the information as to what that base is. My ultimate goal is to use corresponding Illumina data to determine the "true" sequence when the two PacBio assemblies disagree. I've successfully used nucmer/mummer for this for another MAG from the metagenome, but there the indel was in the middle of both contigs so it was reported as expected.
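
One idea I have not tested on the real data would be to look for the query positions that no alignment covers, instead of relying on show-snps. Below is a minimal Python sketch of that idea. It assumes the alignment coordinates come from something like `show-coords -T -H AvsB.delta` and that the first four tab-delimited columns are S1, E1, S2, E2 (reference start/end, query start/end); that column order should be checked against the actual output. The function names and the toy sequences are hypothetical, made up for illustration, and the sketch handles one reference/query contig pair at a time.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def parse_coords_line(line: str) -> Tuple[Interval, Interval]:
    """Split one tab-delimited coords line into (reference span, query span).

    Assumption: the first four columns are S1, E1, S2, E2.
    """
    fields = line.rstrip("\n").split("\t")
    s1, e1, s2, e2 = (int(x) for x in fields[:4])
    return (s1, e1), (s2, e2)


def uncovered_intervals(seq_len: int, aligned: Iterable[Interval]) -> List[Interval]:
    """Return the 1-based inclusive intervals not covered by any alignment."""
    # Normalise so start <= end, since reverse-strand hits may list
    # their coordinates in descending order.
    spans = sorted((min(s, e), max(s, e)) for s, e in aligned)
    gaps: List[Interval] = []
    next_free = 1  # first position not yet known to be covered
    for start, end in spans:
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= seq_len:
        gaps.append((next_free, seq_len))
    return gaps


def bases_at(sequence: str, interval: Interval) -> str:
    """Extract the bases for a 1-based inclusive interval."""
    start, end = interval
    return sequence[start - 1:end]


# Toy data (made up): the query has one extra base in the middle,
# and the reference is split into two alignments around it.
toy_query = "GGCCATTTAA"
toy_lines = ["1\t5\t6\t10\t5\t5\t100.00\tA\tB", "6\t9\t1\t4\t4\t4\t100.00\tA\tB"]
query_spans = [parse_coords_line(line)[1] for line in toy_lines]
for gap in uncovered_intervals(len(toy_query), query_spans):
    print(gap, bases_at(toy_query, gap))
```

On real data the sequence would be read from B.fasta and the lines from the coords output. If the missing base sits inside a homopolymer run, the exact position reported this way may be somewhat arbitrary, so it probably makes sense to check a few flanking bases against the Illumina reads as well.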

Any insights on nucmer/mummer or ways to solve this problem with other programs would be greatly appreciated! I'm not sure what else to try with nucmer/mummer.

mummer snps indel nucmer • 427 views