Error with ALE (Assessing the Accuracy of Genome and Metagenome Assemblies)
6.7 years ago
alecloic ▴ 40

Hello,

I am currently working on a de novo assembly of a large genome, and I now want to assess the quality of the reconstructed genome. There are several suitable programs; I am trying to use ALE, a generic Assembly Likelihood Evaluation framework for assessing the accuracy of genome and metagenome assemblies. I pre-aligned the reads to the scaffolds with both Bowtie2 and BWA, and the alignments are in SAM format.

In both cases, the following command fails with an error:

./ALE /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2.sam /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Assembly/ABySS_Assembly/ABYSS-scaffolds.fa /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt

[bam_header_read] EOF marker is absent. The input is probably truncated.
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
Found 6 ambiguous bases (excluding N) in the assembly.
Reading in the map and computing statistics...
Insert length and std not given, will be calculated from input map.
Setting library to be sorted by name (647052 new sequential names vs 1294104 reads)
Found FR sample avg insert length to be 173.244532 from 887964 mapped reads
Found FR sample insert length std to be 19.926324
There were 1294104 total reads, 1294104 paired (923164 properly mated), 41431 proper singles, 329509 improper reads (3592 chimeric). (324647 reads were unmapped)
Saved library parameters to /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt.param
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
MD mismatch but it does not match! SRR022868.11196 89: refpos 442 MDpos 14: 'K' vs 'N'
Abandon

Does anyone know what causes this problem and how to fix it?

While searching, I saw that some people had a similar problem with samtools. It seems to be a problem in the SAM file, but I do not see why or how the SAM file could be incorrect.

Regards,

Assembly evaluation next-gen ALE alignment • 2.1k views
6.6 years ago
rsegan ▴ 20

This problem is most likely caused by a slightly incorrect MD field in the SAM file produced by Bowtie2 and/or BWA. The MD field records the base at that mismatch as 'N', whereas the reference at that position is actually the ambiguous base 'K'. ALE is very sensitive to improper or inconsistent SAM/BAM files, so it aborted after detecting this inconsistency.
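To see which assembly positions could trigger this, one could list the ambiguous IUPAC codes (other than N) in the FASTA file. The following is only a minimal sketch and is not part of ALE; the function name `find_ambiguous_bases` and the example sequence are hypothetical.

```python
import io

# IUPAC ambiguity codes other than N (ALE's log message also excludes N).
AMBIGUOUS = set("RYSWKMBDHV")

def find_ambiguous_bases(fasta_handle):
    """Yield (sequence_name, 1-based position, base) for each ambiguous base."""
    name = None
    pos = 0
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: keep the first word as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            pos = 0
            continue
        for base in line:
            pos += 1
            if base.upper() in AMBIGUOUS:
                yield name, pos, base

# Hypothetical two-line scaffold used only for illustration.
example = io.StringIO(">scaffold1 demo\nACGTK\nNNAR\n")
hits = list(find_ambiguous_bases(example))
```

With a real assembly, the same function can be given an open file handle for the scaffolds FASTA instead of the in-memory example.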

Since this difference does not really matter for evaluating the assembly, I pushed a change that downgrades this error to a warning and lets ALE continue processing the file.

Thanks,

Rob Egan


A. GUYOMARD