Question

Error with ALE (Assessing the Accuracy of Genome and Metagenome Assemblies)

10.3 years ago

alecloic ▴ 40

Hello,

I am currently working on a de novo assembly of a large genome, and I now want to assess the quality of the reconstructed genome. I saw that several suitable programs exist. I am trying to use ALE, a generic Assembly Likelihood Evaluation framework for assessing the accuracy of genome and metagenome assemblies. I aligned the reads to the scaffolds with Bowtie2 and with BWA; the alignments are in SAM format.

In both cases, this command fails with an error:

./ALE \
  /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2.sam \
  /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Assembly/ABySS_Assembly/ABYSS-scaffolds.fa \
  /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
Reading in assembly...
Found 6 ambiguous bases (excluding N) in the assembly.
Reading in the map and computing statistics...
Insert length and std not given, will be calculated from input map.
Read 1000000 reads...
Setting library to be sorted by name (647052 new sequential names vs 1294104 reads)
Found FR sample avg insert length to be 173.244532 from 887964 mapped reads
Found FR sample insert length std to be 19.926324
There were 1294104 total reads, 1294104 paired (923164 properly mated), 41431 proper singles, 329509 improper reads (3592 chimeric). (324647 reads were unmapped)
Saved library parameters to /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt.param
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
Computing read placements and depths
MD mismatch but it does not match! SRR022868.11196 89: refpos 442 MDpos 14: 'K' vs 'N'
Abandon

Does anyone have an idea of the cause of this problem and how to solve it?

While searching, I saw that some people had a similar problem with samtools. It seems to be a problem with the SAM file, but I do not see why or how the SAM file could be incorrect.

Regards

next-gen Assembly ALE evaluation alignment • 3.0k views

Updated 3.0 years ago by Ram 45k • written 10.3 years ago by alecloic ▴ 40

Ram · Accepted Answer · 2015-04-16

10.2 years ago

rsegan ▴ 30

This problem most likely arises because the MD field in the SAM file produced by Bowtie2 and/or BWA is slightly incorrect. The MD field records the reference base at the mismatch as 'N', whereas the reference at that position is actually the ambiguous base 'K'. ALE is very sensitive to improper or inconsistent SAM/BAM files, so it aborted after detecting this inconsistency.
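To make the inconsistency concrete, here is a minimal Python sketch of how an MD string can be walked along the reference to find positions where the two disagree. It is not ALE's actual code. The function name `md_conflicts` and the toy sequences are hypothetical, and the sketch assumes the alignment has no skipped-region (`N`) CIGAR operations, so that the MD string alone describes every reference position the read covers.

```python
import re

# One MD token is a run of matches (digits), a deletion (^ followed by
# reference bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_conflicts(reference, pos, md):
    """Return (1-based ref position, reference base, MD base) per disagreement.

    reference: the full reference sequence the read was aligned to
    pos:       1-based leftmost alignment position (SAM POS field)
    md:        value of the MD:Z: tag, without the "MD:Z:" prefix
    """
    conflicts = []
    ref_index = pos - 1  # convert the 1-based SAM POS to a 0-based index
    for match_len, deleted, mismatch in MD_TOKEN.findall(md):
        if match_len:
            # A run of matching bases: just advance along the reference.
            ref_index += int(match_len)
            continue
        # Mismatches and deletions both spell out reference bases.
        for md_base in (deleted or mismatch):
            ref_base = reference[ref_index]
            if ref_base.upper() != md_base.upper():
                conflicts.append((ref_index + 1, ref_base, md_base))
            ref_index += 1
    return conflicts


# Toy example: the reference holds the ambiguity code K where MD claims N.
print(md_conflicts("ACGTKACGT", 1, "4N4"))
```

With the toy input above, the only reported conflict is at reference position 5, where the reference holds 'K' and the MD string claims 'N', the same kind of disagreement as in the error message.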

Because this difference does not really matter for evaluating the assembly, I pushed a change that downgrades this error to a warning and lets ALE continue processing the file.
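To get a feel for how many positions could trigger such a warning, one could list the IUPAC ambiguity codes other than N in the assembly; the ALE log above reports 6 of them. The following Python sketch is only an illustration under that assumption. The function name `find_ambiguous_bases` and the in-memory FASTA are hypothetical, and a real run would pass an open file handle for the scaffolds FASTA instead.

```python
import io

# IUPAC nucleotide ambiguity codes other than N.
AMBIGUOUS = set("RYSWKMBDHV")


def find_ambiguous_bases(fasta_handle):
    """Yield (sequence name, 1-based position, base) for each ambiguity code."""
    name = None
    offset = 0  # number of bases already seen in the current sequence
    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # New record: keep the first word of the header as the name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            offset = 0
            continue
        for i, base in enumerate(line, start=1):
            if base.upper() in AMBIGUOUS:
                yield name, offset + i, base
        offset += len(line)


# Toy FASTA standing in for the real scaffolds file.
toy_fasta = io.StringIO(">scaf1 example\nACGT\nKACN\n>scaf2\nNNRA\n")
for hit in find_ambiguous_bases(toy_fasta):
    print(hit)
```

Reads overlapping the reported positions are the ones whose MD field may disagree with the reference in the way described above.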

Thanks,
Rob Egan

Updated 3.0 years ago by Ram 45k • written 10.2 years ago by rsegan ▴ 30

Your solution works, thank you!

A. GUYOMARD

Updated 3.0 years ago by Ram 45k • written 10.2 years ago by alecloic ▴ 40