Question: Error with ALE (Assessing the Accuracy of Genome and Metagenome Assemblies)
3.9 years ago by
alecloic40
France
alecloic40 wrote:

Hello,

I am currently working on a de novo assembly of a large genome, and I now want to assess the quality of the reconstructed genome. I saw that several programs are suitable for this. I am trying to use ALE, a generic Assembly Likelihood Evaluation framework for assessing the accuracy of genome and metagenome assemblies. I pre-aligned the reads to the scaffolds with both Bowtie2 and BWA, so my alignments are in SAM format.

In both cases, the following command fails with an error:

./ALE /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2.sam /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Assembly/ABySS_Assembly/ABYSS-scaffolds.fa /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt

 

[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
Reading in assembly...
Found 6 ambiguous bases (excluding N) in the assembly.
Reading in the map and computing statistics...
Insert length and std not given, will be calculated from input map.
Read 1000000 reads...
Setting library to be sorted by name (647052 new sequential names vs 1294104 reads)
Found FR sample avg insert length to be 173.244532 from 887964 mapped reads
Found FR sample insert length std to be 19.926324
There were 1294104 total reads, 1294104 paired (923164 properly mated), 41431 proper singles, 329509 improper reads (3592 chimeric). (324647 reads were unmapped)
Saved library parameters to /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS_scaffolds_ALE.txt.param
[bam_header_read] EOF marker is absent. The input is probably truncated.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
Checking if /data/DataSet/DeNovo/Softs/pipeline/temp/donneestest/Validation/ABYSS-scaffolds_bowtie2_step2.sam is a SAM formatted file, instead of BAM
[samopen] SAM header is present: 3320 sequences.
Computing read placements and depths
MD mismatch but it does not match! SRR022868.11196 89: refpos 442 MDpos 14: 'K' vs 'N'
Abandon

Does anyone have an idea what causes this problem and how to solve it?

While searching, I saw that some people had a similar problem with samtools. It seems to be a problem in the SAM file, but I do not see why or how the SAM file could be incorrect.

Regards,

3.9 years ago by
rsegan20
United States
rsegan20 wrote:

This problem most likely occurs because the MD field in the SAM file produced by Bowtie2 and/or BWA is slightly incorrect. The MD field records 'N' at the mismatch position, whereas the reference at that position is actually the ambiguous base 'K'. ALE is very sensitive to improper or inconsistent SAM/BAM files, so it aborted after detecting this problem.
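
To make the inconsistency concrete, here is a minimal Python sketch that walks an MD tag and compares each mismatch base it records against the reference sequence. It is not ALE's actual check; the function names `md_mismatches` and `md_conflicts` are hypothetical, and the sketch assumes you already have the reference sequence as a string, the 1-based alignment start (the SAM POS column), and the MD value. Deleted bases are skipped rather than verified.

```python
import re

# One MD token is a run of matches, a deletion (^BASES), or a single mismatch base.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")

def md_mismatches(md):
    """Yield (reference offset from alignment start, base recorded in MD) per mismatch."""
    offset = 0
    for match in MD_TOKEN.finditer(md):
        run, deletion, base = match.groups()
        if run is not None:
            offset += int(run)          # matching bases advance the reference
        elif deletion is not None:
            offset += len(deletion)     # deleted reference bases also advance it
        else:
            yield offset, base.upper()  # mismatch: MD stores the reference base
            offset += 1

def md_conflicts(reference, pos, md):
    """Return (1-based position, reference base, MD base) wherever the two disagree."""
    conflicts = []
    for offset, md_base in md_mismatches(md):
        ref_base = reference[pos - 1 + offset].upper()
        if ref_base != md_base:
            conflicts.append((pos + offset, ref_base, md_base))
    return conflicts
```

For example, with a toy reference "ACGTKACGT", an alignment starting at position 3, and an MD value of "2N4", the sketch reports a conflict at position 5, where the reference holds 'K' but the MD tag says 'N'.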

As this difference is not really an issue for evaluating the assembly, I pushed a change that downgrades this error to a warning and lets ALE continue processing the file.
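
If you want to see which reference positions can trigger this message, a short scan of the assembly FASTA for IUPAC codes other than A, C, G, T and N is enough. The following is only a sketch under that assumption; `ambiguous_bases` is a hypothetical helper and not part of ALE.

```python
def ambiguous_bases(fasta_lines):
    """Yield (sequence name, 1-based position, base) for bases other than A, C, G, T, N."""
    name, pos = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # New record: keep the first word of the header and restart the count.
            name = (line[1:].split() or [""])[0]
            pos = 0
        else:
            for base in line.upper():
                pos += 1
                if base not in "ACGTN":
                    yield name, pos, base

# Hypothetical usage on a FASTA file:
# with open("assembly.fa") as handle:
#     for name, pos, base in ambiguous_bases(handle):
#         print(name, pos, base)
```

Any iterable of lines works as input, so the function can be tried on a small list of strings before running it on a full assembly.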

Thanks,

Rob Egan

 


Your solution works, thank you!

A. GUYOMARD

written 3.9 years ago by alecloic40