Hi,
I've run a genome assembly, and during the testing phase (different inputs, k-mer sizes and such) I made an observation that worries me a little. Here is the situation: I have two assembly results using the same paired-end and single-end read input data set, but in one of them I added mate-pair data to get better scaffolding. According to the progress logs, both runs finished without any issues. When I compare the stats, however, the run without the mate-pair data gives noticeably better results.
run1 (without mate-pair data):
n        n:500   L50     min  N80  N50  N20  E-size  max    sum      name
75.35e6  328297  142964  500  525  580  698  632     3857   198.7e6  Test-unitigs.fa
75.33e6  330804  143107  500  525  582  709  649     9353   202.1e6  Test-contigs.fa
75.33e6  330732  141848  500  525  583  711  657     19614  202.5e6  Test-scaffolds.fa
run2 (with mate-pair data):
n        n:500   L50     min  N80  N50  N20  E-size  max    sum      name
75.35e6  328297  142964  500  525  580  698  632     3857   198.7e6  Test-unitigs.fa
75.33e6  330804  143107  500  525  582  709  649     9353   202.1e6  Test-contigs.fa
75.33e6  330804  143107  500  525  582  709  649     9353   202.1e6  Test-scaffolds.fa
From what I can see, the second run did not do any scaffolding at all: its scaffold stats are identical to its contig stats. The mate pairs I'm using are derived from 454 data. I understand that 454 mate pairs may not add much information (although, judging by the alignment results for those libraries, they should have added some). What I do not understand is why I don't get at least the result I got from using the paired-end data alone.
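To make that check less dependent on eyeballing the tables, here is a minimal Python sketch that parses the two stats tables above and reports whether the scaffold row differs from the contig row. The function names `parse_stats` and `scaffolding_changed` are my own and purely illustrative; they are not part of ABySS, and the sketch assumes the whitespace-separated layout shown in this post.

```python
def parse_stats(text):
    """Parse a whitespace-separated stats table into {file name: {column: value}}."""
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    stats = {}
    for row in records:
        record = dict(zip(header, row))
        stats[record["name"]] = record
    return stats


def scaffolding_changed(stats, contigs="Test-contigs.fa", scaffolds="Test-scaffolds.fa"):
    """Return True if any statistic differs between the contig and scaffold rows."""
    c, s = stats[contigs], stats[scaffolds]
    return any(c[col] != s[col] for col in c if col != "name")


# Tables copied from the two runs reported in this post.
RUN1 = """
n n:500 L50 min N80 N50 N20 E-size max sum name
75.35e6 328297 142964 500 525 580 698 632 3857 198.7e6 Test-unitigs.fa
75.33e6 330804 143107 500 525 582 709 649 9353 202.1e6 Test-contigs.fa
75.33e6 330732 141848 500 525 583 711 657 19614 202.5e6 Test-scaffolds.fa
"""

RUN2 = """
n n:500 L50 min N80 N50 N20 E-size max sum name
75.35e6 328297 142964 500 525 580 698 632 3857 198.7e6 Test-unitigs.fa
75.33e6 330804 143107 500 525 582 709 649 9353 202.1e6 Test-contigs.fa
75.33e6 330804 143107 500 525 582 709 649 9353 202.1e6 Test-scaffolds.fa
"""

if __name__ == "__main__":
    for label, table in (("run1", RUN1), ("run2", RUN2)):
        changed = scaffolding_changed(parse_stats(table))
        print(label, "scaffolds differ from contigs:", changed)
```

Identical contig and scaffold rows only show that the scaffold file carries the same summary stats as the contig file; they do not say why, so the logs and command lines are still needed to find the cause.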
Is it possible that ABySS, when mate-pair data is provided, does not scaffold with the paired-end data at all (and thus only uses the mate-pair information)?
Yes, that is possible. Please report the exact command line that you used for both assemblies.
Here is the one for the run with the MP data:
and here is the one without MP data:
I mainly noticed the difference because, for the run with the MP data, I don't see any alignments of the PE files against the intermediate assembly, only abyss-map output for the MP files.