I would like to make a fake quality file for a Velvet de novo assembly. What is the average quality of a Velvet assembly?
I've been told that Velvet assemblies are basically infallible. If they are, then I might say there is less than one error in the entire genome. That means a per-base error probability below 1/G, which on the Phred scale is

$$Q = -10 \log_{10}\left(\frac{1}{G}\right)$$

where G is the size of the genome in nucleotides. If my bacterial genome is 2 Mb (G = 2 × 10^6), then the quality of the assembly would be greater than $-10 \log_{10}\left(1/(2 \times 10^6)\right) \approx 63$.
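The arithmetic behind the "63" can be checked with a few lines of Python. This is just the standard Phred definition, Q = -10 log10(p), applied with p = 1/G. The function names are made up for illustration.

```python
import math

def phred_from_error_prob(p: float) -> float:
    """Phred quality Q = -10 * log10(p) for a per-base error probability p."""
    return -10.0 * math.log10(p)

def phred_for_one_error_per_genome(genome_size: int) -> float:
    """Q implied by at most one error in genome_size bases, i.e. p = 1/G."""
    return phred_from_error_prob(1.0 / genome_size)

# 2 Mb bacterial genome from the question
q = phred_for_one_error_per_genome(2_000_000)
print(f"{q:.2f}")  # 63.01
```

Because -10 log10(1/G) = 10 log10(G), every tenfold increase in genome size adds 10 to this bound.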
So, does anyone have a good idea what a Velvet quality file would look like? Do you agree with my "63" assessment?
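Here is one possible shape for such a file: a minimal sketch that writes a FASTA-style quality file (a `>` header line followed by space-separated integer scores, one per base) for a contigs FASTA such as Velvet's `contigs.fa`. Velvet itself does not produce this file. The constant score (63 by default), the lower score for `N` gap characters (for example from scaffolding), the 60-scores-per-line layout, and the function names are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def fake_qual_records(lines: Iterable[str], quality: int = 63,
                      gap_quality: int = 0, per_line: int = 60) -> List[str]:
    """Build .qual lines: one constant score per base, gap_quality for N/n."""
    out: List[str] = []
    for header, seq in read_fasta(lines):
        out.append(">" + header)
        # Assumption: ambiguous N bases (scaffold gaps) get a low score.
        scores = [str(gap_quality if base in "Nn" else quality) for base in seq]
        for i in range(0, len(scores), per_line):
            out.append(" ".join(scores[i:i + per_line]))
    return out

# Hypothetical two-contig input; a real run would read contigs.fa instead.
contigs = [">contig_1", "ACGT", "AC", ">contig_2", "GGG"]
for record_line in fake_qual_records(contigs, per_line=3):
    print(record_line)
```

For a real assembly you could pass an open file handle (e.g. `open("contigs.fa")`) instead of the list, and write the returned lines to `contigs.fa.qual`.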
The claim that Velvet assemblies are "basically infallible" sounds strange to me. What is it based on?
A colleague told me that he resequenced a genome using Illumina and found 3 errors that even a Sanger whole-genome sequencing project had missed. He verified them by looking at the original chromatograms.
Also, from the Velvet website (http://www.ebi.ac.uk/~zerbino/velvet/):
Does Velvet take base caller confidence scores into account?
No it currently does not, although it would be easy to implement. The reason we have not done it yet is because a lot can be inferred from coverage alone.
I'm spawning a new question out of this instead. http://biostar.stackexchange.com/questions/3117/velvet-retain-read-names-in-afg-file