Question: Velvet Assembly Quality
4
gravatar for Lee Katz
3.5 years ago by
Lee Katz2.4k
Atlanta, GA
Lee Katz2.4k wrote:

I would like to make a fake quality file for a Velvet de novo assembly. What is the average quality of a velvet assembly?

I've been told that Velvet assemblies are basically infallible. If they are, then for an assembly I might say that there is less than one error in the entire genome,

phred=-10*log(1/G)

Where G is the size of the genome in nucleotides. If the size of my bacterial genome is 2 MB, then the quality of the assembly would be greater than -10log(1/(210^6)), or 63.

So, does anyone have a good idea what a Velvet quality file would look like? Do you agree with my "63" assessment?

ADD COMMENTlink modified 3.3 years ago by Falcatrua10 • written 3.5 years ago by Lee Katz2.4k

It sounds like a strange claim to me that Velvet assemblies are "basically infallible". What is this statement based on?

ADD REPLYlink written 3.5 years ago by Mikael Huss3.4k

A colleague told me that he resequenced a genome using Illumina. He found 3 errors that even a Sanger whole genome sequence project missed, which he verified by looking at original chromatograms.

Also, in the Velvet website (http://www.ebi.ac.uk/~zerbino/velvet/):

Does Velvet take base caller confidence scores into account?

No it currently does not, although it would be easy to implement. The reason we have not done it yet is because a lot can be inferred from coverage alone.

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k

I'm spawning a new question out of this instead. http://biostar.stackexchange.com/questions/3117/velvet-retain-read-names-in-afg-file

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k
3
gravatar for Ketil
3.5 years ago by
Ketil3.3k
Ketil3.3k wrote:

Phred qualities are defined as:

E = 10^{-Q/10}

where E is the error rate (probailty of a correct call), and Q is phred quality. So, Q=10 means E=0.1, Q=20 gives E=0.01 (1% errors). Conversely

Q = -10*log_10(E)

Which is the formula you used, so your analysis and arithmetic looks good to me.

That aside, I'm not sure why you want to do this. The point of having quality associated with an assembly is to identify the variations, some areas are bound to have lower coverage or an accumulation of poor quality reads - which the quality score can help you to identify.

ADD COMMENTlink written 3.5 years ago by Ketil3.3k
1

My problem with Velvet is that it does not output quality scores. It might be a significant project to produce accurate cumulative quality scores in the final assembly. However, a fake quality file such as the one above would be all right. Or, if I could produce a quality file based on coverage it might be even better.

The biggest reason I want this quality file is so that I can merge its assembly more accurately with another assembly to produce a comprehensive final assembly.

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k

So you would say to look at the coverage and create a quality score based on how many reads cover a base? Has this been done before with Velvet output?

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k

I'm not familiar with Velvet, but most sequencing technologies produce reads with associated quality of some sort. The assembler should use this quality to derive a new quality for each position, so that if the bases of all (or most) reads agree, this should result in a higher quality base call in the assembly, than if more reads disagree.

ADD REPLYlink written 3.5 years ago by Ketil3.3k
1
gravatar for Lee Katz
3.5 years ago by
Lee Katz2.4k
Atlanta, GA
Lee Katz2.4k wrote:

It looks like the quality could actually be in the afg file of the Velvet output. Can anyone confirm? I had to use the latest version of BioPerl to get some of the methods.

#!/usr/bin/perl
use Bio::Assembly::IO;
use Data::Dumper;
system("amos2ace velvet_asm.afg")
$aceio=Bio::Assembly::IO->new(-file=>"velvet_asm.ace",-format=>"ace");
while(my $contig=$aceio->next_contig){
  if(ref($contig) eq 'Bio::Assembly::Contig'){
    print "contig: ";
    print Dumper $contig->get_consensus_quality;
    print "\n";
  }
  elsif(ref($contig) eq 'Bio::Assembly::Singlet'){
    print "singlet: ";
    print Dumper $contig->get_consensus_quality;
    print "\n";
  }
}
ADD COMMENTlink written 3.5 years ago by Lee Katz2.4k

This script of course has to read the afg file that is outputted by velvetg using the "-amos_file yes" option.

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k

I'm spawning a new question out of this instead. http://biostar.stackexchange.com/questions/3117/velvet-retain-read-names-in-afg-file

ADD REPLYlink written 3.5 years ago by Lee Katz2.4k

I have difficulties making sense out of this in conjunction with Falcatrua's answer. Are these real qualities, or not? Anyway, I just recalculate qualities by independently mapping reads back to the assembly.

ADD REPLYlink written 3.1 years ago by Ketil3.3k
1
gravatar for Ketil
3.5 years ago by
Ketil3.3k
Ketil3.3k wrote:

Just noticed that DiGuistini et al. used Velvet to pre-assemble Illumina reads (as Forge apparently couldn't handle the size of the dataset), and they assigned a flat Phred 20 quality to the assemblies in further processing (midway down in the left column, second to last page of the paper)

ADD COMMENTlink written 3.5 years ago by Ketil3.3k
1
gravatar for Falcatrua
3.5 years ago by
Falcatrua10
Falcatrua10 wrote:

Basicaly, what Velvet does with quality values when converting .afg to .ace (amos2ace) for files without phred qualities is convert the sequence (bases) to ASCII characters.

You can check this by looking in a .ace file created by amos2ace for assembly files without phred values. Get a contig sequence in the ace and look at it`s quality values, same bases always have the same values.

Velvet does this thing because when there are no quality values in your reads submitted to assembly, the quality values in the .afg file is the sequence itself (.seq and .qlt are redundant) You can also check amos2ace code (in perl), and you will see that it gets this .qlt values (same thing as the sequence) and convert the sequence bases to ascII characters using ord (), that is a perl function to do that conversion.

ADD COMMENTlink written 3.5 years ago by Falcatrua10

Let me get this straight: it assigns a fixed quality value to each nucleotide - so that e.g. all 'A's in the sequence will have the same quality, and all 'C's will have the same value, but different from 'A'? I hope I'm misunderstanding you here.

ADD REPLYlink written 3.1 years ago by Ketil3.3k
Please log in to add an answer.

Help
Access
  • RSS
  • Stats
  • API

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.0.0
Traffic: 379 users visited in the last hour