Merqury QV score evaluation for HiFi assemblies
19 days ago
SomeOne ▴ 260

Hello,

I have several fungal genomes assembled from HiFi reads. Although the assemblies look really good (N50 > 4 Mb and BUSCO completeness > 99.5%), I also wanted to look at their QV scores.

For this I ran Inspector, which calculates QV from structural errors and small-scale errors such as base substitutions. The QV scores were fairly high (QV > 65), but for one sample the score was 49.51.

Note that I used the same pipeline and the same flags in every tool for all 11 samples.

My pipeline looks like this:

HiFi-bam > convert to fastq/a > hifiasm > purge-dups > mito-remove > coverage QC and high/low coverage contigs removal > polishing (NextPolish2 using HiFi and Illumina reads) > QC with Inspector

The Inspector stats look like this: [image: Inspector stats]

I am not sure why there are still small-scale assembly errors, specifically for sample PB02, even after one round of polishing.

I also tried to calculate QV scores using meryl and Merqury v1.3 with these commands:

## 1. Get the right k size
genome_size=$(awk '/^>/ {next} {n+=length} END{print n}' "$current_asmFASTA")
K=$(best_k.sh "$genome_size" | tail -n1 | awk '{print int($1+0.9999)}')

## 2. Build k-mer dbs with meryl
meryl count k="$K" threads="$threads" output "$merquryOUT/${sample}_reads.meryl" "$readsFASTQ"

## 3. Run Merqury to get QV and spectra
cd "$merquryOUT"
merqury.sh "${sample}_reads.meryl" "${sample}.fasta" "${sample}_merqury"
and got the following results in the <sample>_merqury.qv files: [image: Merqury QV results]

From the Merqury wiki (https://github.com/marbl/merqury/wiki/2.-Overall-k-mer-evaluation), the columns mean:

  1. Assembly of interest. Both is the combination of the above two.
  2. Total (present) k-mers uniquely found only in the assembly
  3. Total (present) k-mers found in the assembly
  4. QV
  5. Error rate
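To sanity-check the QV column, the value can be recomputed from columns 2 and 3 plus the k used for meryl. This is a minimal sketch assuming the formula from the Merqury paper, error = 1 - (1 - K_asm_only / K_total)^(1/k) and QV = -10 * log10(error); the function name `qv_from_kmers` is made up for illustration and is not part of Merqury.

```shell
#!/usr/bin/env bash
# Recompute a Merqury-style QV from the k-mer counts in one .qv row.
# Arguments: <asm_only_kmers> <total_asm_kmers> <k>
# (qv_from_kmers is a hypothetical helper, not a Merqury command.)
qv_from_kmers() {
    awk -v a="$1" -v t="$2" -v k="$3" 'BEGIN {
        # No assembly-only k-mers means no detectable errors.
        if (a == 0) { print "inf"; exit }
        # Per-base error rate implied by the fraction of unsupported k-mers.
        err = 1 - (1 - a / t) ^ (1 / k)
        # Phred scaling; awk only has natural log, so divide by log(10).
        printf "%.2f\n", -10 * log(err) / log(10)
    }'
}
```

Calling it as `qv_from_kmers <column 2> <column 3> "$K"` should give a number close to column 4. If it does not, the k passed here probably differs from the one used to build the meryl database.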

QUESTIONS

  1. Why do the Inspector results show substitution errors even after polishing?
  2. Am I using Merqury the right way? If not, can you recommend any tutorials or corrections?

Thank you


I will assume that the HiFi sequencing for all strains came from the same batch and that the strains have similar depth, read length, etc.

In that case, I would say the most likely reason is something particular to the strain, especially since the Merqury QV also confirms that there are more 'errors' in this strain.

My first guess would be some sort of SV, such as a large duplication or aneuploidy with mild heterozygosity, that is collapsed in the assembly. To test this I would check the coverage across each contig in the assembly.
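One rough way to do that coverage check is to summarise per-base depth into a mean per contig. The sketch below assumes the HiFi reads have already been mapped back to the assembly and that the input is three tab-separated columns (contig, position, depth), which is what `samtools depth -a` prints. The function name and the BAM file name are placeholders.

```shell
#!/usr/bin/env bash
# Mean depth per contig from three-column depth output (contig, position, depth).
# Reads stdin; prints "contig<TAB>mean_depth", sorted by contig name.
# (mean_depth_per_contig is a hypothetical helper name.)
mean_depth_per_contig() {
    awk -F'\t' '
        { sum[$1] += $3; n[$1]++ }   # accumulate depth and position count per contig
        END { for (c in sum) printf "%s\t%.1f\n", c, sum[c] / n[c] }
    ' | sort
}

# Example call (the BAM name is a placeholder):
#   samtools depth -a PB02.hifi.sorted.bam | mean_depth_per_contig
```

A contig sitting at roughly twice the typical depth would be a candidate for a collapsed duplication, although a per-contig mean can hide a duplication that only covers part of a contig.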

Alternatively, you could check whether the polishing is actually working within all contigs.
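A per-contig view of QV is one way to look at this. The sketch below assumes a tab-separated per-sequence table with the same five columns as the overall .qv file (name, assembly-only k-mers, total k-mers, QV, error rate). Merqury should write such a per-sequence file next to the overall one, but check the exact file name in your own output directory. The function name is made up.

```shell
#!/usr/bin/env bash
# Print contigs whose QV (column 4) is below a threshold.
# Arguments: <per_sequence_qv_file> <min_qv>
# (low_qv_contigs is a hypothetical helper name.)
low_qv_contigs() {
    awk -v min="$2" '
        # Skip rows reported as "inf" (no assembly-only k-mers at all).
        tolower($4) !~ /inf/ && $4 + 0 < min + 0 { print $1 "\t" $4 }
    ' "$1"
}

# Example call (the file name is a placeholder):
#   low_qv_contigs PB02_per_contig.qv 50
```

If the low-QV contigs cluster in a few sequences, that points at something local such as a collapsed region. If every contig drops by a similar amount, the polishing step itself becomes the more likely suspect.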

19 days ago
SomeOne ▴ 260

Just adding some more information; maybe this is also the answer to my question.

My steps are:

HiFi-bam > convert to fastq/a > hifiasm > purge-dups > mito-remove > coverage QC and high/low coverage contigs removal > polishing (NextPolish2 using HiFi and Illumina reads) > QC with Inspector

I tried running Inspector after coverage QC without running NextPolish2, and the QV score for sample PB02 was > 71. So my guess is that NextPolish2 was somehow introducing more small-scale substitutions into my FASTA files. (I am still trying to understand how this can happen.)
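To put the two numbers in perspective, a Phred-scaled QV can be converted back into an expected number of base errors using error rate = 10^(-QV/10). This is a small sketch; the function name is made up, and the 40 Mb genome size in the note below is only an assumed example, not the real size of PB02.

```shell
#!/usr/bin/env bash
# Convert a Phred-scaled QV into an expected number of base errors.
# Arguments: <qv> <genome_size_bp>
# (expected_errors is a hypothetical helper name.)
expected_errors() {
    awk -v qv="$1" -v g="$2" 'BEGIN {
        # Phred definition: per-base error rate = 10^(-QV/10).
        printf "%.0f\n", g * 10 ^ (-qv / 10)
    }'
}

# Example calls (40 Mb is an assumed genome size):
#   expected_errors 49.51 40000000
#   expected_errors 71 40000000
```

For an assumed 40 Mb genome, QV 49.51 works out to roughly 450 expected errors and QV 71 to about 3, so the drop after polishing amounts to a few hundred extra differences across the whole assembly.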
