Why did the QV value decrease significantly after using YAHS for HiC scaffolding?
6 days ago
xinguok794 ▴ 10

Hi, I used an assembly (purged.fa, QV = 52) that was assembled with hifiasm and purged of haplotigs as the input to YAHS. When I evaluated the resulting scaffolds_final.fa with Merqury, the QV had dropped to 23. What could have caused this? My script is below:

#!/bin/bash
set -e

# --- Set variables (please check paths and filenames) ---
ASSEMBLY_FA="purged.fasta"
HIC_R1=/users/jieqyan/workspace/assembly/ref/Hi-C/AT_HiC_R1.clean.fastq.gz
HIC_R2=/users/jieqyan/workspace/assembly/ref/Hi-C/AT_HiC_R2.clean.fastq.gz
# Note: PICARD_JAR variable removed, using 'picard' as the executable command
THREADS=20

echo "--- Stage 1: Hi-C preprocessing begins ---"

# 1. Index the cleaned contigs
echo "1. Indexing the cleaned contigs..."
bwa index "$ASSEMBLY_FA"
samtools faidx "$ASSEMBLY_FA"

# 2. Align Hi-C reads and generate sorted BAM
echo "2. Aligning Hi-C reads and generating sorted BAM..."
bwa mem -5SP -t "$THREADS" "$ASSEMBLY_FA" "$HIC_R1" "$HIC_R2" | \
samtools view -@ 4 -bS - | \
samtools sort -@ 4 -o hic.sorted.bam

# 3. Mark and remove PCR duplicates
echo "3. Picard MarkDuplicates removing duplicates..."
# Pass the JVM heap option (-Xmx) before the tool name when using the 'picard' wrapper command
picard -Xmx32g MarkDuplicates \
    I=hic.sorted.bam \
    O=hic_dedup.bam \
    M=metrics.txt \
    REMOVE_DUPLICATES=true

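# 4. Filter and sort by read name
# -F 0x904 drops unmapped (0x4), secondary (0x100), and supplementary (0x800) alignments
echo "4. Dropping unmapped, secondary, and supplementary alignments and sorting by name..."
samtools view -F 0x904 -h -b hic_dedup.bam | \
samtools sort -@ 4 -n -o hic_clean.namesort.bam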

echo "--- Stage 1 completed. Output: hic_clean.namesort.bam ---"
yahs "$ASSEMBLY_FA" hic_clean.namesort.bam -o yahs_scaffolding_output

I would be very grateful if you could offer me any advice!

Merqury scaffolding assembly HiC • 482 views

It looks like YAHS performs an initial contig error-correction step, which you can skip with --no-contig-ec.

Can you check whether skipping the correction improves the Merqury QV?
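
For reference, a rerun along these lines might look like the sketch below. It is only an illustration: `reads.meryl` (a k-mer database built from the HiFi reads) and the `yahs_no_ec` output prefix are assumed names that do not appear in the original script, and `run` is a hypothetical helper that prints each command and executes it only when `DRY_RUN=0`. The scaffold file name assumes YaHS's `<prefix>_scaffolds_final.fa` naming.

```shell
#!/bin/bash
# Sketch: rerun YaHS without contig error correction, then re-evaluate with Merqury.
ASSEMBLY_FA="purged.fasta"           # contigs used in the original script
HIC_BAM="hic_clean.namesort.bam"     # name-sorted Hi-C BAM from the original script
READ_DB="reads.meryl"                # assumption: meryl database of the HiFi reads
OUT_PREFIX="yahs_no_ec"              # assumption: output prefix for this rerun

# Print each command; execute it only when DRY_RUN=0 (the default is a dry run).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

rerun_without_ec() {
    # Scaffold with the initial contig error-correction step skipped.
    run yahs --no-contig-ec -o "$OUT_PREFIX" "$ASSEMBLY_FA" "$HIC_BAM"
    # Score the new scaffolds against the read k-mer database.
    run merqury.sh "$READ_DB" "${OUT_PREFIX}_scaffolds_final.fa" "${OUT_PREFIX}_merqury"
}

rerun_without_ec
```

With the default dry run this only prints the two commands, so the paths can be checked first; `DRY_RUN=0 bash rerun.sh` would actually execute them.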


Thank you for your reply! Unfortunately, running YAHS with --no-contig-ec did not improve the QV of the resulting FASTA file; it actually dropped further (QV = 19). Besides QV, I also evaluated LAI and BUSCO, and both looked good (LAI = 25, BUSCO = 100), yet the QV stayed low. Strangely, after I polished the scaffolded FASTA file again, the QV rebounded to 52, but the assembly became even more fragmented. I suspect the sharp drop in QV after Hi-C scaffolding may be an artifact: could YAHS have applied some kind of masking that affects Merqury's estimate?
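
One way to test the masking idea is to compare base counts before and after scaffolding. A scaffolder that only orders, orients, and joins contigs should leave the total A+T and G+C counts unchanged (reverse-complementing swaps A with T and C with G) and add only N gap bases. The sketch below assumes the file names used in this thread; `fasta_counts` is a hypothetical helper, and the scaffold file name assumes YaHS's `<prefix>_scaffolds_final.fa` naming.

```shell
#!/bin/bash
# Print four numbers for a FASTA file: A+T count, G+C count, N count, lowercase count.
fasta_counts() {
    awk '!/^>/ {
        lower += gsub(/[acgtn]/, "&")    # soft-masked (lowercase) bases; "&" keeps the text
        line = toupper($0)               # count bases case-insensitively from here on
        at += gsub(/[AT]/, "", line)
        gc += gsub(/[CG]/, "", line)
        n  += gsub(/N/, "", line)
    } END { printf "%d %d %d %d\n", at, gc, n, lower }' "$1"
}

# File names follow the thread; adjust them to your own paths.
CONTIGS="purged.fasta"
SCAFFOLDS="yahs_scaffolding_output_scaffolds_final.fa"
if [ -f "$CONTIGS" ] && [ -f "$SCAFFOLDS" ]; then
    echo "file       AT GC N lowercase"
    echo "contigs:   $(fasta_counts "$CONTIGS")"
    echo "scaffolds: $(fasta_counts "$SCAFFOLDS")"
fi
```

If the A+T or G+C totals differ between the two files, or the lowercase count changes, the sequence itself was altered during scaffolding. If they match and only the N count grows, the scaffolds carry the same bases as the contigs, which would point toward the evaluation step (for example, which file or k-mer database Merqury was given) rather than masking.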
