Entering edit mode
6 days ago
xinguok794
▴
10
Hi, I input the genome purged.fa (QV=52) assembled by hifiasm and purged by haplotig into YAHS, and the resulting scaffolds_final.fa was directly reduced to 23 after passing through Merqury, I want to know what caused this? Below is my code:
#!/bin/bash
set -e
# --- Set variables (please check paths and filenames) ---
ASSEMBLY_FA="purged.fasta"
HIC_R1=/users/jieqyan/workspace/assembly/ref/Hi-C/AT_HiC_R1.clean.fastq.gz
HIC_R2=/users/jieqyan/workspace/assembly/ref/Hi-C/AT_HiC_R2.clean.fastq.gz
# Note: PICARD_JAR variable removed, using 'picard' as the executable command
THREADS=20
echo "--- Stage 1: Hi-C preprocessing begins ---"
# 1. Index the cleaned contigs
echo "1. Indexing the cleaned contigs..."
bwa index $ASSEMBLY_FA
samtools faidx $ASSEMBLY_FA
# 2. Align Hi-C reads and generate sorted BAM
echo "2. Aligning Hi-C reads and generating sorted BAM..."
bwa mem -5SP -t $THREADS $ASSEMBLY_FA $HIC_R1 $HIC_R2 | \
samtools view -@ 4 -bS - | \
samtools sort -@ 4 -o hic.sorted.bam
# 3. Mark and remove PCR duplicates
echo "3. Picard MarkDuplicates removing duplicates..."
# Ensure -Xmx comes before, using 'picard' command
picard -Xmx32g MarkDuplicates \
I=hic.sorted.bam \
O=hic_dedup.bam \
M=metrics.txt \
REMOVE_DUPLICATES=true
# 4. Filter and sort by read name
echo "4. Filtering non-unique mappings and sorting by name..."
samtools view -F 0x904 -h -b hic_dedup.bam | \
samtools sort -@ 4 -n -o hic_clean.namesort.bam
echo "--- Stage 1 completed. Output: hic_clean.namesort.bam ---"
yahs purged.fasta hic_clean.namesort.bam -o yahs_scaffolding_output
I would be very grateful if you could offer me any advice!
Looks like YAHS does an initial error correction step which you can skip with (--no-contig-ec)
Can you see if skipping the correction helps with the merqury quality?
Thank you for your reply! Unfortunately, despite running YAHS with
--no-contig-ec, the QV value of the resulting FASTA file did not improve further; in fact, it decreased further (QV: 19). Besides QV, I also evaluated LAI and busco. The results showed that both of these metrics were quite good (LAI=25, busco=100), but the QV value remained low. Strangely, after polishing the FASTA file again, the QV value rebounded to 52, but the fragmentation of the FASTA became even higher. I suspect that the significant decrease in QV value after HiC scaffolding might be an illusion, perhaps YAHS performed some kind of special masking process that affected Merqury's judgment?