20 months ago
Daniel
Hi, I'm aligning my reads to the reference graph using the same process as found here:
```
time ${DATA_DIR}/vg giraffe --progress \
  --read-group "ID:1 LB:lib1 SM:HG003 PL:illumina PU:unit1" \
  --sample "HG003" \
  -o BAM --ref-paths ${DATA_DIR}/GRCh38.path_list.txt \
  -P -L 3000 \
  -f ${DATA_DIR}/HG003.novaseq.pcr-free.35x.R1.fastq.gz \
  -f ${DATA_DIR}/HG003.novaseq.pcr-free.35x.R2.fastq.gz \
  -Z ${DATA_DIR}/hprc-v1.1-mc-grch38.gbz \
  --kff-name ${DATA_DIR}/HG003.fq.kff \
  --haplotype-name ${DATA_DIR}/hprc-v1.1-mc-grch38.hapl \
  -t $(nproc) > reads.unsorted.bam
```
Here are my questions:
- Is there any difference or benefit in merging the graphs from multiple read pairs, versus running "samtools merge" on the resulting BAMs (after sorting the alignments)? If so, what is the command for merging graphs?
- Can you clarify whether we should be removing duplicates, and at which point this should happen? Is Picard MarkDuplicates the recommended tool?
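The "samtools merge" route from the first question, followed by the Picard step from the second, could look roughly like the sketch below. It is only an illustration under stated assumptions, not the vg developers' recommended pipeline: it assumes `samtools` and a `picard` wrapper script are on PATH, that each read pair was aligned into its own BAM (the `lane1.bam`/`lane2.bam` names, the `HG003` prefix and the `sort_merge_markdup` helper are hypothetical), and it marks duplicates after the merge so that duplicates spanning different read pairs from the same library can be detected. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumptions: samtools and a `picard` wrapper are on PATH; file names are hypothetical.
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-0}"   # 1 = print each command instead of executing it

run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper. Usage: sort_merge_markdup OUT_PREFIX IN1.bam [IN2.bam ...]
sort_merge_markdup() {
  if (( $# < 2 )); then
    echo "usage: sort_merge_markdup OUT_PREFIX IN1.bam [IN2.bam ...]" >&2
    return 1
  fi
  local prefix="$1"
  shift
  local sorted=() bam out
  # 1. Coordinate-sort each per-read-pair BAM (the giraffe output above is unsorted).
  for bam in "$@"; do
    out="${bam%.bam}.sorted.bam"
    run samtools sort -@ "$THREADS" -o "$out" "$bam"
    sorted+=("$out")
  done
  # 2. Merge the sorted BAMs; samtools merge expects identically sorted inputs.
  run samtools merge -@ "$THREADS" "${prefix}.merged.bam" "${sorted[@]}"
  # 3. Mark duplicates on the merged BAM. Picard flags duplicates by default;
  #    REMOVE_DUPLICATES=true would drop them from the output instead.
  run picard MarkDuplicates I="${prefix}.merged.bam" O="${prefix}.markdup.bam" \
    M="${prefix}.markdup_metrics.txt"
  # 4. Index the final coordinate-sorted BAM.
  run samtools index "${prefix}.markdup.bam"
}
```

For example, `sort_merge_markdup HG003 lane1.bam lane2.bam` would produce `HG003.markdup.bam` plus a duplicate-metrics file. Picard's argument syntax differs between versions (`I=`/`O=`/`M=` versus `-I`/`-O`/`-M`), so it is worth checking `picard MarkDuplicates --help` for the installed release.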
Thank you.