[VG sim, surject, giraffe] How to remove a sample from GFA/GBZ graph
7 weeks ago
saruman • 0

Hello VG team,

I want to compare the alignment performance of Bowtie2, BWA-MEM2, and VG Giraffe. To do so, I have generated simulated reads (50 paired-end FASTQ files) based on chromosome 1 (HPRC v1.1) using vg sim and vg surject as follows:

vg sim \
    -x chr1.xg \
    -g chr1.gbwt \
    -n 20000000 \
    -p 500 -v 50 \
    -e 0.0024 -i 0.00029 \
    --threads 40 \
    --progress \
    -m HG02055 \
    -F REAL_SAMPLE_1.fastq.gz \
    -F REAL_SAMPLE_2.fastq.gz \
    --random-seed 42 \
    -a > HG02055.gam

vg surject \
    -x chr1.xg \
    --bam-output \
    --into-path CHM13#0#chr1 \
    --threads 40 \
    --progress \
    --sample HG02055 \
    HG02055.gam \
    | samtools reheader -c "sed s/CHM13#0#//g" - \
    | samtools sort -n -@ 40 - \
    | samtools fastq -@ 40 - > HG02055.interleaved.fastq

Testing Bowtie2 and BWA-MEM2 is straightforward. With VG Giraffe, however, things get more complex. For instance, to test sample HG02055, I should first generate a custom chr1 pangenome without the paths associated with HG02055; otherwise the graph would resemble a personalized pangenome for that sample.

What is the best way to proceed from a computational perspective? I am aware that I can remove a sample from a GBWT using the following command:

vg gbwt -o chr1.HG02055.gbwt --remove-sample HG02055 chr1.gbwt

However, how can I remove it from a GFA or GBZ graph?

Thank you in advance for any help you can provide.

vg • 662 views
7 weeks ago
Jouni Sirén ▴ 760

You can remove a sample from a GBZ graph in two steps. First you extract the GBWT index from the GBZ file:

vg gbwt -o graph.gbwt -Z graph.gbz

Then you remove the sample and create a new GBZ graph, using the original GBZ as the input graph:

vg gbwt --remove-sample SAMPLE -g no-SAMPLE.gbz --gbz-format -x graph.gbz graph.gbwt

These two steps cannot be combined in a single command. Option --remove-sample modifies the input GBWT, which would break the GBZ graph based on it.
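The two steps above can be wrapped in a small shell function. This is only a sketch: the function name `remove_sample_from_gbz`, the temporary file naming, and the example file names are assumptions for illustration and not part of vg. The two `vg gbwt` invocations use the same options as the commands above.

```shell
#!/usr/bin/env bash
# Sketch: remove one sample from a GBZ graph with the two vg gbwt steps.
# remove_sample_from_gbz is a hypothetical helper, not a vg command.
remove_sample_from_gbz() {
    local in_gbz="$1"    # original GBZ graph
    local sample="$2"    # sample name to remove
    local out_gbz="$3"   # new GBZ graph without the sample
    # Intermediate GBWT file (naming is an arbitrary choice for this sketch).
    local tmp_gbwt="${out_gbz%.gbz}.tmp.gbwt"

    # Step 1: extract the GBWT index from the GBZ file.
    vg gbwt -o "$tmp_gbwt" -Z "$in_gbz" || return 1

    # Step 2: remove the sample and build a new GBZ,
    # using the original GBZ as the input graph.
    vg gbwt --remove-sample "$sample" -g "$out_gbz" --gbz-format \
        -x "$in_gbz" "$tmp_gbwt" || return 1

    # Clean up the intermediate GBWT.
    rm -f "$tmp_gbwt"
}

# Example call (file names are placeholders):
# remove_sample_from_gbz chr1.gbz HG02055 chr1.no-HG02055.gbz
```

Keeping the extraction and the removal as separate `vg gbwt` calls follows the point above that they cannot be combined in a single command.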


Thank you so much. I did not fully grasp the vg gbwt command.


And thank you for the great work!


saruman : Please accept this answer (green check mark) to provide closure for this thread.


Done. Thank you.
