Question

Which pangenome graph (full/clip/filter/sampled) to use when aligning RNA-seq with mpmap

Hanc

Hi,

I have a Minigraph-Cactus pangenome graph and would like to use vg mpmap to align RNA-seq data.

According to the transcriptomics analysis wiki, I should construct a spliced pangenome graph using:

vg rna -p --threads <threads> --transcripts annotation.[gtf|gff3] --use-hap-ref --gbz-format graph.gbz > spliced_graph.pg

However, for the input graph.gbz file, I am wondering which graph I should use for RNA-seq alignment:

full.gbz: the full graph without clipping and filtering
clip.gbz: clipped graph
filter.gbz: clipped and filtered by haplotype frequency
sample.gbz: haplotype-sampled graph built from WGS data of the same sample as the RNA-seq data
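You can script the wiki's vg rna command so that it is easy to build spliced graphs from more than one of these candidates and compare them. Below is a minimal Python sketch. It assumes vg is on your PATH and that the annotation is named annotation.gtf. The `.spliced.pg` output naming and the helper names are hypothetical. The script only prints the commands and does not run vg. It uses only the flags quoted in the wiki command above.

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Candidate input graphs listed in the question.
CANDIDATE_GRAPHS = ["full.gbz", "clip.gbz", "filter.gbz", "sample.gbz"]


def vg_rna_command(graph: str, annotation: str, threads: int) -> list[str]:
    """Build the argument list for the wiki's vg rna call.

    vg rna writes the spliced graph to stdout, so the caller must
    redirect it to a .pg file.
    """
    return [
        "vg", "rna", "-p",
        "--threads", str(threads),
        "--transcripts", annotation,
        "--use-hap-ref",
        "--gbz-format", graph,
    ]


def spliced_output_name(graph: str) -> str:
    # Hypothetical naming scheme: clip.gbz -> clip.spliced.pg
    return Path(graph).stem + ".spliced.pg"


if __name__ == "__main__":
    # Print one shell-ready command per candidate graph; nothing is executed.
    for g in CANDIDATE_GRAPHS:
        cmd = vg_rna_command(g, "annotation.gtf", threads=16)
        print(shlex.join(cmd), ">", spliced_output_name(g))
```

To actually run a command, you could pass the list to `subprocess.run` with `stdout` set to an open file handle for the output name. Keep in mind that each spliced graph still needs its own mpmap indexes, as the answer below points out.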

When aligning WGS data with giraffe, I learned that performance ranks sample > filter > clip when there are many haplotypes. Does the same hold for RNA-seq alignment? Can I just use clip.gbz (or even full.gbz)?

Many thanks,

Han


Answer · 2024-12-19

Jordan M Eizenga

The clipped graph tends to perform better than the full or filtered graph. The sampled graph would probably work better than the clipped graph, but the pipeline it requires might not be very practical. vg mpmap uses the GCSA2 index, which takes much longer to construct than vg giraffe's indexes, so rebuilding it for every sample adds much more overhead.

In case you'd like to reuse them, the spliced pangenome graph and indexes used for the RNA-seq analysis in the 2023 HPRC paper are available here: https://cgl.gi.ucsc.edu/data/vgrna/hprc_analyses/graphs/