Which pangenome graph (full/clip/filter/sampled) to use when aligning RNA-seq with mpmap
9 months ago
Hanc ▴ 10

Hi,

I have a Minigraph-Cactus pangenome graph and would like to use vg mpmap to align RNA-seq data.

According to the transcriptomics analysis wiki, I should construct a spliced pangenome graph using:

vg rna -p --threads <threads> --transcripts annotation.[gtf|gff3] --use-hap-ref --gbz-format graph.gbz > spliced_graph.pg

However, for the input graph.gbz file, I am wondering which graph I should use for RNA-seq alignment:

  • full.gbz: the full graph without clipping and filtering
  • clip.gbz: clipped graph
  • filter.gbz: clipped and filtered by haplotype frequency
  • sample.gbz: haplotype sampled graph using the WGS data of the same sample of the RNA-seq data

When using giraffe to align WGS data, I learnt that performance ranks sample > filter > clip when there are many haplotypes. Does the same ranking hold when aligning RNA-seq data with mpmap? Or can I just use clip.gbz (or even full.gbz)?

Many thanks,

Han

vg • 1.1k views
9 months ago

The clipped graph tends to perform better than the full or filtered graph. The sampled graph would probably work better than the clipped graph, but the pipeline it requires might not be very practical: vg mpmap uses the GCSA2 index, which takes much longer to construct than vg giraffe's indexes, so rebuilding it once per sample adds far more overhead.
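To make the overhead argument concrete, here is a rough back-of-the-envelope sketch in Python. The cohort size and the hours per GCSA2 build are placeholder numbers I made up for illustration, not measurements, and the function name is hypothetical. It only compares how many index builds each strategy needs.

```python
def total_index_hours(n_samples: int, index_hours: float, per_sample: bool) -> float:
    """Total index construction time for a cohort.

    per_sample=True models the haplotype-sampled graph (one index build per sample);
    per_sample=False models a shared graph such as clip.gbz (a single build).
    """
    builds = n_samples if per_sample else 1
    return builds * index_hours


# Placeholder values (assumptions, not benchmarks).
N_SAMPLES = 20
INDEX_HOURS = 10.0

shared = total_index_hours(N_SAMPLES, INDEX_HOURS, per_sample=False)
sampled = total_index_hours(N_SAMPLES, INDEX_HOURS, per_sample=True)

print(f"shared clipped graph: {shared} h of index construction")
print(f"per-sample graphs:    {sampled} h of index construction")
```

Whatever the real build time is on your hardware, the per-sample cost grows linearly with the number of samples, while the clipped graph's index is built once and reused.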

In case you'd like to re-use them, the spliced pangenome graph and indexes that were used for the RNA-seq analysis in the 2023 HPRC paper are available here: https://cgl.gi.ucsc.edu/data/vgrna/hprc_analyses/graphs/
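If you build your own instead, the command from the question can be assembled for the clipped graph as in the small sketch below. It only constructs and prints the command line, using exactly the flags quoted in the question; the helper name, the thread count, and the file names are illustrative assumptions.

```python
import shlex


def vg_rna_command(graph_gbz: str, annotation: str, threads: int) -> list[str]:
    """Build the argv for the `vg rna` call quoted in the question (does not run it)."""
    return [
        "vg", "rna", "-p",
        "--threads", str(threads),
        "--transcripts", annotation,
        "--use-hap-ref",
        "--gbz-format", graph_gbz,
    ]


# Assumed inputs: the clipped graph and a GTF annotation in the working directory.
cmd = vg_rna_command("clip.gbz", "annotation.gtf", threads=8)

# vg rna writes the spliced graph to stdout, so redirect it as in the question.
print(shlex.join(cmd) + " > spliced_graph.pg")
```

Swapping in a different graph is then just a matter of changing the first argument.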


Thank you so much! I will use the clipped graph.
