Question

Constructing a bread wheat pangenome graph using PGGB: per-chromosome vs. whole-genome approach, and integrating assemblies from different sequencing technologies?

Sony Nguyen, 9 hours ago

Hello everyone,

I am new to pangenome graph analysis and am planning to construct a bread wheat (Triticum aestivum) pangenome graph using the PGGB pipeline for my PhD research.

I will use around 30 bread wheat cultivars, all assembled at chromosome level using PacBio HiFi reads and Hi-C data. From this pangenome, I aim to investigate structural variations (SVs), presence/absence variations (PAVs), and identify genes associated with agronomic traits.

Since bread wheat has a large (~16 Gb) and highly repetitive (~80%) genome, I expect significant computational and runtime challenges. I have read that in some studies (for example, in Arabidopsis pangenome graphs), assemblies were preprocessed by splitting into chromosomes and removing unplaced contigs before running PGGB.
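The preprocessing described above (splitting each assembly into chromosomes and dropping unplaced contigs) could look roughly like the following minimal Python sketch. It assumes that chromosome-level sequences carry a wheat chromosome ID such as `chr1A` or `Chr7D` in their FASTA header and that unplaced scaffolds (e.g. `chrUn...`) do not; the regular expression and all function names are hypothetical and would need adjusting to each assembly's naming. Sequences are renamed with a PanSN-style `sample#haplotype#contig` prefix, which is commonly recommended for PGGB inputs, with the haplotype fixed at `1` on the assumption that each assembly is a single haplotype. Note that this assigns sequences to chromosomes by name only; it does nothing to detect translocations.

```python
import re
from collections import defaultdict

# Hypothetical naming pattern: "chr" followed by a wheat chromosome (1A..7D),
# not followed by another letter/digit, so "chr1A" and "Chr7D_v2" match
# but "chrUn_scaffold12" does not.
CHROM_RE = re.compile(r"chr([1-7][ABD])(?![A-Za-z0-9])", re.IGNORECASE)


def parse_fasta(lines):
    """Yield (name, sequence) pairs; name is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_chromosome(sample, records, haplotype=1):
    """Group one assembly's records by chromosome and drop unplaced ones.

    Returns ({"1A": [(new_name, seq), ...], ...}, [dropped names]).
    """
    groups, dropped = defaultdict(list), []
    for name, seq in records:
        match = CHROM_RE.search(name)
        if match is None:
            dropped.append(name)  # unplaced or unrecognised sequence
            continue
        chrom = match.group(1).upper()
        # PanSN-style name: sample#haplotype#original_contig_name (assumed)
        groups[chrom].append((f"{sample}#{haplotype}#{name}", seq))
    return dict(groups), dropped


def write_fasta(path, records, width=80):
    """Write (name, seq) records to a FASTA file with a fixed line width."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
```

Under the per-chromosome approach, the `1A` records from every cultivar would then be written together into one FASTA, and likewise for the other chromosomes, giving one input file per chromosome.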

My questions are:

For bread wheat, would you recommend constructing the pangenome graph per chromosome or using the whole genome?

I am concerned that some known interchromosomal translocations in wheat (e.g., 4AL–5AL–7BS) might not be captured if the graph is built separately for each chromosome.
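To put the two options in perspective, here is a back-of-the-envelope sketch that uses only the figures in the question (~16 Gb per genome, ~30 assemblies) and the fact that hexaploid bread wheat has 21 chromosomes (7 homoeologous groups in each of the A, B and D subgenomes). It compares total input sequence only, not runtime or memory, and the per-chromosome figure is an average, since real wheat chromosomes differ in size.

```python
GENOME_GB = 16.0    # approximate bread wheat genome size, from the question
N_ASSEMBLIES = 30   # planned number of cultivars, from the question

# 7 homoeologous groups x 3 subgenomes (A, B, D) = 21 chromosomes.
CHROMOSOMES = [f"{group}{sub}" for group in range(1, 8) for sub in "ABD"]


def input_sizes(genome_gb=GENOME_GB, n_assemblies=N_ASSEMBLIES,
                n_chromosomes=len(CHROMOSOMES)):
    """Return (whole-genome input, average per-chromosome input) in Gb."""
    whole_genome = genome_gb * n_assemblies
    per_chromosome = whole_genome / n_chromosomes
    return whole_genome, per_chromosome


if __name__ == "__main__":
    whole, per_chrom = input_sizes()
    print(f"whole-genome job: ~{whole:.0f} Gb of sequence")
    print(f"per-chromosome jobs: {len(CHROMOSOMES)} x ~{per_chrom:.1f} Gb")
```

This prints roughly 480 Gb for a single whole-genome run versus 21 runs of about 22.9 Gb each, which is the main practical appeal of splitting by chromosome.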

Some of my assemblies were generated from Illumina short reads (from the 10+ Wheat Genomes Project), while most are from PacBio HiFi reads (from recently published papers).

Is it appropriate to integrate assemblies produced by different sequencing technologies into the same PGGB graph? Or would that introduce biases in SV detection due to varying assembly quality and contiguity?
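One way to make the contiguity question concrete before mixing technologies is to compute the same summary metrics for every input assembly. The sketch below is a minimal, illustrative example (all function names are hypothetical): it reads sequence lengths from a `samtools faidx` index (`.fai`, where column 2 is the sequence length) and can also split scaffolds at runs of `N` to obtain contig lengths. For chromosome-level assemblies, scaffold N50 will be close to chromosome length regardless of technology, so contig-level metrics are the more informative comparison.

```python
import re


def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the total."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered * 2 >= total:
            return length
    return 0  # empty input


def contig_lengths(scaffold_seq):
    """Split a scaffold at runs of N (assembly gaps) and return contig lengths."""
    return [len(piece) for piece in re.split(r"[Nn]+", scaffold_seq) if piece]


def lengths_from_fai(path):
    """Read sequence lengths from a samtools .fai index (column 2)."""
    with open(path) as handle:
        return [int(line.split("\t")[1]) for line in handle if line.strip()]


def summarize(lengths):
    """Return (number of sequences, total length, N50) for one assembly."""
    return len(lengths), sum(lengths), n50(lengths)
```

Large differences in contig N50 or number of gaps between the Illumina-based and HiFi-based assemblies would be one concrete signal of the quality imbalance the question is concerned about.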

Any advice, experience, or reference suggestions on constructing a pangenome graph for a large, complex plant genome like wheat with PGGB would be highly appreciated.

Thank you very much for your time and guidance.

Best regards, Sony

pangenome structural-variation bread-wheat PGGB

colindaven has put together a great selection of software for pangenome analysis here --> https://github.com/colindaven/awesome-pangenomes

> I will use around 30 bread wheat cultivars, all assembled at chromosome level using PacBio HiFi reads and Hi-C data.

Does this mean that someone else will be preparing the libraries and doing the assemblies? That sounds like an awful lot of work for a single student, even just the bioinformatics analysis.

Reply by GenoMax, 9 hours ago

Actually, I will do the T2T assemblies for 10 of the 30 wheat samples. The remaining 20 will be obtained from public databases (some of these chromosome-level assemblies were produced from Illumina reads). I know this is a very big job for a three-year PhD, so I will discuss the number of samples with my supervisor, since the wheat genome is so large to analyze.

Reply by Sony Nguyen, 9 hours ago