Constructing a bread wheat pangenome graph using PGGB: per-chromosome vs. whole-genome approach and integration of assemblies from different sequencing technologies?
3 days ago
Sony Nguyen ▴ 20

Hello everyone,

I am new to pangenome graph analysis and am planning to construct a bread wheat (Triticum aestivum) pangenome graph using the PGGB pipeline for my PhD research.

I will use around 30 bread wheat cultivars, all assembled at chromosome level using PacBio HiFi reads and Hi-C data. From this pangenome, I aim to investigate structural variations (SVs), presence/absence variations (PAVs), and identify genes associated with agronomic traits.

Since bread wheat has a large (~16 Gb) and highly repetitive (~80%) genome, I expect significant computational and runtime challenges. I have read that in some studies (for example, in Arabidopsis pangenome graphs), assemblies were preprocessed by splitting into chromosomes and removing unplaced contigs before running PGGB.

My questions are:

For bread wheat, would you recommend constructing the pangenome graph per chromosome or using the whole genome?

I am concerned that some known translocations in wheat might not be captured if the graph is built separately for each chromosome.

Some of my assemblies were generated from Illumina short reads, while most are from PacBio HiFi reads (from recently published papers).

Is it appropriate to integrate assemblies produced by different sequencing technologies into the same PGGB graph? Or would that introduce biases in SV detection due to varying assembly quality and contiguity?

Any advice, experience, or reference suggestions on constructing a large, complex plant pangenome graph like wheat using PGGB would be highly appreciated.

Thank you very much for your time and guidance.

Best regards, Sony

pangenome structural-variation bread-wheat PGGB • 765 views

colindaven has put together a great selection of software for pangenome analysis here --> https://github.com/colindaven/awesome-pangenomes

I will use around 30 bread wheat cultivars, all assembled at chromosome level using PacBio HiFi reads and Hi-C data.

Does it mean that someone else will be doing the libraries/assemblies? Sounds like an awful lot of work for a single student, even for the bioinformatics analysis.


Actually, I will produce the T2T assemblies for 10 of the 30 wheat samples myself. The remaining 20 samples will be obtained from public databases (some of these chromosome-level assemblies were produced from Illumina reads). I also know that this is a very big job for a 3-year PhD; I will discuss the number of samples with my supervisor, as the wheat genome is large to analyze.

13 hours ago

I am concerned that some known translocations in wheat might not be captured if the graph is built separately for each chromosome.

You are right; these will be missed (alongside any other unknown inter-chromosomal translocations).

Some of my assemblies were generated from Illumina short reads, while most are from PacBio HiFi reads (from recently published papers).

You will probably only be able to use your long-read assemblies to build the graph, particularly considering the repeat content. You may want to bring in the short-read strains afterwards, e.g. for genotyping variants against the graph.

Is it appropriate to integrate assemblies produced by different sequencing technologies into the same PGGB graph? Or would that introduce biases in SV detection due to varying assembly quality and contiguity?

Integrating different sequencing technologies is definitely doable, but in my mind it would be only long-read technologies like ONT and PacBio. In terms of biases, I don't think there will be any clear answer, as this will depend on a lot of factors, such as the sequencing chemistries, read length, additional data, etc., as opposed to just which technology was used.
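Since contiguity is one of the factors raised in the question, one rough way to compare candidate assemblies before deciding which to put in the graph is to break each scaffold at runs of N and look at the resulting contig N50. Below is a minimal Python sketch of that idea; the function names and the `min_gap=10` threshold are my own assumptions for illustration, not something PGGB prescribes.

```python
import re


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def contig_lengths(sequence, min_gap=10):
    """Split a scaffold at runs of >= min_gap N bases; return contig lengths.

    min_gap is an assumed threshold, adjust it to how gaps were encoded.
    """
    pieces = re.split(r"[Nn]{%d,}" % min_gap, sequence)
    return [len(piece) for piece in pieces if piece]


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def summarise(lines, min_gap=10):
    """Return contig count and contig N50 for one assembly."""
    lengths = []
    for _name, sequence in read_fasta(lines):
        lengths.extend(contig_lengths(sequence, min_gap))
    return {"contigs": len(lengths), "contig_n50": n50(lengths)}
```

For a real assembly you would pass an open file handle, e.g. `summarise(open(path))`, or `gzip.open(path, "rt")` for compressed FASTA. A short-read assembly scaffolded to chromosome level will typically show a far lower contig N50 than a HiFi one, which is a hint that its gaps and collapsed repeats could show up as spurious variation in the graph.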

It looks like pggb now offers a pipeline, partition-before-pggb, for assigning contigs to clusters/communities prior to graph building. Perhaps this can help in determining the communities that would be informative for you.
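If your assemblies are already scaffolded to chromosome level with consistent names, a much cruder alternative is to group sequences purely by chromosome name and drop anything unplaced. The sketch below does only that; it is not what partition-before-pggb does internally. The name pattern (`chr1A`, `1A`, ...), the haplotype index `1`, and the function names are assumptions for illustration. Sequences are renamed following the PanSN convention (`sample#haplotype#contig`) that pggb recommends.

```python
import re
from collections import defaultdict

# Assumed naming: optional "chr" prefix, then 1-7 and subgenome A, B or D.
CHROM_PATTERN = re.compile(r"^(?:chr)?([1-7][ABD])$", re.IGNORECASE)


def chromosome_id(name):
    """Return a normalised chromosome ID such as '1A', or None if unplaced."""
    match = CHROM_PATTERN.match(name)
    return match.group(1).upper() if match else None


def partition_by_chromosome(assemblies):
    """Group sequences from several assemblies by chromosome name.

    assemblies: dict mapping sample name -> list of (sequence name, sequence).
    Returns: dict mapping chromosome ID -> list of (PanSN name, sequence).
    Sequences whose names do not match CHROM_PATTERN are dropped.
    """
    groups = defaultdict(list)
    for sample, records in assemblies.items():
        for name, sequence in records:
            chrom = chromosome_id(name)
            if chrom is None:
                continue  # unplaced contig or unrecognised name
            # Haplotype fixed to 1: an assumption for collapsed assemblies.
            groups[chrom].append((f"{sample}#1#chr{chrom}", sequence))
    return dict(groups)


# Toy input with hypothetical cultivar names.
toy = {
    "cultivarA": [("chr1A", "ACGT"), ("chrUn", "NNNN")],
    "cultivarB": [("Chr1A", "ACGA"), ("1b", "TTTT")],
}
groups = partition_by_chromosome(toy)
```

Each group could then be written to its own FASTA and built as a separate graph. Keep in mind that name-based grouping will, by construction, never place a translocated segment with its true counterpart, which is exactly the limitation discussed above; a mapping-based partition is the safer route if translocations matter to you.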
