Question

Is there a 1000GP pangenome available?

0

17 months ago

בת אל • 0

Hi, I'm new to variation graphs and want to explore new alignment technologies for low-coverage ancient sequences (in FASTQ).

Usually, ancient sequences are linearly aligned to hg19 using BWA.

Is there a suitable pangenome graph alternative available, or do I have to create one myself, for example from the 1000G reference panel?

Thanks, B.

vg alignment pangenome low-coverage map • 1.3k views

updated 11 months ago by Jordan M Eizenga ▴ 530 • written 17 months ago by בת אל • 0

score 4 · Answer 1 · 2023-02-16

I'm not sure any standard tool can accept a pangenome graph model yet, or that the model is really defined in any final, formal way. The pangenome appears to still be in the early research and formulation stage, with tools and equipment to follow.

See HPRC for more information on a major, funded project; in particular, scroll to the bottom of the home page. Their ultimate goal is only 350 samples, with the first 200 being trios from the 1KGP cell lines. More will then be added to try to fill in missed, isolated populations that have not been sampled yet.

HPRC has released year 1 data for 47 cell samples (in various stages of completion). See the HPRC Data and Tool repository on GitHub for the tool pipeline and year 1 data. Likely the best detailed overview is their paper (preprint last July; I don't recall that the final version has come out yet).

Once a pangenome model is done (or a more final draft is available), it would be great to see whether ancient DNA groups like the Reich lab could use the model, or even the tools, to expand the graph reference model using artifact DNA. But maybe most aDNA is so degraded that not enough of a complete genome can be ascertained.

The long-read sequencing technology needed for this work is only now being commercially released. But it appears the read length and accuracy needed to make this de novo model generation practical and automatic are not there yet; currently a lot of manual tuning is done in the labs. UPDATE: a new tool, Verkko, to automate the T2T process was just published today.

The T2T consortium (another funded, multi-organization project), which is interrelated with HPRC, has published and released the first full genome assembly (in linear form). But it used the HPGP HG002 Y, added to their pioneering work on the autosomes and X of the haploid CHM13 cell line. X and Y of HG002 were the furthest along in HPRC year 1 data. See T2T Consortium to follow further. From what I read in another post, I think the complete T2T of HG002 is in the final stages of QC. That will be the first single human diploid T2T model. Only 349 more to go after that :)

Can you really create a complete genome model (using de novo assembly) from the short-read sequencing available from the 1KGP? Or maybe you did not mean to imply that with your "create one yourself" comment.

There is a sort of graph-based reference in the way DRAGEN uses a custom reference with many more alt contigs than the standard reference genomes. See DRAGEN demystifying genomes and the DRAGEN Graph Mapper tool. Illumina and AWS have hosted a rerun of all the 1KGenome datasets through DRAGEN. But it is not clear whether they used their Graph Reference model for that, or whether it is unwound before generating a final BAM or VCF so that only the more standard linear reference with far fewer alt contigs (e.g. hs38DH) is used. See the DRAGEN reanalysis of the 1KGenome Data Set on AWS. Maybe this is what you wanted?

score 1 · Answer 2 · 2023-08-20

1

11 months ago

Jordan M Eizenga ▴ 530

You might be interested in the 1000 GP pangenome graphs that were constructed for the analyses in this paper. The data resources are all available here: https://github.com/vgteam/giraffe-sv-paper
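For anyone wanting to try mapping FASTQ reads against such a graph, here is a minimal sketch (not taken from the paper or the repository) that assembles a single-end `vg giraffe` command from Python. It assumes the graph is available as a GBZ file with matching minimizer and distance indexes; the file names below are hypothetical placeholders, and the files in the repository may be in other index formats that need converting first. Check `vg giraffe --help` for your installed version, and note that whether the default settings suit short, damaged aDNA reads is something you would need to evaluate yourself.

```python
import shlex
import shutil
import subprocess


def giraffe_command(gbz, minimizer, dist, fastq, threads=4):
    """Build an argument list for single-end mapping with `vg giraffe`.

    All paths are supplied by the caller; nothing here is specific to
    any particular published graph.
    """
    return [
        "vg", "giraffe",
        "-Z", str(gbz),        # graph in GBZ format (assumed)
        "-m", str(minimizer),  # minimizer index
        "-d", str(dist),       # distance index
        "-f", str(fastq),      # single-end reads in FASTQ
        "-t", str(threads),    # number of threads
    ]


def run_giraffe(cmd, out_gam):
    """Run the command and write the alignments from stdout to out_gam."""
    if shutil.which("vg") is None:
        raise RuntimeError("vg not found on PATH")
    with open(out_gam, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = giraffe_command("graph.gbz", "graph.min", "graph.dist",
                          "ancient_sample.fq.gz")
    print(shlex.join(cmd))
```

Running the script only prints the command line; calling `run_giraffe(cmd, "ancient_sample.gam")` would execute it, provided `vg` is installed and the index files exist.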
