Question

Forum: Pan-genome graphs make the popular science press

7

8.9 years ago

Mary 11k

There's much talk in my Twittersphere about this piece in MIT Technology Review: Rebooting the Human Genome. It describes how the current reference genome concept misses much of the human variation that we need to capture as we sequence more and more people's personal genomes.

But there's also some confusion. I don't think the concept of the graphs was described very well in there. In an earlier thread here we talked about it a little, but I wasn't able to find the talk I'd heard on this topic, which had been helpful to me. I did find a similar one, though, and maybe it will help people get the idea of the graphs, as opposed to the linear view we currently have of the reference genome.

You can watch the whole thing, of course, but the part about the graph ideas starts around 52 minutes into the talk.

https://youtu.be/hO4CInowk-g

So the idea is that we have to be able to account for the "bubbles" that don't match a linear reference string. Some bubbles will be alterations, some insertions, some deletions, some inversions, and graph representations that go beyond our current tools can capture all of them. Every one of these variants is valid, and we need to know and see this variation better.
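To make the "bubble" idea a bit more concrete, here is a minimal sketch of a toy sequence graph in Python. Everything in it is an assumption for illustration only: the node IDs, the sequences, and the `spell_paths` helper are made up, and this is not how any particular pan-genome tool stores its graphs. It covers only a substitution bubble and an insertion/deletion bubble; inversions would need a graph that also tracks strand, which this sketch leaves out.

```python
# Toy sequence graph: node id -> DNA segment.
# All IDs and sequences are hypothetical, chosen only for illustration.
nodes = {
    1: "ACGT",  # shared flank
    2: "A",     # one allele of a substitution bubble
    3: "G",     # the other allele of the same bubble
    4: "TTC",   # shared segment
    5: "GGA",   # segment present only in some genomes (insertion/deletion bubble)
    6: "CAT",   # shared flank
}

# Directed edges: node id -> list of successor node ids.
edges = {
    1: [2, 3],  # the bubble opens: two alternative alleles
    2: [4],
    3: [4],     # the bubble closes at node 4
    4: [5, 6],  # edge 4 -> 6 skips node 5
    5: [6],
    6: [],
}


def spell_paths(node, end):
    """Yield the sequence spelled by every path from node to end.

    Assumes the graph is acyclic, which holds for this toy example.
    """
    if node == end:
        yield nodes[node]
        return
    for nxt in edges[node]:
        for suffix in spell_paths(nxt, end):
            yield nodes[node] + suffix


# Each path through the graph is one possible haplotype.
haplotypes = sorted(spell_paths(1, 6))
for seq in haplotypes:
    print(seq)
```

A linear reference corresponds to picking just one of these paths (say 1, 2, 4, 6) and treating every other path as a deviation from it, whereas the graph keeps all of them on an equal footing.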

Anyway, I'm posting because I think this is important to be aware of, and I think that even researchers in the field aren't that familiar with the ideas yet.

This paper also helped me understand the concepts, but unfortunately it is not open access: Building a pan-genome reference for a population. doi: 10.1089/cmb.2014.0146 http://www.ncbi.nlm.nih.gov/pubmed/25565268

If anyone else has good introductions to these variant graph representations, I'd like to see them.

Edit to add: this paper has some of Haussler's graphs too: http://arxiv.org/abs/1404.5010

reference-genome pan-genome • 3.7k views

Updated 15 months ago by Ram 43k • written 8.9 years ago by Mary 11k

0

The author of the MIT piece provided this slide set, which is helpful too: https://docs.google.com/presentation/d/1utWF1_Er6bfAAwYWWRvDL-XI73uFC17WNi45t-TXveM/edit#slide=id.p

ADD REPLY • link 8.9 years ago by Mary 11k

Ram · Answer 1 · 2015-06-03

4

8.9 years ago

h.mon 35k

Relevant for the discussion:

The SPAdes assembler outputs FASTA and FASTG files, but it seems the FASTG proposal was not discussed in Nguyen et al. (2015).

Updated 15 months ago by Ram 43k • written 8.9 years ago by h.mon 35k

Ram · Answer 2 · 2015-06-03

4

8.9 years ago

Istvan Albert 100k

Related writing by Heng Li, "On the graphical representation of sequences": http://lh3.github.io/2014/07/25/on-the-graphical-representation-of-sequences/

Updated 15 months ago by Ram 43k • written 8.9 years ago by Istvan Albert 100k

0

Very good read indeed.

ADD REPLY • link 8.9 years ago by h.mon 35k