Question: Detecting Polyploidy Using An Assembly
5
7.8 years ago by
Fabian Bull1.3k
German
Fabian Bull1.3k wrote:

A biologist once asked me whether it is possible to detect polyploidy from an assembly. My first thought was no, because duplicated genomic regions are merged in an assembly.

Is this reasoning correct, or is it possible after all? If so, are there any tools?

assembly • 4.7k views
ADD COMMENTlink written 7.8 years ago by Fabian Bull1.3k
6
7.8 years ago by
Christian2.8k
Cambridge, US
Christian2.8k wrote:

One way to test for polyploidy is probably to realign the raw reads against the assembly and count the read frequencies of variants.

In a diploid genome, we expect to find predominantly variants supported by 50% of the reads (two alleles, heterozygosity). In a triploid genome, we expect to find variants supported by about 33% and 67% of reads (three alleles). Tetraploid: 25%, 50%, and 75%. And so on.

I expect many caveats with this approach though. First, it assumes that all polyploid chromosomes are truly collapsed into a single chromosome in your assembly and not assembled as separate contigs. Second, it requires very high read coverage across the genome to call low-frequency variants. Third, read coverage will fluctuate, making reliable estimates of read frequencies difficult. Fourth, variant mis-calls and duplicated regions will complicate the picture.

However, maybe it is possible to pool the genome-wide evidence of many thousands of variants to come up with the most likely ploidy of the genome behind your assembly.
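
A minimal sketch of the pooling idea described above, not an established tool. It assumes each variant is summarised as a hypothetical `(alt_reads, depth)` pair, that reads at a heterozygous site are binomially distributed around one of the allele fractions k/ploidy, and that every fraction is equally likely a priori. None of these assumptions come from the thread, and the caveats listed above (collapsed assembly, coverage fluctuation, mis-calls) still apply.

```python
import math

def expected_fractions(ploidy):
    """Allele fractions expected at heterozygous sites: k/ploidy for k = 1..ploidy-1.

    Requires ploidy >= 2; a haploid genome has no heterozygous sites.
    """
    return [k / ploidy for k in range(1, ploidy)]

def log_binom_pmf(k, n, p):
    """Log probability of seeing k alt reads out of n when the true fraction is p."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1 - p)

def ploidy_log_likelihood(sites, ploidy):
    """Pool all sites: sum of per-site log-likelihoods under an equal-weight mixture."""
    fractions = expected_fractions(ploidy)
    total = 0.0
    for alt, depth in sites:
        logs = [log_binom_pmf(alt, depth, f) for f in fractions]
        peak = max(logs)
        # log-sum-exp for numerical stability; dividing by len(logs) applies equal weights
        total += peak + math.log(sum(math.exp(x - peak) for x in logs) / len(logs))
    return total

def most_likely_ploidy(sites, candidates=(2, 3, 4)):
    """Return the candidate ploidy with the highest pooled log-likelihood."""
    return max(candidates, key=lambda p: ploidy_log_likelihood(sites, p))

# Hypothetical counts: alt-read support clusters near 1/3 and 2/3 of a depth of 60.
triploid_like = [(20, 60), (40, 60), (21, 60), (39, 60)] * 5
print(most_likely_ploidy(triploid_like))
```

The equal-weight mixture penalises higher ploidies for having more possible fractions, which is why data centred on 50% is scored as diploid rather than tetraploid even though 2/4 is also a tetraploid fraction.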

ADD COMMENTlink written 7.8 years ago by Christian2.8k

This is a clever idea and could probably be applied to some recently derived auto-polyploids.

ADD REPLYlink written 7.8 years ago by Casey Bergman18k

That's a brilliant idea! I first saw it applied in Yoshida et al. 2013 (see Figure 9). We used it in our analysis of Candida orthopsilosis hybrids (Figure S5). What's cool is that we were able to detect copy number variations of individual chromosomes or even chromosomal arms (C. metapsilosis, submitted)!

ADD REPLYlink written 4.1 years ago by Leszek4.0k

Have any new methods come up?

ADD REPLYlink written 9 months ago by Ric240
4
7.8 years ago by
Philippe1.9k
Barcelona, Spain.
Philippe1.9k wrote:

Hi,

if you still have the raw reads (before mapping) or the mapped reads, you can try to identify duplicated genomic regions by detecting regions with significantly higher coverage. If your global coverage is n and some region has a coverage of (theoretically) 2n, this might indicate that the region is duplicated. Unfortunately I don't have an obvious reference to share (these are just memories from a presentation), but you can check what is done to detect CNVs, for example (even though other methods are more widely used).
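
For illustration only, the n versus 2n comparison above could be sketched as below. The function name, the per-window depth list, and the `fold` and `tolerance` parameters are all hypothetical, and the median is used as a stand-in for the "global coverage" n. The sketch makes no correction for GC content or other coverage biases, and a ploidy change affecting the whole genome would shift every window equally and so go unflagged.

```python
from statistics import median

def flag_high_coverage_windows(window_depths, fold=2.0, tolerance=0.25):
    """Return indices of windows whose depth is close to `fold` times the median depth."""
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    flagged = []
    for i, depth in enumerate(window_depths):
        ratio = depth / baseline
        # Keep windows whose ratio to the baseline is within `tolerance` of `fold`.
        if abs(ratio - fold) <= tolerance:
            flagged.append(i)
    return flagged

# Hypothetical mean depths for eight consecutive windows.
depths = [30, 31, 29, 62, 30, 28, 59, 30]
print(flag_high_coverage_windows(depths))
```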

I hope it has been helpful.

Addition: from the comments below it seems the method is non-trivial and might not be the most suitable one.

ADD COMMENTlink modified 7.8 years ago • written 7.8 years ago by Philippe1.9k
2

In practice, this approach is a lot harder to execute, since polyploidy is a global phenomenon. It is futile to look for a local doubling of coverage, which would be expected only with a segmental duplication or CNV.

ADD REPLYlink written 7.8 years ago by Haibao Tang3.0k
1

As a minor note, coverage can vary a lot across the genome, mostly due to GC content, so a 2-fold difference by itself might not mean much. You'd need a control sample to compare to.

ADD REPLYlink written 7.8 years ago by Swbarnes21.5k
1

Based on my own experience, I agree with swbarnes2. Coverage varies a lot due to many other factors, even if you have high coverage across the entire genome. In practice, quality control and calibration are very hard to do for the 2-fold coverage approach.

ADD REPLYlink written 7.8 years ago by Vitis2.1k

Definitely one approach. Thanks.

ADD REPLYlink written 7.8 years ago by Fabian Bull1.3k

I agree with you (and with Casey Bergman's post, which raised the same concern). If we consider polyploidy as the presence of supernumerary chromosomes (the actual definition), this approach won't work. But the question mentions "duplicate genomic regions", which motivated me to explain how to identify such regions. I could have been more precise.

ADD REPLYlink written 7.8 years ago by Philippe1.9k

Thanks for your different inputs. As I mentioned, I have only heard about such methods and have no experience with them. Reading the comments from more experienced people, it seems this is not trivial and some other methods or additional experimental work might be more suitable. I'll update my first post accordingly.

ADD REPLYlink written 7.8 years ago by Philippe1.9k
3
7.8 years ago by
Casey Bergman18k
Athens, GA, USA
Casey Bergman18k wrote:

Interesting question. In theory, it is not possible to detect a recent, complete auto-polyploid genome from a WGS assembly, since the copy number of all chromosomes would scale perfectly with ploidy. That is, if all regions of the genome in a polyploid are the same (i.e. no sequence variation among homologous chromosomes), you can't tell if the genome is 1C, 2C, 4C, etc.

However, for an allo-polyploid genome, or for partial (auto- or allo-) polyploidy that is not complete across the genome, it should be possible to detect the polyploidy from the assembly of divergent haplotypes or from regional differences in read depth, as noted by Philippe.

ADD COMMENTlink written 7.8 years ago by Casey Bergman18k
2
7.8 years ago by
Eitan Rubin30
Eitan Rubin30 wrote:

In plants, there is work suggesting that polyploidization is accompanied by a rapid accumulation of mutations. So it should be possible to find multiple alleles, i.e. heterozygosity for SNPs and indels.

Look up Avi Levy's work from the Weizmann Institute of Science (he worked on wheat).

Eitan Rubin

ADD COMMENTlink written 7.8 years ago by Eitan Rubin30

Hi Eitan, I was looking through Avraham Levy's papers, looking for "rapid accumulation of mutations" you mentioned, but without luck. Can you point me to a specific paper that elaborates on that concept? Thanks.

ADD REPLYlink written 7.8 years ago by Haibao Tang3.0k
1
7.8 years ago by
lexnederbragt1.2k
Oslo, Norway
lexnederbragt1.2k wrote:

I don't agree with Casey Bergman that "it is not possible to detect a recent, complete auto-polyploid genome from a WGS assembly". Say your (duplicated) genome is 1 Gb in size and you sequence it to 100x coverage, so 100 Gb of reads. After assembly, for a non-duplicated genome, you would expect the assembly size to be approximately 1 Gb, with an average coverage of the non-repetitive parts of approx. 100x. If the genome were instead a complete auto-polyploid (all chromosomes duplicated), the duplicate chromosomes would collapse during assembly, so you would see something like a 0.5 Gb total assembly size with an average coverage of the non-repetitive parts of 200x. This is of course an ideal situation, but you get the idea.
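
The arithmetic in this idealised example can be written out as a small sketch. The function and parameter names are hypothetical, and it assumes the total genome size is known in advance, that identical copies collapse perfectly, and that repeats are ignored.

```python
def expected_assembly(genome_size_bp, sequenced_bp, copies_collapsed):
    """Return (assembly size in bp, mean coverage) if identical copies collapse.

    genome_size_bp   -- total genome size, counting every duplicated copy
    sequenced_bp     -- total bases sequenced
    copies_collapsed -- number of identical copies merged into one by the assembler
    """
    assembly_size = genome_size_bp / copies_collapsed
    coverage = sequenced_bp / assembly_size
    return assembly_size, coverage

GENOME = 1_000_000_000    # 1 Gb, as in the example above
READS = 100_000_000_000   # 100 Gb of sequence

print(expected_assembly(GENOME, READS, 1))  # non-duplicated genome
print(expected_assembly(GENOME, READS, 2))  # all chromosomes duplicated and collapsed
```

With these numbers the two cases give a 1 Gb assembly at 100x and a 0.5 Gb assembly at 200x, matching the figures in the post.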

ADD COMMENTlink written 7.8 years ago by lexnederbragt1.2k

This requires knowledge about the genome size, which was not stated in the question. Clearly with additional information (e.g. a reference genome, knowledge of genome size, cytological information), ploidy can be estimated. Though I would not find overall fold-changes in depth of coverage convincing evidence since the expected throughput of a WGS experiment is not the observed throughput and you could make false inferences with this approach.

ADD REPLYlink written 7.8 years ago by Casey Bergman18k

You're right, I had forgotten that one needs an estimated genome size for this. I stand corrected...

ADD REPLYlink written 7.8 years ago by lexnederbragt1.2k