Question: Anyone know of Tetraploid NGS available?
5.9 years ago, Adrian Pelin wrote:


I am trying to interpret my k-mer graph, which indicates that my organism might be tetraploid. However, I cannot find any good tetraploid NGS data to compare against. The plant datasets I have found are usually either pools of multiple individuals or sequenced at very low coverage.

Ideally, I am looking for a single individual (not pool-seq) sequenced at 70x or higher. The organism would also need to be an autotetraploid, not an allotetraploid, as the former is what I am working with.

Let me know if you have seen such a dataset. Thank you.


ngs tetraploid k-mer
modified 5.9 years ago • written 5.9 years ago by Adrian Pelin

How would that differ from a pooled diploid? A pool should suffice for investigative purposes.

written 5.9 years ago by karl.stamm

I did that, and it seems like a reasonable approach, except that it requires some unorthodox in silico manipulations.

  • First, you need to find sequence data from 2 different isolates of the same species.
  • You need to make sure they are neither too divergent nor too similar. If they are too divergent and share no homozygous regions, you will not get a peak for such regions. If they are too similar, there will not be enough variation to show tetraploidy, and the pool may look like a diploid.
  • You need to make sure that both runs are of roughly the same quality, and you need to subsample them so that they also have the same coverage/sequencing depth.
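The subsampling and pooling step can be sketched roughly as follows. This is only an illustration, not what I actually ran: the function names are made up, reads are treated as plain strings rather than paired FASTQ records, and coverage is assumed to be quoted against a known genome size. On real FASTQ files a tool such as seqtk can do the subsampling.

```python
import random


def subsample_fraction(total_bases, genome_size, target_coverage):
    """Fraction of reads to keep so the run drops to target_coverage."""
    current_coverage = total_bases / genome_size
    if current_coverage < target_coverage:
        raise ValueError("run has less coverage than the target")
    return target_coverage / current_coverage


def subsample_reads(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


def pool_isolates(reads_a, reads_b, genome_size, per_isolate_coverage, seed=0):
    """Downsample two diploid read sets to equal coverage, then merge them."""
    pooled = []
    for offset, reads in enumerate((reads_a, reads_b)):
        total_bases = sum(len(read) for read in reads)
        fraction = subsample_fraction(total_bases, genome_size, per_isolate_coverage)
        # Use a different seed per isolate so the two draws are independent.
        pooled.extend(subsample_reads(reads, fraction, seed=seed + offset))
    return pooled
```

Each isolate contributes the same depth to the pool, so two diploids at 35x each would mimic a "tetraploid" at about 70x.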

I did this with 2 Candida isolates and got the pattern I expected for tetraploidy, though I am not sure how publishable that is. Here is how it looks:

modified 5.9 years ago • written 5.9 years ago by Adrian Pelin
5.9 years ago, SES (Vancouver, BC) wrote:

My answer is for plants specifically because I think that's what you are looking for.

Your best options are probably Gossypium hirsutum (upland cotton) or alfalfa, both of which are tetraploid species with genome sequencing projects under way. It will not be easy to get access to 70x coverage WGS data, though, because the genomes are not published. You will likely have to request access to the data and sign a waiver saying you won't publish anything.

One thing that comes to mind is: why do you need 70x coverage? If you are willing to use less coverage, I'm sure you can find what you need in GenBank. Also, have you thought about comparing to the bread wheat genome, which is hexaploid? That data is available, but you may have to request access. The other thing I would consider is whether you want to compare against an autotetraploid species like potato or an allotetraploid like upland cotton. I would expect those two species to display different patterns of k-mer frequencies.

EDIT: Since you are interested in autotetraploids specifically, I would suggest potato (Solanum tuberosum), because it is an autotetraploid. The genome has also been published, and the raw data have been deposited in the Sequence Read Archive (under project #SRA029323). There looks to be plenty of data in the SRA to get high coverage (the potato genome is only 844 Mbp).
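As a rough check on how much data "high coverage" means here, the arithmetic can be sketched as below. The 844 Mbp genome size and the 70x target come from this thread; the 100 bp read length is purely an assumption for illustration, and the function names are hypothetical.

```python
def bases_for_coverage(genome_size_bp, coverage):
    """Total sequenced bases needed to reach the given coverage."""
    return genome_size_bp * coverage


def reads_for_coverage(genome_size_bp, coverage, read_len):
    """Number of reads of length read_len needed (rounded up)."""
    total_bases = bases_for_coverage(genome_size_bp, coverage)
    return (total_bases + read_len - 1) // read_len


potato_genome_bp = 844_000_000   # genome size quoted above
target_coverage = 70             # coverage requested in the question
assumed_read_len = 100           # assumption, not taken from the SRA project

print(bases_for_coverage(potato_genome_bp, target_coverage))
print(reads_for_coverage(potato_genome_bp, target_coverage, assumed_read_len))
```

So 70x of an 844 Mbp genome is about 59 Gbp of raw sequence, which is the figure to compare against whatever the SRA runs actually contain.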

modified 5.9 years ago • written 5.9 years ago by SES

Excellent input. I have updated my question to indicate my interest in autotetraploids, because indeed that is what I am working on.

Next, I want 70x coverage or more because my isolates vary in coverage from 17x to 120x, and I find that at 50x you can barely distinguish the peaks. In other words, there is not enough sampling depth to produce a histogram with clearly defined peaks.
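To put rough numbers on this, here is a back-of-the-envelope sketch of where the peaks should sit. It assumes an idealized Poisson model with no sequencing errors, coverage quoted against the haploid genome size, 100 bp reads and k = 21; none of those values come from my data, and real histograms are more spread out than this.

```python
import math


def expected_peaks(total_coverage, ploidy=4, read_len=100, k=21):
    """Return (copy number, peak position, Poisson sd) for each k-mer peak.

    A k-mer present on c of the `ploidy` homologous copies is expected at
    c times the per-copy k-mer coverage.
    """
    per_copy_base_cov = total_coverage / ploidy
    # A read of length L contains L - k + 1 k-mers, so k-mer coverage is
    # lower than base coverage by the factor (L - k + 1) / L.
    per_copy_kmer_cov = per_copy_base_cov * (read_len - k + 1) / read_len
    return [
        (copies, copies * per_copy_kmer_cov, math.sqrt(copies * per_copy_kmer_cov))
        for copies in range(1, ploidy + 1)
    ]


for coverage in (17, 50, 70, 120):
    summary = [(c, round(mean, 1), round(sd, 1)) for c, mean, sd in expected_peaks(coverage)]
    print(coverage, summary)
```

Under these assumptions, 50x puts the four peaks near 10, 20, 30 and 40, while the Poisson spread of the upper peaks is already around 5 to 6, so neighbouring peaks overlap heavily even before errors and repeats are added.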

I will of course try and settle for less if it can prove my point. If more species come to mind, please do not hesitate to suggest them, I will try them all.

Comparing a hexaploid would be interesting, I will give it a shot if I can find the data or gain access to it :) Thank you again. By the way, plants are of course the best candidates for this sort of stuff, but other organisms should work as well.

written 5.9 years ago by Adrian Pelin

Thanks for the response. I updated my answer to include information about accessing GSS data for potato, and I hope that helps.

written 5.9 years ago by SES

I remember running into this paper at one point, but I did not understand the experimental design very well and decided not to touch it. In the abstract, they state "Here we use a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome." What is a doubled-monoploid potato clone? Since this is what they sequenced, is it still an autotetraploid? They also state "We also sequenced a heterozygous diploid clone", but their methods do not indicate which sequencing runs belong to the diploid and which to the doubled monoploid. This is a bit confusing, as I am not sure what I should use and what I should avoid.

written 5.9 years ago by Adrian Pelin

Apparently the potato genome is highly heterozygous, which made assembly of a wild-type individual impossible. So they sequenced two lines. One was the DM line, derived from tissue culture, which carries one copy of the chromosome complement, doubled. They also sequenced the RH line to sample the diversity of the germplasm; according to the paper, this line more closely resembles the cultivated potato. Neither is actually tetraploid, but this appears to have been the best strategy to assemble the genome.

I would bet that similar strategies have been used in other polyploid genome projects because of the challenges, and it is probably unnecessary to sequence each duplicated chromosome in order to infer historical patterns. I will check into this further, though, and let you know what I find out.

written 5.9 years ago by SES

Thanks for the help! Let me know if you find anything. The strategy is indeed a sound way to generate usable genome assemblies.

written 5.9 years ago by Adrian Pelin

I looked into it more, and based on Figure 3 and the abstract it is now clear what they did. They sequenced a doubled monoploid (DM), which is basically a diploid with little heterozygosity, and a very heterozygous diploid (RH). So their data are not tetraploid, and Figure 3b shows that they already did a k-mer analysis.
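For anyone who wants to try the same kind of k-mer analysis on a small test set, a minimal sketch of how such a histogram is built is below. This is not the paper's pipeline; it is a toy in-memory version with made-up function names, and real datasets need a dedicated k-mer counter such as Jellyfish.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_counts(reads, k):
    """Count canonical k-mers across all reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if "N" not in kmer:
                counts[canonical(kmer)] += 1
    return counts


def kmer_histogram(counts):
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())
```

Plotting the histogram (multiplicity on the x axis, number of distinct k-mers on the y axis) gives the k-mer graph discussed in this thread: a homozygous line like DM should show essentially one main peak, and a heterozygous diploid like RH should show two.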

written 5.9 years ago by Adrian Pelin