Question: data assembly and analysis for polyploid plants
4 weeks ago by
vinothinisankaran0 wrote:

When a polyploid crop plant is sequenced, how are data assembly and analysis done for the haploid genome of that crop?

assembly genome • 144 views
modified 9 days ago by pltbiotech_tkarthi130 • written 4 weeks ago by vinothinisankaran0
29 days ago by
New York
Vitis2.0k wrote:

This could take an entire book to explain. I think currently the common approach is to identify and sequence the ancestral genomes, then tackle the polyploid genome, or assemble the genomes in parallel. This has generated good assemblies for canola, cotton and wheat.

17 days ago by
CIMMYT, Mexico
pltbiotech_tkarthi130 wrote:

Once a genome or reference sequence is available for a species, you can take the fragment or contig of interest (if the sequences are publicly available) and assemble your own reads against it: .ab1 chromatogram files if the data come from Sanger sequencing, or FASTQ files if they come from next-generation sequencing. Assembly software such as CodonCode or Geneious will do this and give you the assembled output.

Note that for diploid and polyploid species the databases hold a haploid-level sequence, so you need to find the variant forms (alleles) in your own sequence. A polyploid can behave like a diploid species. Wheat, for instance, has 7 homeologous groups of chromosomes across three genomes, A, B and D: 1A-7A, 1B-7B, 1D-7D. That gives 3 (genomes) × 7 (chromosomes) = 21 chromosomes at the haploid level, and 21 × 2 = 42 chromosomes at the diploid level.

Gene copies within the same chromosome can be paralogous; copies on different homeologous chromosomes are homeologous; the corresponding copy in another species is orthologous. Check these servers also:
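To make the wheat arithmetic in this answer concrete, here is a minimal Python sketch that enumerates the chromosomes by genome and counts them. The function and variable names are made up for illustration; only the A, B, D genomes and the base number of 7 come from the answer.

```python
# Illustrative sketch only: names below are hypothetical, not from any library.
SUBGENOMES = ["A", "B", "D"]  # the three genomes named in the answer
BASE_NUMBER = 7               # chromosomes per genome


def haploid_chromosomes(subgenomes=SUBGENOMES, base=BASE_NUMBER):
    """Return chromosome names 1A..7A, 1B..7B, 1D..7D."""
    return [f"{i}{g}" for g in subgenomes for i in range(1, base + 1)]


def homeologous_group(group_number, subgenomes=SUBGENOMES):
    """Return the members of one homeologous group, e.g. 1 -> 1A, 1B, 1D."""
    return [f"{group_number}{g}" for g in subgenomes]


haploid = haploid_chromosomes()
print(len(haploid))          # chromosomes at the haploid level
print(2 * len(haploid))      # chromosomes at the diploid level
print(homeologous_group(1))  # the three homeologs of group 1
```

Genes on 1A, 1B and 1D that descend from the same ancestral gene would be the homeologous copies described above.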


*I hope it is appropriate for me to comment...

I am working with a non-model fungus and also have this question. This is the first time I have dealt with whole-genome assembly and annotation of a diploid organism. I compared assemblies from SPAdes and dipSPAdes (the diploid version of the assembler) and found that the diploid assemblies were better. I went forward with the diploid assemblies for gene prediction, but now I am wondering whether it is best to de-duplicate the genome before gene prediction, or whether there is a way to de-duplicate afterwards. Any ideas?

written 10 days ago by msobol10
9 days ago by
CIMMYT, Mexico
pltbiotech_tkarthi130 wrote:

Working with a diploid is actually slightly easier than with a polyploid, although polyploids can eventually behave as diploids, as in the example I gave above. In the lab, people generally produce doubled haploids from a haploid set of chromosomes. You can read about it, for instance:

But de-duplication of a genome is not that easy. You may choose gene-knockout studies instead. If you know the two haploid sets exactly (haploid set 1 + haploid set 2), you can eliminate one set computationally. But you might not know which haploid sets were sequenced. You would know if you sequenced the genome from haploid tissue, such as anthers in plants, to get a single set. For a fungus you would have to use spores or hyphae after meiosis, before they enter the diploid phase of the life cycle.

Unless you know whether a particular gene is heterozygous or homozygous, it is not possible to de-duplicate safely, since you may lose the information or the allele of interest. One thing you can try: if your sequence (.ab1 file) shows double peaks in your gene of interest, there may be an alternative form of the gene, i.e. an allele, which indicates a heterozygous SNP position. Let's assume:

atg ggg tgg cgg gtt ttg tga   allele 1 (from haploid set 1)   M G W R V L -
atg ggg tgg cgg gtt gtg tga   allele 2 (from haploid set 2)   M G W R V V -
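A small Python sketch of how the two alleles above could be compared. The codon table is deliberately partial (only the codons that occur in this example), and the helper names are hypothetical; a real analysis would use a full genetic-code table.

```python
# Partial codon table: ONLY the codons in the example alleles ("-" = stop).
CODON_TABLE = {
    "atg": "M", "ggg": "G", "tgg": "W", "cgg": "R",
    "gtt": "V", "ttg": "L", "gtg": "V", "tga": "-",
}


def clean(seq):
    """Drop the spaces between codons and lowercase the sequence."""
    return seq.replace(" ", "").lower()


def translate(seq):
    """Translate codon by codon using the partial table above."""
    s = clean(seq)
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, len(s), 3))


def snp_positions(a, b):
    """Return (1-based position, base in a, base in b) for each mismatch."""
    pairs = zip(clean(a), clean(b))
    return [(i + 1, x, y) for i, (x, y) in enumerate(pairs) if x != y]


allele1 = "atg ggg tgg cgg gtt ttg tga"
allele2 = "atg ggg tgg cgg gtt gtg tga"

print(translate(allele1))               # protein from haploid set 1
print(translate(allele2))               # protein from haploid set 2
print(snp_positions(allele1, allele2))  # where the alleles differ
```

The single mismatch falls in the sixth codon (ttg vs gtg), which is the L to V change shown in the translations above. In a heterozygous sample, a double peak in the .ab1 trace would be expected at that position.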

This is useful:

So you have to know which haploid set or sets you have, and which allele is dominant or favorable. You can then directly name an allele of interest for your study instead of going for computational de-duplication.
