Question: Genome Assembly From Pooled Individual DNA
4.5 years ago by
David M540
I'm currently working with a eukaryotic genome (~1 Gbp) from a small organism from which it is very difficult to extract high-quality DNA. We've performed some preliminary sequencing on DNA pooled from multiple individuals.

Does anyone have experience working with this sort of read data? Are there any methods or assemblers better suited to data with this very high level of polymorphism? Are there any tools that can pre-process the reads to reduce the polymorphism to something closer to that of a diploid genome?

I'd appreciate any advice or experience from the community.


4.5 years ago by
Salk Institute, La Jolla
Not from personal experience, but I understand that CRISP is a good method for pooled data:

It seems that CRISP requires a reference genome, so I don't think it will work in this case.

4.5 years ago by
Vancouver, BC
cortex_var may be helpful since it was designed for the assembly of multiple individuals/samples. From the website,

cortex_var is a tool for genome assembly and variation analysis from sequence data. You can use it to discover and genotype variants on single or multiple haploid or diploid samples. If you have multiple samples, you can use Cortex to look specifically for variants that distinguish one set of samples (eg phenotype=X, cases, parents, tumour) from another set of samples (eg phenotype=Y, controls, child, normal). See our Nature Genetics paper and the documentation for detailed descriptions.

It has been used in quite a few publications, so those should give you an idea of the applications. I haven't used this tool personally, but it does appear to have been used on a wide variety of taxa from microbes to humans. This is not exactly what you are asking, but this previous thread (Recommendations For Heterozygous Genome Assembly Software) on assembling polymorphic genomes is relevant to your question.
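
To make the "variants that distinguish one set of samples from another" idea a bit more concrete, here is a toy sketch of the general approach of labelling k-mers by the sample they came from. This is not Cortex's code or its API; the function names (`colored_kmers`, `private_kmers`) and the example reads are made up for illustration, and the sketch ignores reverse complements and sequencing errors, both of which a real tool has to handle.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def colored_kmers(samples, k):
    """Map each k-mer to the set of sample names ("colours") that contain it."""
    colours = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for kmer in kmers(read, k):
                colours[kmer].add(name)
    return colours


def private_kmers(colours, group_a, group_b):
    """Return k-mers seen in at least one group_a sample and in no group_b sample."""
    a, b = set(group_a), set(group_b)
    return {kmer for kmer, seen in colours.items() if seen & a and not seen & b}


# Hypothetical toy reads: the two samples differ by a single base (T vs A).
samples = {
    "case": ["ACGTTGCA"],
    "control": ["ACGTAGCA"],
}
colours = colored_kmers(samples, k=4)
print(sorted(private_kmers(colours, ["case"], ["control"])))
```

With pooled DNA you would not have one label per individual, so at best each pool would act as a single "sample" in a scheme like this.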

