Question: Simulating Core / Accessory genome for incomplete genome assemblies
Asked 3.4 years ago by Anand Rao (United States)

I am looking to infer the core and accessory genomes for ~300 fungal strains of the same species. The draft assemblies vary in quality. One goal is to predict how many core and accessory genes may be missing from each draft genome, based on how many conserved genes are missing (using results from a tool such as BUSCO or CEGMA). I would therefore like to produce both an empirical result based on the as-is data and an expanded dataset that simulates how the results would have looked had all genomes been complete.
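As a rough illustration of the kind of extrapolation described above, here is a minimal Python sketch. It is not an established method. It assumes that every gene in a draft is recovered independently with the same probability as that draft's BUSCO completeness, which may well be wrong for accessory genes. The function names `estimate_missing` and `simulate_observed_core`, and all numbers in the usage lines, are hypothetical.

```python
import random

def estimate_missing(observed_genes, busco_completeness):
    """Estimate how many genes a draft is missing.

    Assumption: genes are missed at the same rate as BUSCO genes,
    so the true gene count is observed_genes / busco_completeness.
    """
    if not 0 < busco_completeness <= 1:
        raise ValueError("busco_completeness must be in (0, 1]")
    estimated_total = observed_genes / busco_completeness
    return estimated_total - observed_genes

def simulate_observed_core(n_core, completeness, seed=0):
    """Count true core genes that are still seen in every draft.

    Assumption: draft i recovers each gene independently with
    probability completeness[i].
    """
    rng = random.Random(seed)
    seen_in_all = 0
    for _ in range(n_core):
        # A gene counts as observed core only if no draft misses it.
        if all(rng.random() < c for c in completeness):
            seen_in_all += 1
    return seen_in_all

# Hypothetical draft: 9500 predicted genes, 95% BUSCO completeness.
missing = estimate_missing(9500, 0.95)  # about 500 genes estimated missing

# Hypothetical panel: 4000 true core genes, 10 drafts at 98% completeness.
observed_core = simulate_observed_core(4000, [0.98] * 10)
```

Under these assumptions, the chance that a true core gene is seen in all drafts is the product of the per-draft completeness values, so a strict "present in every strain" core definition shrinks quickly as the number of incomplete drafts grows.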

My understanding is that core and accessory genes commonly evolve at significantly different rates. As such, I am not sure whether BUSCO results can serve as a surrogate for extrapolating to the accessory genome (for the core genome they may be OK, I suppose?).

With that as context, here are my questions:

1. How can I simulate the expected number of core genes, given that nearly all of my draft genomes do not contain all expected BUSCO genes (but to different degrees)?

2. How should I change this simulation for accessory genes, given how these evolve differently from core genes?

3. I looked through the literature but was unable to find a directly relevant paper. Is there any prior published work looking into this?

