Question: Diploid genome gene annotation
25 days ago by
msobol10 wrote:

Hi guys,

I am working with several diploid fungal genomes, but I am confused about how to deal with the duplicated genes. I started by assembling the diploid genomes using dipSPAdes, then did gene finding with MAKER. The reported number of genes is ~20,000, which is about twice as many as are reported for haploid genomes of the same genus. So my question is: is it okay to go forward with functional gene annotation, or do I need to somehow get rid of the duplicate genes in the genome? I have been confused about whether it is appropriate to publish the diploid version of the genome, or if it is necessary to report the haploid version. I hope this makes sense!

Thanks in advance, Morgan

annotation diploid gene genome • 119 views
ADD COMMENTlink written 25 days ago by msobol10

Is the species you're working on (highly) heterozygous? If not, then dipSPAdes is unfortunately not the most appropriate choice of assembler.

ADD REPLYlink written 25 days ago by lieven.sterck4.5k

I do not believe so (but I am also not 100% sure), because they exhibit both haploid and diploid cells. I personally observed this under the microscope after staining with DAPI. Also, the diploid assembly was of much higher quality in terms of number of contigs, size, and N50 when I compared it to the regular SPAdes assembly. BUSCO confirmed that approximately 70% of the single-copy orthologs were duplicated. Do you recommend a certain assembler so that I could compare them?

ADD REPLYlink modified 25 days ago • written 25 days ago by msobol10

It's true that it is not common to find diploid genome annotations in databases. I don't think the EBI or NCBI submission pipelines make any distinction between a haploid and a diploid annotation, but you should contact them to find out the best way to submit your data. I'm looking forward to hearing more about it. One problem I can see is that the two alleles of a locus have two different gene identifiers in your MAKER annotation. That means you will end up with two locus identifiers for a single locus, which would be a bit weird.

ADD REPLYlink modified 25 days ago • written 25 days ago by Juke-342.1k

Thanks for the advice. I'll contact the databases and can post an update here. I should have thought about this sooner, before proceeding with assembly and annotation :/ I just wonder if there is a way to "fix" this at the gene prediction level instead of having to start from the beginning with the assemblies.

ADD REPLYlink written 25 days ago by msobol10

If you know which contigs are part of which assembly (primary or secondary), then it's not a problem to filter your annotation.
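As a rough illustration of that kind of filtering, here is a minimal Python sketch that keeps only the GFF3 features located on a chosen set of contigs. It assumes you already have the names of the primary contigs from your assembler; the contig names and the function names `filter_gff3_by_contigs` and `count_genes` are made up for the example and are not part of MAKER or dipSPAdes.

```python
def filter_gff3_by_contigs(lines, keep_contigs):
    """Keep GFF3 feature lines whose seqid (column 1) is in keep_contigs.

    Comment and directive lines (starting with '#') are kept unchanged.
    Reading stops at a '##FASTA' directive, so an embedded sequence
    section, if present, is dropped.
    """
    keep = set(keep_contigs)
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            kept.append(line)
            continue
        seqid = line.split("\t", 1)[0]
        if seqid in keep:
            kept.append(line)
    return kept


def count_genes(lines):
    """Count feature lines whose type (column 3) is 'gene'."""
    total = 0
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] == "gene":
            total += 1
    return total


# Hypothetical input: contig_1 is treated as primary, contig_2 as secondary.
example = [
    "##gff-version 3",
    "contig_1\tmaker\tgene\t1\t900\t.\t+\t.\tID=g1",
    "contig_1\tmaker\tmRNA\t1\t900\t.\t+\t.\tID=g1-mRNA-1;Parent=g1",
    "contig_2\tmaker\tgene\t10\t800\t.\t-\t.\tID=g2",
]
primary_only = filter_gff3_by_contigs(example, {"contig_1"})
print(count_genes(example), "genes before,", count_genes(primary_only), "after")
```

For a real file you would pass an open file handle instead of the `example` list and write the kept lines back out. Note that this only works if the assembler tells you which contigs belong to which haplotype; it does not detect allelic duplicates by itself.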

ADD REPLYlink written 25 days ago by Juke-342.1k

That is good to hear. Do you recommend any program that can do this? Would it basically be some sort of alignment program that can detect the duplicated genes?

ADD REPLYlink written 25 days ago by msobol10

Usually it is your assembler that gives you the phased genome, but I don't know what the dipSPAdes outputs look like.

ADD REPLYlink written 25 days ago by Juke-342.1k