Question

GeneMark-ES gene finding with a reference and annotation

6.2 years ago

from the mountains ▴ 230

I have generated a new fungal assembly for a species that has been assembled before, and I am trying to transfer the annotations from the old assembly to the new one using GeneMark-ES within QUAST. The QUAST manual says "If a gene file is provided with -G as well, both # genes in the file covered by the assembly, and # predicted genes are reported." That makes it sound like QUAST can link the genes in the publicly available annotation to the genes predicted in my assembly, yet I cannot find any output saying which predicted gene corresponds to which annotated gene. Is there anything I can do within QUAST to get the pre-existing annotations to show up in the predicted genes? I am wondering if something is formatted incorrectly in my input GFF3 file: the 3rd column says "gene", and the attributes are "Name", "locus_tag" and "gene".
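One way to rule out a basic formatting problem is to check that every feature line has the nine tab-separated columns the GFF3 specification requires, and to tally the feature types (column 3) and attribute keys (column 9). The sketch below does only that; `summarize_gff3` is a hypothetical helper, the example lines are invented to mirror the attributes described above, and it does not check anything QUAST-specific.

```python
from collections import Counter


def summarize_gff3(lines):
    """Tally column 3 (feature type) and column 9 attribute keys of GFF3 text.

    Raises ValueError for any feature line that does not have exactly
    nine tab-separated columns, as the GFF3 specification requires.
    """
    types = Counter()
    attr_keys = Counter()
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"line {lineno}: expected 9 tab-separated columns, got {len(cols)}")
        types[cols[2]] += 1
        for field in cols[8].split(";"):
            field = field.strip()
            if field:
                attr_keys[field.split("=", 1)[0]] += 1
    return types, attr_keys


# Hypothetical example lines, modelled on the attributes described in the question.
example_gff = [
    "##gff-version 3",
    "scaffold_1\t.\tgene\t1000\t2500\t.\t+\t.\tName=ABC1;locus_tag=XYZ_0001;gene=ABC1",
    "scaffold_1\t.\tgene\t3000\t4200\t.\t-\t.\tName=DEF2;locus_tag=XYZ_0002;gene=DEF2",
]

types, attr_keys = summarize_gff3(example_gff)
print(dict(types))      # {'gene': 2}
print(dict(attr_keys))  # {'Name': 2, 'locus_tag': 2, 'gene': 2}
```

On a real file you would pass an open file handle instead of the list. Note that these lines carry no `ID` attribute; GFF3 only requires `ID` for features that have children or span multiple lines, so whether its absence matters depends on the tool reading the file.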

I should add that I expect the genomes to be highly similar: I previously mapped my Illumina reads to the old assembly with a ~90% mapping rate.

I would greatly appreciate the help!

DNA-seq assembly gene prediction • 2.3k views

updated 6.2 years ago by lieven.sterck 15k • written 6.2 years ago by from the mountains ▴ 230

score 1 · Answer 1 · 2018-03-01

6.2 years ago

lieven.sterck 15k

I don't have much experience with using GeneMark-ES and QUAST to 'transfer' annotations from one genome version to the other. However, I think there are other (more suitable) approaches to accomplish this.

Have a look at RATT and/or liftOver (for the latter you will need to be able to link your new assembly to the old one). They usually do a pretty good job of transferring annotations.

Do be aware of the 'risks/pitfalls' of such an approach ;-)

6.1 years ago by lieven.sterck 15k

Thanks for your reply. I think I was misinterpreting the QUAST manual: after I lift over an annotation so that it corresponds to the new assembly, I could feed both the new assembly and the lifted-over annotation into QUAST, and it would report those genes plus any new genes that aren't in the annotation file.

I'm looking into liftOver and CrossMap right now. Unfortunately, the vast majority of the reference annotations are going unmapped to the new assembly. Do you know where I can get more information about the risks/pitfalls? And how can I assess how good the alignment between the assemblies in my chain file is?

My assemblies may have some major differences because this is a unicellular fungus, but I can see from the synteny plot that long contiguous sequences are preserved in the new assembly. I would expect more than 2% of genes to lift over!
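A rough way to judge a chain file is to ask how much of each old-assembly sequence it actually covers. In a liftOver chain the target fields (`tName`, `tSize`, `tStart`) describe the assembly you lift from and the query fields the assembly you lift to. The sketch below assumes a plain UCSC-format chain file and reports, per target sequence, the bases inside aligned blocks (overlapping chains merged) out of the sequence length. `chain_target_coverage` is a hypothetical helper, not part of liftOver or CrossMap, and the example chains are invented.

```python
from collections import defaultdict


def chain_target_coverage(lines):
    """Return {tName: (bases_in_aligned_blocks, tSize)} for UCSC chain text.

    Aligned blocks from all chains on the same target sequence are merged,
    so overlapping chains are not double-counted.
    """
    blocks = defaultdict(list)  # target name -> [(start, end), ...]
    sizes = {}
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank separator line or comment
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
            t_name, t_size, t_strand = fields[2], int(fields[3]), fields[4]
            t_pos = int(fields[5])
            sizes[t_name] = t_size
            continue
        size = int(fields[0])  # ungapped aligned block length
        start, end = t_pos, t_pos + size
        if t_strand == "-":
            # minus-strand coordinates count from the end of the sequence
            start, end = t_size - end, t_size - start
        blocks[t_name].append((start, end))
        if len(fields) == 3:
            t_pos += size + int(fields[1])  # skip dt, the gap in the target
    coverage = {}
    for name, length in sizes.items():
        covered, cur_start, cur_end = 0, None, None
        for s, e in sorted(blocks[name]):
            if cur_end is None or s > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = s, e
            else:
                cur_end = max(cur_end, e)  # overlapping or adjacent block
        if cur_end is not None:
            covered += cur_end - cur_start
        coverage[name] = (covered, length)
    return coverage


example_chain = """\
chain 1000 old_chr1 1000 + 100 400 new_ctg7 1200 + 0 290 1
100 10 0
190

chain 500 old_chr2 500 + 0 50 new_ctg9 600 + 10 60 2
50
"""

for name, (covered, length) in sorted(chain_target_coverage(example_chain.splitlines()).items()):
    print(f"{name}  {covered}/{length}  {covered / length:.1%}")
# old_chr1  290/1000  29.0%
# old_chr2  50/500  10.0%
```

With a real file you would pass an open file handle instead of the example string. A low fraction on sequences that look syntenic in the plot would point at the chain file rather than at the lift-over tool.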

6.1 years ago by from the mountains ▴ 230

Well, you just described pitfall #1: it might not be straightforward to link your old assembly to the new one. (I think RATT works on a gene basis and is less influenced by this issue.)

If you can see obvious synteny between both assemblies, you would indeed expect more genes to lift over. Perhaps something is off with your chain file?
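One quick thing to rule out when most features go unmapped: liftOver and CrossMap look features up by sequence name, so GFF3 seqids (column 1) that never appear as a chain target name (`tName`) cannot be lifted over. If none of them match, the chain may also have been built in the opposite direction (new to old). The helpers below are hypothetical and the example names are invented; this is a sketch, not a full validation of the chain file.

```python
def gff_seqids(gff_lines):
    """Collect the sequence names (column 1) used by feature lines of a GFF3 file."""
    seqids = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence section, no more features
        if not line.strip() or line.startswith("#"):
            continue  # blank lines, comments and directives
        seqids.add(line.split("\t", 1)[0])
    return seqids


def chain_target_names(chain_lines):
    """Collect tName (3rd field of each 'chain' header line) from a chain file."""
    names = set()
    for line in chain_lines:
        fields = line.split()
        if fields and fields[0] == "chain":
            names.add(fields[2])
    return names


def seqids_missing_from_chain(gff_lines, chain_lines):
    """GFF3 seqids that no chain covers; features on them cannot be lifted over."""
    return sorted(gff_seqids(gff_lines) - chain_target_names(chain_lines))


# Hypothetical example: the annotation uses different sequence names
# from the FASTA headers the chain file was built against.
example_gff = [
    "##gff-version 3",
    "contig_0001\t.\tgene\t1000\t2500\t.\t+\t.\tName=ABC1",
    "contig_0002\t.\tgene\t300\t900\t.\t-\t.\tName=DEF2",
]
example_chain = [
    "chain 1000 scaffold_1 5000 + 0 1000 ctg_7 6000 + 0 1000 1",
    "1000",
    "",
]
print(seqids_missing_from_chain(example_gff, example_chain))
# ['contig_0001', 'contig_0002']
```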

6.1 years ago by lieven.sterck 15k