Question

OMA gene vs splicing variants in the output

5.0 years ago

mateusz.konczal • 0

Hi there,

I've been using OMA standalone to identify orthologs and study their evolution in my data set. A couple of species in my data set include splice variants, so, as described in the manual, I included .splice files. The analyses ran smoothly, used_splicing_variants.txt appeared in the output folder, and it looks fine.

But when I analyzed HierarchicalGroups.orthoxml (or other .orthoxml files), I realized that each splice variant is encoded as a separate gene.

The .orthoxml file was used (with pyHam) to infer the number of gene families with duplications and the number of gained/lost genes, but now I'm not sure how to interpret these results. pyHam seems to interpret each splice variant as a separate gene, so the number of "gene gains" sums up to the number of transcripts. Is there a way to use only one splice variant per gene in the pyHam analyses with OMA output?

Thanks in advance!

OMA orthologs pyham • 1.3k views

updated 5.0 years ago by Adrian Altenhoff ★ 1.1k • written 5.0 years ago by mateusz.konczal • 0

score 2 · Answer 1 · 2019-04-29

Hi,

Currently, pyHam does not have an option to skip some gene elements from an orthoxml file. But we agree that this is an important feature and plan to implement it in the future.

As a short-term fix, the easiest option in my view is to remove from the orthoxml file the <gene/> elements that do not correspond to a used splicing variant. None of these 'genes' is part of any HOG, so removing the gene elements is sufficient.
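The removal could be scripted roughly as follows. This is only a minimal sketch using Python's standard library, not a pyHam or OMA feature. The function names and the `drop_prot_ids` argument are made up for illustration, and it assumes that the protId attribute of each <gene/> element matches the identifiers you collect for the unused variants. As a safety check, it never removes a gene that is referenced by a <geneRef/> inside a group.

```python
import xml.etree.ElementTree as ET


def remove_unused_variants(root, drop_prot_ids):
    """Remove <gene> elements whose protId is in drop_prot_ids.

    Hypothetical helper: genes still referenced by a <geneRef> are kept,
    so existing HOGs are never left pointing at a missing gene.
    Returns the number of removed elements.
    """
    # Reuse whatever namespace the root element carries, e.g. "{...}orthoXML".
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    referenced = {ref.get("id") for ref in root.iter(f"{ns}geneRef")}
    removed = 0
    for genes in root.iter(f"{ns}genes"):
        for gene in list(genes):  # iterate over a copy, we mutate the parent
            if gene.get("protId") in drop_prot_ids and gene.get("id") not in referenced:
                genes.remove(gene)
                removed += 1
    return removed


def filter_orthoxml(in_path, out_path, drop_prot_ids):
    """Read an orthoxml file, drop the listed variants, write a new file."""
    tree = ET.parse(in_path)
    root = tree.getroot()
    if root.tag.startswith("{"):
        # Keep the default namespace instead of ns0: prefixes on output.
        ET.register_namespace("", root.tag[1 : root.tag.index("}")])
    removed = remove_unused_variants(root, set(drop_prot_ids))
    tree.write(out_path, encoding="utf-8", xml_declaration=True)
    return removed
```

The set of identifiers to drop would have to be built from your .splice files and used_splicing_variants.txt (all listed variants minus the used ones). How protId values are written depends on your input files, so check a few entries by hand before trusting the filtered output.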

In case you do not care about the gene gains/losses in the terminal branches leading to the individual species, you can actually use the file directly. The minor variants will all appear as gene gains in the terminal branches.

Best wishes, Adrian