I'm trying to get differential expression data for a set of duplicated genes and I've run into a couple of problems.
My study organism's genome has been sequenced and partially annotated, and I'm trying to annotate a gene pathway that hasn't been annotated yet. I know that the gene I'm looking at is present as four duplicated copies in the genome. My lab has a transcriptome assembled with Trinity, and I've identified the four transcripts in it that correspond to the four copies (in the assembly they actually look like four isoforms of the same gene). So far so good. However, although I know that this group of four transcripts matches the group of four duplicates, I don't know which transcript corresponds to which genomic copy. Is there a way to work this out for duplicated genes? I've tried constructing phylogenies, but they all come out looking like combs rather than resolved trees. Any strategy or tool suggestions would be super useful!
Also, is it possible to analyse differential expression for four genes this closely related? Reads from one gene might be assigned to one of its duplicates and throw my CPM off.
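To make the CPM worry concrete, here is a minimal Python sketch of what I think could happen. All of the counts, the copy names (`copy_A`, `copy_B`) and the 25% misassignment rate are made up purely for illustration; they are not from my data, and the `misassign` helper is hypothetical.

```python
def cpm(counts):
    """Counts per million: each count scaled by the total library size."""
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def misassign(counts, src, dst, fraction):
    """Move a fraction of the reads from src to dst (hypothetical helper)."""
    moved = int(counts[src] * fraction)
    out = dict(counts)
    out[src] -= moved
    out[dst] += moved
    return out


# Hypothetical read counts, invented for illustration only.
true_counts = {"copy_A": 400, "copy_B": 100, "other_genes": 999_500}

# Assume a quarter of copy_A's reads get assigned to copy_B instead.
observed = misassign(true_counts, "copy_A", "copy_B", 0.25)

print("true CPM:    ", cpm(true_counts))
print("observed CPM:", cpm(observed))
```

In this toy case the library size doesn't change, but the apparent 4:1 ratio between the two copies shrinks to 3:2, which is the kind of distortion I'm worried about.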
Thanks in advance!