Question

When should I merge evidence in a TransDecoder annotation pipeline?

23 months ago

Mike ▴ 10

Hi all,

I'm not certain what's the best way to do this, so any help will be highly appreciated.

I have a merged GTF file that I created by running an RNA-seq > STAR > StringTie pipeline against a reference genome.

I also have other gtfs and files such as: RFAM DB results, exonerate (protein alignment) result, ab initio result from AUGUSTUS.

I then used TransDecoder to predict ORFs, but only on the StringTie merged GTF file. Is it better to merge all of my different inputs into a larger, more descriptive GTF and then run TransDecoder on that for the final results, or should I run TransDecoder on the RNA-seq pipeline results and then merge the resulting GFF with the GTFs I got from the other types of evidence?

The goal is to create gene prediction models based on all this evidence and the input genome.

Thanks a lot.
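
For reference, the TransDecoder step described above (starting from a genome-guided StringTie GTF) could be sketched roughly as follows. This is only a sketch: the helper script names follow the TransDecoder documentation for GTF input as far as I know, while the `util/` location, the `transcripts` prefix and the input file names are assumptions. The code only builds and prints the commands; it does not run them.

```python
import shlex


def transdecoder_genome_workflow(gtf, genome, util_dir="util", prefix="transcripts"):
    """Return (argv, stdout_file) pairs for running TransDecoder on a GTF.

    util_dir and prefix are placeholders; adjust them to your installation.
    """
    fasta = f"{prefix}.fasta"                    # transcript sequences cut from the genome
    align_gff3 = f"{prefix}.gff3"                # transcript structures as alignment GFF3
    orf_gff3 = f"{fasta}.transdecoder.gff3"      # ORFs in transcript coordinates
    genome_gff3 = f"{fasta}.transdecoder.genome.gff3"  # ORFs in genome coordinates
    return [
        ([f"{util_dir}/gtf_genome_to_cdna_fasta.pl", gtf, genome], fasta),
        ([f"{util_dir}/gtf_to_alignment_gff3.pl", gtf], align_gff3),
        (["TransDecoder.LongOrfs", "-t", fasta], None),
        (["TransDecoder.Predict", "-t", fasta], None),
        ([f"{util_dir}/cdna_alignment_orf_to_genome_orf.pl",
          orf_gff3, align_gff3, fasta], genome_gff3),
    ]


def as_shell(steps):
    """Render each step as a shell line, redirecting stdout where needed."""
    lines = []
    for argv, stdout_file in steps:
        cmd = shlex.join(argv)
        if stdout_file is not None:
            cmd += " > " + shlex.quote(stdout_file)
        lines.append(cmd)
    return lines


if __name__ == "__main__":
    # Hypothetical input names, for illustration only.
    for line in as_shell(transdecoder_genome_workflow("stringtie_merged.gtf", "genome.fasta")):
        print(line)
```

The last step maps the predicted ORFs back onto genome coordinates, so the result is a GFF3 that can sit next to the other evidence tracks.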

agat transdecoder gtf annotation • 1.3k views

updated 23 months ago by BioinformaticBird ▴ 110 • written 23 months ago by Mike ▴ 10

I'm not sure I follow: it seems you want to do gene prediction on a genome, but if I remember correctly TransDecoder is used for ORF finding on transcripts.

23 months ago by lieven.sterck 15k

You are correct: the StringTie pipeline produces an assembled transcriptome, and I run TransDecoder on that.

I thought that finding ORFs within these transcripts can help me gather information regarding the genes but perhaps I am wrong.

I'm trying to create gene models with what I have, but I lack knowledge in this field.

If you have any recommendations on how to proceed with what I have (assembled transcriptome, exonerate protein2genome output, ab initio gene prediction, RFAM results and TransDecoder output), I'd love to hear them. Thanks!

23 months ago by Mike ▴ 10

ok.

No, you're not wrong here, at worst a bit sub-optimal :)

23 months ago by lieven.sterck 15k

score 1 · Answer 1 · 2022-05-22

23 months ago

lieven.sterck 15k

Since you already have all that information at hand, I would suggest going for a 'real' gene prediction tool/pipeline. Moreover, given the data you already have, it would make sense to use a tool that can integrate all those levels of evidence.

Tools that come to mind are MAKER, EuGene, EVidenceModeler, GLEAN, ... (and there are many others). These should be able to integrate the information you have and, by making use of the genomic sequence, produce a merged, non-redundant set of genes.
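
To give an idea of what "integrating evidence levels" looks like in practice, here is a small sketch that writes a weights table in the three-column, tab-separated layout used by EVidenceModeler (evidence class, source, weight), as I understand it. The source names must match column 2 of your own GFF3 files, and both the names and the weight values below are illustrative assumptions, not recommendations.

```python
# Evidence classes recognised in an EVidenceModeler weights file (to my knowledge).
EVM_CLASSES = {"ABINITIO_PREDICTION", "PROTEIN", "TRANSCRIPT", "OTHER_PREDICTION"}


def format_evm_weights(entries):
    """Format (evidence_class, source, weight) tuples as tab-separated lines."""
    rows = []
    for evidence_class, source, weight in entries:
        if evidence_class not in EVM_CLASSES:
            raise ValueError(f"unknown evidence class: {evidence_class}")
        if weight < 0:
            raise ValueError(f"weight must be non-negative: {weight}")
        rows.append(f"{evidence_class}\t{source}\t{weight}")
    return "\n".join(rows) + "\n"


# Hypothetical sources and weights; check column 2 of each GFF3 for the real names.
example_entries = [
    ("ABINITIO_PREDICTION", "AUGUSTUS", 1),
    ("PROTEIN", "exonerate", 5),
    ("TRANSCRIPT", "StringTie", 10),
    ("OTHER_PREDICTION", "transdecoder", 10),
]

if __name__ == "__main__":
    print(format_evm_weights(example_entries), end="")
```

The point of the table is that each evidence type keeps its own track and gets its own trust level, rather than everything being flattened into one GTF beforehand.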

Thanks a lot,

I tried using EVidenceModeler but ran into many problems and bugs, so I started thinking about doing this manually. I'll look into the other tools; if you have any helpful knowledge about them, I'd be happy to hear it.
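
As a rough illustration of what a first manual check could look like, the sketch below lists ab initio gene predictions that have no overlapping assembled transcript on the same sequence and strand. It assumes tab-separated GFF/GTF rows with `gene` features from AUGUSTUS and `transcript` features from StringTie; the example rows are made up. It is a crude overlap test and not a replacement for a proper evidence combiner.

```python
from collections import defaultdict


def parse_features(lines, feature_type):
    """Yield (seqid, strand, start, end) for GFF/GTF rows of one feature type."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        yield cols[0], cols[6], int(cols[3]), int(cols[4])


def without_support(predicted, supported):
    """Return predicted intervals that overlap no supported interval."""
    by_key = defaultdict(list)
    for seqid, strand, start, end in supported:
        by_key[(seqid, strand)].append((start, end))
    unsupported = []
    for seqid, strand, start, end in predicted:
        hits = by_key.get((seqid, strand), [])
        # Closed intervals overlap when each starts before the other ends.
        if not any(s <= end and start <= e for s, e in hits):
            unsupported.append((seqid, strand, start, end))
    return unsupported


# Made-up example rows, for illustration only.
ab_initio_rows = [
    "chr1\tAUGUSTUS\tgene\t100\t500\t.\t+\t.\tID=g1",
    "chr1\tAUGUSTUS\tgene\t2000\t2600\t.\t+\t.\tID=g2",
]
transcript_rows = [
    'chr1\tStringTie\ttranscript\t150\t480\t.\t+\t.\tgene_id "MSTRG.1"; transcript_id "MSTRG.1.1";',
]

if __name__ == "__main__":
    genes = list(parse_features(ab_initio_rows, "gene"))
    transcripts = list(parse_features(transcript_rows, "transcript"))
    for locus in without_support(genes, transcripts):
        print(locus)
```

Checks like this are useful for sanity-testing the evidence, but resolving conflicting exon structures by hand quickly becomes the hard part.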

23 months ago by Mike ▴ 10

score 0 · Answer 2 · 2022-05-22

23 months ago

BioinformaticBird ▴ 110

Hi, maybe you should give MOSGA a try.