When should I merge evidence in a TransDecoder annotation pipeline?
23 months ago
Mike

Hi all,

I'm not certain what's the best way to do this, so any help will be highly appreciated.

I have a merged GTF file that I created with an RNA-seq > STAR > StringTie pipeline against a reference genome.

I also have other GTFs and result files: Rfam database results, Exonerate (protein alignment) results and ab initio predictions from AUGUSTUS.

I then used TransDecoder to predict ORFs, but only on the StringTie merged GTF file. Is it better to first merge all of my different inputs into a larger, more descriptive GTF and then run TransDecoder on that for the final results, or should I run TransDecoder on the RNA-seq pipeline results and then merge the resulting GFF with the GTFs I got from the other types of evidence?
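For reference, TransDecoder's documented route for a genome-guided transcript GTF (such as StringTie's merged output) is roughly: extract the transcript sequences from the genome, convert the GTF to an alignment-style GFF3, predict ORFs on the transcripts, then project those ORFs back to genome coordinates. Below is a minimal dry-run sketch in Python that only assembles those commands and does not run anything. The file names `stringtie_merged.gtf` and `genome.fa` and the `util` directory location are hypothetical; adjust them to your installation.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# One step = (argv, file that stdout should be redirected into, or None).
Step = Tuple[List[str], Optional[str]]


def transdecoder_genome_steps(gtf: str, genome: str, util_dir: str = "util") -> List[Step]:
    """Build (but do not run) the TransDecoder genome-guided workflow.

    util_dir is assumed to be TransDecoder's util/ folder; file names are
    placeholders for illustration only.
    """
    stem = Path(gtf).stem
    fasta = f"{stem}.fasta"
    gff3 = f"{stem}.gff3"
    return [
        # 1. Extract spliced transcript sequences from the genome.
        ([f"{util_dir}/gtf_genome_to_cDNA_fasta.pl", gtf, genome], fasta),
        # 2. Convert the transcript GTF into an alignment-style GFF3.
        ([f"{util_dir}/gtf_to_alignment_gff3.pl", gtf], gff3),
        # 3. Find candidate long ORFs on the transcript sequences.
        (["TransDecoder.LongOrfs", "-t", fasta], None),
        # 4. Select the likely coding ORFs (writes <fasta>.transdecoder.gff3).
        (["TransDecoder.Predict", "-t", fasta], None),
        # 5. Project the transcript-level ORFs back onto genome coordinates.
        ([f"{util_dir}/cdna_alignment_orf_to_genome_orf.pl",
          f"{fasta}.transdecoder.gff3", gff3, fasta],
         f"{fasta}.transdecoder.genome.gff3"),
    ]


if __name__ == "__main__":
    for argv, out in transdecoder_genome_steps("stringtie_merged.gtf", "genome.fa"):
        print(" ".join(argv) + (f" > {out}" if out else ""))
```

The last step's genome-coordinate GFF3 is the TransDecoder track you would then combine with the other evidence.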

The goal is to create gene prediction models based on all this evidence and the input genome.

Thanks a lot.

agat transdecoder gtf annotation

Not sure if I can follow:

It seems you want to do gene prediction on a genome, but if I remember correctly, TransDecoder is used for ORF finding on transcripts.
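To make "ORF finding on transcripts" concrete, here is a deliberately simplified Python sketch that scans the three forward frames of a transcript for ATG...stop stretches and keeps the longest one. This is not TransDecoder's algorithm: TransDecoder also handles the reverse strand, applies a minimum ORF length and scores coding potential, none of which is modelled here.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_forward_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns None when no complete ORF exists in any of the three frames.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAAGG"
    orf = longest_forward_orf(transcript)
    if orf:
        print(orf, transcript[orf[0]:orf[1]])
```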


You are correct: the StringTie pipeline produces an assembled transcriptome, and I run TransDecoder on that.

I thought that finding ORFs within these transcripts could help me gather information about the genes, but perhaps I am wrong.

I'm trying to create gene models with what I have, but I lack knowledge in this field.

If you have any recommendations on how to proceed with what I have (assembled transcriptome, Exonerate protein2genome output, ab initio gene predictions, Rfam results and TransDecoder output), I'd love to hear them. Thanks!


ok.

No, you're not wrong here, at worst a bit sub-optimal :)

23 months ago

Since you already have all this information at hand, I would suggest going for a 'real' gene prediction tool or pipeline. Moreover, given the data you already have, it makes sense to use a tool that can integrate all of these levels of evidence.

Tools that come to mind are MAKER, EuGene, EvidenceModeler and GLEAN (and there are many others). They should be able to integrate the information you have and, using the genomic sequence, produce a merged, non-redundant set of genes.
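As one example of how such integration is configured, EvidenceModeler takes a weights file with one tab-separated line per evidence track: the evidence class, the source name (which has to match column 2 of the corresponding GFF3) and a weight. The Python sketch below just writes that text. The source names, the class assigned to each track and the weights are hypothetical placeholders, not recommended values; check the EvidenceModeler documentation for how each of your files should be classified and converted.

```python
from typing import Iterable, Tuple

# Hypothetical tracks: (EVM evidence class, GFF3 column-2 source, weight).
# These values are placeholders for illustration, not tuned settings.
TRACKS = [
    ("ABINITIO_PREDICTION", "AUGUSTUS", 1),
    ("PROTEIN", "exonerate", 5),
    ("OTHER_PREDICTION", "transdecoder", 2),
]

# Evidence classes accepted in an EVM weights file.
VALID_CLASSES = {"ABINITIO_PREDICTION", "PROTEIN", "TRANSCRIPT", "OTHER_PREDICTION"}


def weights_file_text(tracks: Iterable[Tuple[str, str, int]]) -> str:
    """Render tracks as tab-separated 'class<TAB>source<TAB>weight' lines."""
    lines = []
    for ev_class, source, weight in tracks:
        if ev_class not in VALID_CLASSES:
            raise ValueError(f"unknown evidence class: {ev_class}")
        lines.append(f"{ev_class}\t{source}\t{weight}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(weights_file_text(TRACKS), end="")
```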


Thanks a lot,

I tried using EvidenceModeler but ran into many problems and bugs, so I started thinking about doing this manually. I'll look into the other tools; if you have any helpful experience with them, I'd be happy to hear it.

23 months ago

Hi, maybe you should give MOSGA a try.
