Question: Program for updating custom annotations with new genome?
Posted 14 months ago by brismiller (Bellingham, WA, USA):

Hi all,

I have a custom annotation file (GTF) whose annotations were made on an older version of a genome, and I want to update them so I can use them with the newest genome version. The annotations are simple, single genomic ranges (no gaps) ranging from ~150 bp to ~250,000 bp in length. I am working with the model organism Tetrahymena thermophila, whose genome assembly is revised every few years as new sequencing experiments are completed.

I have started writing a custom Python script to extract the sequences from the old genome, BLAT them against the new genome, and extract the new coordinates, but the script is taking a long time to write. Is there already a program or tool for what I assume is a common task? Any suggestions?
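For reference, the extract-and-relocate script described above can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the poster's script: exact substring search (`str.find`) stands in for BLAT, so only annotations whose sequence is unchanged in the new assembly are placed; both genomes are assumed to fit in memory as strings; and all function names are hypothetical. Both strands are searched because a scaffold's orientation can flip between assemblies.

```python
# Minimal sketch: exact substring search stands in for BLAT (assumption),
# so only annotations whose sequence is unchanged in the new assembly are found.
# All function names here are hypothetical.

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: uppercase_sequence}."""
    genome, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                genome[name] = "".join(chunks).upper()
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        genome[name] = "".join(chunks).upper()
    return genome


def parse_gtf(lines):
    """Yield (seqname, start, end, strand) per GTF record (1-based, inclusive)."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        yield fields[0], int(fields[3]), int(fields[4]), fields[6]


def find_all(haystack, needle):
    """0-based offsets of every (possibly overlapping) occurrence of needle."""
    hits, i = [], haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def relocate(annotation, old_genome, new_genome):
    """List every (seqname, start, end, strand) placement of the annotation's
    old sequence in the new genome, searching both strands."""
    seqname, start, end, strand = annotation
    seq = old_genome[seqname][start - 1:end]  # GTF start/end are 1-based, inclusive
    rc = revcomp(seq)
    flipped = {"+": "-", "-": "+"}.get(strand, strand)
    hits = []
    for name, chrom in new_genome.items():
        for pos in find_all(chrom, seq):
            hits.append((name, pos + 1, pos + len(seq), strand))
        if rc != seq:  # reverse-complement palindromes would otherwise be reported twice
            for pos in find_all(chrom, rc):
                hits.append((name, pos + 1, pos + len(seq), flipped))
    return hits
```

In practice one would pass open file handles, e.g. `read_fasta(open("new_genome.fasta"))` (file name hypothetical), and send any annotation with zero hits, such as one spanning a filled poly-N gap, on to BLAT or BLAST.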


Have you looked at RATT/PAGIT?

written 14 months ago by genomax

Thanks for the suggestion; I will keep it in mind if I need to work with a larger annotation set.

written 14 months ago by brismiller

simple single genomic ranges (no gaps) and range from ~150bp to ~250,000bp in length

So we're not talking about 'normal' protein-coding genes here, right?

How different are the genomes (or how severe are the changes)?

written 14 months ago by lieven.sterck

Correct, we are not talking about 'normal' protein-coding genes. In fact, they are not genes at all; they are sRNA precursor ranges (regions from which sRNAs are produced).

Using the approach in the answer below, I found 10 annotations whose sequence differed in the new genome. For some of them, the orientation (strand) of the scaffold (chromosome) had been reversed; in others, poly-N gaps had been filled in. Judging from my annotations alone, I would say the genomes are fairly different, but I don't know where the changes are officially documented.
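When a scaffold has only switched orientation, the new coordinates can be computed rather than searched for: on a scaffold of length L, a 1-based inclusive interval [s, e] becomes [L - e + 1, L - s + 1] on the opposite strand. Below is a minimal Python sketch of that arithmetic plus a sanity check. It assumes the scaffold is otherwise identical in both assemblies (same length, no filled gaps); where poly-N gaps were filled, positions shift and this shortcut does not apply. Function names are hypothetical.

```python
# Minimal sketch, assuming the scaffold was only reverse-complemented between
# assemblies (same length, no filled gaps). Function names are hypothetical.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_interval(start, end, strand, scaffold_length):
    """Map a 1-based, inclusive interval onto the reverse-complemented scaffold."""
    new_start = scaffold_length - end + 1
    new_end = scaffold_length - start + 1
    new_strand = {"+": "-", "-": "+"}.get(strand, strand)
    return new_start, new_end, new_strand


def is_simple_flip(old_scaffold, new_scaffold, start, end):
    """True if the interval's sequence reappears, reverse-complemented,
    at the flipped coordinates of an equal-length new scaffold."""
    if len(old_scaffold) != len(new_scaffold):
        return False  # e.g. poly-N gaps were filled, so positions shifted
    new_start, new_end, _ = flip_interval(start, end, "+", len(new_scaffold))
    old_seq = old_scaffold[start - 1:end].upper()
    return revcomp(old_seq) == new_scaffold[new_start - 1:new_end].upper()
```

Annotations for which `is_simple_flip` returns False still need an alignment-based search.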

written 14 months ago by brismiller
Posted 14 months ago by brismiller (Bellingham, WA, USA):

In the end, I used BLAST manually to find the new coordinates. For each annotation, I extracted the sequence at its coordinates from both genomes and compared the two. Most were identical (i.e., the coordinates were unchanged), and only 20 had to be BLASTed. This question is resolved, but if people want to comment on which tools to use for a much larger annotation set, I would be interested.
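The triage step described above (compare the sequence at the same coordinates in both assemblies and only search for the ones that differ) is what keeps the alignment workload small, and it is easy to automate. A minimal Python sketch, assuming both genomes are already loaded as `{sequence_name: sequence}` dicts and that scaffold names match between assemblies; the function name is hypothetical.

```python
# Minimal sketch of the triage step: genomes are {sequence_name: sequence}
# dicts and scaffold names are assumed to match between assemblies.
# The function name is hypothetical.


def triage(annotations, old_genome, new_genome):
    """Split (seqname, start, end, strand) annotations into those whose
    sequence is identical at the same coordinates in the new genome and
    those that still need a BLAST/BLAT search."""
    unchanged, needs_search = [], []
    for ann in annotations:
        seqname, start, end, _strand = ann
        old_seq = old_genome[seqname][start - 1:end].upper()  # 1-based, inclusive
        new_chrom = new_genome.get(seqname)  # scaffold may be missing or renamed
        if new_chrom is not None and new_chrom[start - 1:end].upper() == old_seq:
            unchanged.append(ann)
        else:
            needs_search.append(ann)
    return unchanged, needs_search
```

Only the `needs_search` list then has to go through BLAST, which is what makes this approach workable for larger annotation sets.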

Powered by Biostar version 2.3.0