Program for updating custom annotations with new genome?
6.1 years ago
brismiller ▴ 50

Hi all,

I have a custom annotation file (GTF) whose annotations were made against an older version of a genome, and I want to update them so I can use them with the newest genome version. Each annotation is a simple single genomic range (no gaps), from ~150 bp to ~250,000 bp in length. I work with the model organism Tetrahymena thermophila, whose genome is updated every few years as new sequencing experiments are completed.

I have started writing a custom Python script to extract the sequences from the old genome, BLAT them against the new genome, and extract the new coordinates, but the script is taking a long time to write. Is there already a program or tool that I could use for what I assume is a common task? Any suggestions?
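For the last step of that plan (extracting the new coordinates), here is a minimal sketch, assuming BLAT was run with its default tab-separated PSL output and that each query is named after its annotation. The function name `best_psl_hits` is hypothetical, and picking the hit with the most matching bases is only one possible way to choose among multiple hits.

```python
def best_psl_hits(psl_lines):
    """Pick the best BLAT hit per query from PSL lines.

    Returns {query_name: (target_name, start, end, strand)} with
    1-based, end-inclusive coordinates, as used in GTF files.
    """
    best = {}
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip the optional PSL header block and malformed lines:
        # data rows have 21 columns and start with an integer.
        if len(fields) < 21 or not fields[0].isdigit():
            continue
        matches = int(fields[0])    # number of matching bases
        strand = fields[8]
        q_name = fields[9]
        t_name = fields[13]
        t_start = int(fields[15])   # PSL starts are 0-based
        t_end = int(fields[16])     # PSL ends are exclusive
        # Assumption: the hit with the most matching bases is the
        # right one; ties keep the first hit seen.
        if q_name not in best or matches > best[q_name][0]:
            best[q_name] = (matches, t_name, t_start + 1, t_end, strand)
    return {q: hit[1:] for q, hit in best.items()}
```

Because the annotations are single gap-free ranges, the sketch reports only the overall target span of each hit. A hit that crosses a filled-in gap or a rearrangement may be split into several blocks, so low-scoring or multi-block hits would still deserve a manual look.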

assembly next-gen RNA-Seq update annotations • 1.5k views

Have you looked at RATT/PAGIT?


Thanks for the suggestion; I will keep it in mind if I need to work with a larger annotation set.


"simple single genomic ranges (no gaps) and range from ~150bp to ~250,000bp in length"

So then we're not talking about 'normal protein-coding' genes here, right?

How different are the genomes (or how severe are the changes)?


Correct, we are not talking about 'normal protein-coding' genes. In fact, they are not genes at all; they are sRNA precursor ranges (from which sRNAs are produced).

From the answer below, I saw that 10 annotations had a new sequence in the new genome. For some of them, the strandedness of the scaffold (chromosome) had switched; in others, poly-N gaps had been filled in. So just from looking at my annotations, I would say the genomes are fairly different, but I don't know where the changes are officially documented.

6.1 years ago
brismiller ▴ 50

In the end, I just used BLAST manually to find the new coordinates. For each annotation, I extracted the sequence from both genomes and compared them. Most were the same (unchanged coordinates), and there were only 20 that I had to BLAST. This question is resolved, but if people want to comment on which tools to use for a much larger annotation set, that would be interesting to me.
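The comparison step described above can be sketched roughly as follows. This is an illustration rather than the script that was actually used: it assumes both genomes fit in memory, that scaffold names are the same in both FASTA files, and that the GTF uses standard 1-based, end-inclusive coordinates. The function names `read_fasta` and `changed_annotations` are hypothetical.

```python
def read_fasta(lines):
    """Parse FASTA lines into {sequence_name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (header[0] if header else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def changed_annotations(gtf_lines, old_genome, new_genome):
    """List (seqid, start, end) for annotations whose sequence differs
    between the two genomes at the same coordinates."""
    changed = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        seqid, start, end = fields[0], int(fields[3]), int(fields[4])
        # GTF is 1-based and end-inclusive; Python slices are 0-based
        # and end-exclusive, hence start - 1.
        old_seq = old_genome[seqid][start - 1:end].upper()
        # A scaffold missing from the new genome counts as changed.
        new_seq = new_genome.get(seqid, "")[start - 1:end].upper()
        if old_seq != new_seq:
            changed.append((seqid, start, end))
    return changed
```

Only the annotations returned by `changed_annotations` would then need to be aligned against the new genome. The comparison ignores soft-masking (case), which may or may not be what you want.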
